Welcome to the second article in our Google Analytics 4 (GA4) BigQuery tips series. In the previous article, we discussed higher-level metrics such as users and sessions and linked to details about the GA4 schema and its relation to the old export.
The GA4 export to BigQuery includes three fields with repeated values: event parameters, user properties and items. In this post, we'll examine how to best access those fields and how to utilize them.
Unlike in the old Universal Analytics (UA), the new GA export simplifies the schema with only a single level of repeated fields.
The event parameters are sent to Google Analytics and subsequently to BigQuery with every event. The event params field includes native/default event parameters as well as custom ones. The structure of the event_params consists of the key, which is a string, and a value record, which will only have one data type populated (string, integer, float, or double). Pay attention to the type of data that you're querying for; this is different than in the UA export, where all the values were stored as strings.
The two parameters present on every hit are the session identifier and session number (ga_session_number, ga_session_id). When either of those is connected to the user pseudo id, we can associate individual events back to sessions.
1
2
3
4
5
6
|
SELECT user_pseudo_id, ( SELECT value.int_value FROM unnest(event_params) WHERE key = "ga_session_number" ) as sessionNumber, countif(event_name = "page_view" ) as pageviews FROM `analyticspros.com:spotted-cinnamon-834.analytics_206551716.events_20230121` GROUP BY 1,2 ORDER BY 3 DESC |
Note accessing the ga_session_number is in row three, where we:
1. Unnest the event_params field to access all the key/value pairs for the event.
2. Filter the event parameter by key (in this case, we're accessing one per event)
3. Select the int_value of the value field (session number identifies the number of sessions the user has started - including the current one)
The user properties field is structured exactly as the event parameters with a key and multiple value options stored as a struct. User properties are attributes used to describe users as opposed to individual events. One important thing that we've noticed so far is that the user property does not persist once set.
To use it in your queries you should therefore:
1
2
3
4
5
6
7
8
|
SELECT MAX (up_region) OVER(PARTITION BY user_pseudo_id) as user_region, event_name, user_pseudo_id, event_timestamp FROM ( SELECT *, ( SELECT value.string_value FROM unnest(user_properties) WHERE key = "region" ) as up_region FROM `myproject.mydataset.events_*` ) |
In the example above, the user set their region property at one point. Since this is a user-scoped property, we know that all the events by that user should "belong" to the same region. Partitioning by user_pseudo_id and selecting the non-null value for the region (MAX) allows us to do just that. There are, of course, many ways to solve this problem and upgrades to this solution as well. A good exercise is to think about how this implementation would work if the user changed their region in the future.
Items are the last repeated field in the new schema and are associated with e-commerce events. The item struct is built out of 26 attributes. Some of the more important ones are item_id, item_name, item_brand, price and quantity.
Common events that will include items are "add_to_cart", "add_to_wishlist", "purchase", "begin_checkout" and similar. Because many of these events can include multiple items, the field type repeated.
In the example below, we query for the most popular (by quantity purchased) products in the Google Demo store.
1
2
3
4
5
|
SELECT item.item_name, sum (item.quantity) as quantity, sum (item.price) as price FROM `bigquery- public -data.ga4_obfuscated_sample_ecommerce.events_202012*`, unnest(items) as item WHERE event_name = "purchase" GROUP BY 1 ORDER BY 2 DESC |
Make sure to put "purchase" as the event name. If you do not use the event_name the quantity will include values from events such as add to cart, item views, etc.
Nested fields in BigQuery schema tend to be the biggest learning bump in writing SQL queries in BigQuery that many of us have to go through, especially when transitioning from more traditional databases. GA4 export uses them differently than UA did. The advantage we gain is easier access to all the fields (no multi-level nesting). On the other hand, it also brings challenges, as we saw with user properties, where the value does not persist throughout all the user events.
If you have questions about GA4 event parameters, user properties, or items, please don’t hesitate to reach out.