Using BigQuery's new (UN)PIVOT and QUALIFY Features with the GA4 Export

Written by Luka Cempre | May 14, 2021 6:00:00 AM

On May 10th Google announced new features that make working with BigQuery even more efficient. This article will look into utilizing PIVOT and QUALIFY with the Google Analytics 4 export data in BigQuery. With Google Analytics 4, the raw level export is now available to free and enterprise users; learn more about it here.

PIVOT - a Perfect Way to Count Events by User

In the Google Analytics 4 export, each row is an event. Counting the number of rows will therefore represent the number of events. In the example below, we are grouping those events by 'user_pseudo_id' (or the Google Analytics client id) and 'event_name' (descriptions of actions in your digital property).

[sourcecode lang="SQL"] SELECT user_pseudo_id, event_name, count(*) events FROM `projectId.analytics_123456789.events_20210512` GROUP BY 1, 2 [/sourcecode] Yes, GA4 and BigQuery allow the use of emojis.

A very common overview of users and their events would have rows split by users and column metrics for each event. This is where the PIVOT operator comes in.

To use the PIVOT operator we will start with a table expression (the query above), followed by the PIVOT keyword, and then provide the:

Aggregate function (such as summing the events)
Input column (in our case the event_name, this column will be removed from the new table and split across multiple columns by its value)
Pivot column (these are the values from the input column that we would like to see as new independent columns)

[sourcecode lang="SQL"] FROM from_item[, ...] pivot_operator pivot_operator: PIVOT( aggregate_function_call [as_alias][, ...] FOR input_column IN ( pivot_column [as_alias][, ...] ) ) [AS alias] as_alias: [AS] alias [/sourcecode]

Above syntax and additional documentation on the PIVOT operator is available in the BigQuery documentation.

[sourcecode lang="SQL" highlight="6-8"] WITH events as (SELECT user_pseudo_id, event_name, count(*) events FROM `projectId.analytics_123456789.events_20210512` GROUP BY 1, 2) SELECT * FROM (SELECT * FROM events) PIVOT (SUM(events) for event_name IN("session_start", "page_view", "blog_view", "scroll", "first_visit")) [/sourcecode]

We could also completely avoid aggregating in the first step (and at the same time avoid null values) by simply using count(*) on the unaggregated table.

[sourcecode lang="SQL" highlight="8"] WITH events as (SELECT user_pseudo_id, event_name FROM `projectId.analytics_123456789.events_20210512` ) SELECT * FROM (SELECT * FROM events) PIVOT ( count(*) for event_name IN("session_start", "page_view", "blog_view", "scroll", "first_visit")) [/sourcecode]

Newly structured data now provides us with a perfect schema for analysis or even machine learning. With BQML, the output from the query above can be plugged into k-means clustering in a matter of minutes.

QUALIFY - Make Those Analytic Function Queries Pretty

Typically a query using analytic functions would require us to use subqueries to filter by its outputs. In the example below, we look at the last event of each Google Analytics session without using the QUALIFY function.

The query outputs events associated to users and their sessions along with a "lastToFirst" attribute that ranks the events within session from last to first.

[sourcecode lang="SQL"] WITH sessionEvents AS ( SELECT user_pseudo_id, (SELECT value.int_value FROM unnest(event_params) WHERE key="ga_session_number") as sessionNumber, event_name, event_timestamp FROM `analyticspros.com:spotted-cinnamon-834.analytics_209734898.events_intraday_20210513`) SELECT *, RANK() OVER (PARTITION BY user_pseudo_id, sessionNumber ORDER BY event_timestamp DESC) as lastToFirst FROM sessionEvents [/sourcecode]

Up until now, filtering to only the last event of the session (lastToFirst = 1), would require us to use the above as a subquery input of a filtering "wrapper".

[sourcecode lang="SQL"] SELECT * FROM( SELECT *, RANK() OVER (PARTITION BY user_pseudo_id, sessionNumber ORDER BY event_timestamp DESC) as lastToFirst FROM sessionEvents ) WHERE lastToFirst = 1 [/sourcecode]

By using the QUALIFY clause, we can take care of filtering on the same hierarchical level of queries, get rid of a few rows of SQL and make our queries more readable.

[sourcecode lang="SQL"] SELECT *, RANK() OVER (PARTITION BY user_pseudo_id, sessionNumber ORDER BY event_timestamp DESC) as lastToFirst FROM sessionEvents WHERE True QUALIFY lastToFirst = 1 [/sourcecode]

*Per the documentation, Qualify can only be used with WHERE, GROUP BY or HAVING clause. That's why I'm "forcing" the WHERE 1=1 before the QUALIFY.

Resources

Please, feel free to reach out with any requests, questions or feedback to @lukaslo and @adswerve on Twitter.

View full post