Insights

The Google Analytics 4 BigQuery Export: A Quick Overview

Written by Luka Cempre | Feb 16, 2021 7:00:00 AM

Historically, the only way for Google Analytics users to access and export raw data from GA was through the enterprise version, GA360. However, with the introduction of Google Analytics 4 (GA4), this helpful feature is now available to everyone using GA4 at no additional cost. With the update, we're also getting a redesigned schema and a new approach to query the data. Let's take a deeper look at this exciting update and see how it compares to the existing export.

How to Turn On the GA4 BigQuery Export

The BigQuery export can be turned on from within the Google Analytics UI. Go to the admin section and select "BigQuery Linking".

Turning on BQ linking for a GA4 property

Once selected, the existing connection (if any) will be displayed. Set up a new export by clicking on the blue "Link" button.

On the linking screen, select a Google Cloud Project that you manage. If your organization does not have a GCP project yet, you may have to create a new one.

When configuring a link, you will be able to select the Google Analytics data streams that you would like to see flow into BigQuery as well as two options for frequency. The daily frequency option will provide you with a full daily export of data from the previous day. In our project, we usually saw table created between 5-6 AM. The streaming option will allow you to query Google Analytics hits from your website in BigQuery within seconds. Streaming in GA4 is a big step-up from the previous version, not just in terms of speed but also in terms of data structure, since hits are not duplicated and an additional deduplication view is not required.

Once a link is successfully established, you will see a green "LINK CREATED" badge on the BigQuery linking page.

Only one link per Google Analytics property is allowed.

*Streaming will inccur an additional cost of $0.05 per GB of data streamed. The average row (hit) size will depend on the number and size of the attributes passed. To get a sense of scale however 1kB per hit size should be a good estimate to start with.

Are There Changes to the Table Location?

Data will be exported to the project that was set at link creation in Google Analytics. The main change pertaining to the data location is the dataset name. Instead of the numeric (view id) dataset, it is formatted as analytics_PROPERTY ID. The daily table is formatted as events_YYYYMMDD and the current day's table as events_intraday_YYYYMMDD.

Based on my sample of about 100 daily tables, the daily export becomes available around 6 AM (timezone of the property).

How Do the Schema Compare?

When comparing the GA4 export schema to the existing export, you will notice a pretty significant change. Individual rows are now hits or events (visits or sessions in the universal analytics) and some metrics and dimensions that were previously provided as part of the schema, now have to be extracted from other event attributes.

First Queries

  1. Number of Users
    Client ID value can now be found in user_pseudo_id attribute. To calculate the number of unique users on a given day, we need to look for unique number of those ids.
[code language="SQL"]SELECT count(distinct user_pseudo_id) FROM `projectId.analytics_property.events_20210125`[/code]

2. Number of New Users
Since "visit number" and "new user" are not readily available attributes anymore. A query looking for a number of new users on a given day now requires unnesting of event values. This will be a frequent practice in the GA4 export. Make sure to bookmark or remember the where part of the query below

[code language="SQL"]SELECT count(distinct user_pseudo_id ) FROM `projectId.analytics_propertyId.events_20210126` WHERE (SELECT value.int_value FROM unnest(event_params) WHERE key = "ga_session_number") = 1[/code]

3. Most Viewed Pages by Title
With every hit being an event, we do not have a hit type parameter anymore. To limit our query to only page views we have to filter it by event name ("page_view" by default).

[code language="SQL"]SELECT value.string_value, count(*) FROM `projectId.analytics_propertyId.events_20210126`, unnest(event_params) as ep WHERE ep.key = "page_title" AND event_name = "page_view" GROUP BY 1 ORDER BY 2 DESC[/code]

Need More Information?

If you have any additional questions, we can help! Contact us to learn more.