Historically, the only way for Google Analytics users to access and export raw data from GA was through the enterprise version, GA360. However, with the introduction of Google Analytics 4 (GA4), this helpful feature is now available to everyone using GA4 at no additional cost. With the update, we're also getting a redesigned schema and a new approach to querying the data. Let's take a closer look at this exciting update and see how it compares to the existing export.
The BigQuery export can be turned on from within the Google Analytics UI. Go to the admin section and select "BigQuery Linking".
Once selected, the existing connection (if any) will be displayed. Set up a new export by clicking on the blue "Link" button.
On the linking screen, select a Google Cloud Project that you manage. If your organization does not have a GCP project yet, you may have to create a new one.
When configuring a link, you will be able to select the Google Analytics data streams that you would like to see flow into BigQuery, as well as one or both of two frequency options. The daily option provides a full export of the previous day's data once a day. In our project, we usually saw the table created between 5 and 6 AM. The streaming option allows you to query Google Analytics hits from your website in BigQuery within seconds. Streaming in GA4 is a big step up from the previous version, not just in terms of speed but also in terms of data structure: hits are not duplicated, so an additional deduplication view is not required.
Once a link is successfully established, you will see a green "LINK CREATED" badge on the BigQuery linking page.
Only one link per Google Analytics property is allowed.
*Streaming will incur an additional cost of $0.05 per GB of data streamed. The average row (hit) size will depend on the number and size of the attributes passed. To get a sense of scale, however, 1 kB per hit should be a good estimate to start with.
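To turn those two numbers into a rough budget, here is a minimal Python sketch. It only uses the figures quoted above ($0.05 per GB and roughly 1 kB per hit); the function name and the choice of 1 GB = 1024^3 bytes are assumptions made for illustration, so treat the result as a ballpark rather than a bill.

```python
# Rough streaming cost estimate based on the figures quoted in this article.
PRICE_PER_GB_USD = 0.05    # streaming price quoted above
BYTES_PER_GB = 1024 ** 3   # assumption: binary gigabytes


def estimate_streaming_cost(hits: int, bytes_per_hit: int = 1024) -> float:
    """Return the estimated streaming cost in USD for a number of hits.

    bytes_per_hit defaults to 1 kB, the starting estimate suggested above;
    replace it with your own average row size once you have measured it.
    """
    gigabytes = hits * bytes_per_hit / BYTES_PER_GB
    return gigabytes * PRICE_PER_GB_USD


# Example: 10 million hits in a month at ~1 kB each.
print(f"${estimate_streaming_cost(10_000_000):.2f}")  # prints $0.48
```

At this scale the streaming cost is well under a dollar per month, so for most properties the row size matters more for query costs than for streaming itself.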
Data will be exported to the project that was set at link creation in Google Analytics. The main change pertaining to the data location is the dataset name. Instead of the numeric (view ID) dataset, it is formatted as analytics_PROPERTYID, where PROPERTYID is the ID of the GA4 property. The daily table is named events_YYYYMMDD and the current day's table events_intraday_YYYYMMDD.
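If you generate queries from scripts, the naming convention above is easy to wrap in a small helper. The Python sketch below is illustrative only: the function name is made up, and the project and property IDs in the example are placeholders.

```python
from datetime import date


def export_table(project: str, property_id: str, day: date,
                 intraday: bool = False) -> str:
    """Build the fully qualified name of a GA4 export table.

    Follows the convention described above:
    analytics_<property id>.events_YYYYMMDD for the daily table and
    analytics_<property id>.events_intraday_YYYYMMDD for the current day.
    """
    prefix = "events_intraday_" if intraday else "events_"
    return f"{project}.analytics_{property_id}.{prefix}{day:%Y%m%d}"


# Placeholder project and property IDs, for illustration only.
print(export_table("my-project", "123456789", date(2021, 3, 5)))
# prints my-project.analytics_123456789.events_20210305
```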
Based on my sample of about 100 daily tables, the daily export becomes available around 6 AM (timezone of the property).
When comparing the GA4 export schema to the existing export, you will notice a pretty significant change. Individual rows are now hits, or events (in Universal Analytics they were visits, or sessions), and some metrics and dimensions that were previously provided as part of the schema now have to be extracted from other event attributes.
2. Number of New Users
Since "visit number" and "new user" are no longer readily available attributes, a query looking for the number of new users on a given day now requires unnesting of event values. This will be a frequent practice in the GA4 export, so make sure to bookmark or remember the WHERE clause of this kind of query.
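As a sketch of what such a query can look like, the Python helper below builds a BigQuery SQL string. It rests on one assumption that you should verify against your own data: that a user's first session carries the event parameter ga_session_number with an integer value of 1. The function name and the table name in the example are placeholders.

```python
def new_users_query(table: str) -> str:
    """Return a BigQuery SQL string counting new users in one daily table.

    Assumption: events from a user's first session have the event
    parameter 'ga_session_number' set to 1. The table name is inserted
    as-is, so only pass values you control.
    """
    return f"""
SELECT
  COUNT(DISTINCT user_pseudo_id) AS new_users
FROM `{table}`
WHERE (
  -- Unnest the repeated event_params field and pick out one key.
  SELECT value.int_value
  FROM UNNEST(event_params)
  WHERE key = 'ga_session_number'
) = 1
""".strip()


# Placeholder table name, for illustration only.
print(new_users_query("my-project.analytics_123456789.events_20210305"))
```

The subquery in the WHERE clause is the part worth remembering: the same pattern, with a different key and value type, extracts any other event parameter.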
3. Most Viewed Pages by Title
With every hit being an event, we do not have a hit type parameter anymore. To limit our query to page views only, we have to filter it by event name ("page_view" by default).
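A minimal sketch of such a query, again built as a SQL string in Python, is shown here. It assumes the page title is stored in the event parameter page_title as a string value; the function name, the default limit, and the table name in the example are placeholders.

```python
def top_pages_query(table: str, limit: int = 10) -> str:
    """Return a BigQuery SQL string listing the most viewed page titles.

    Assumption: page views use the default event name 'page_view' and
    carry the title in the 'page_title' event parameter. The table name
    is inserted as-is, so only pass values you control.
    """
    return f"""
SELECT
  (SELECT value.string_value
   FROM UNNEST(event_params)
   WHERE key = 'page_title') AS page_title,
  COUNT(*) AS page_views
FROM `{table}`
WHERE event_name = 'page_view'
GROUP BY page_title
ORDER BY page_views DESC
LIMIT {int(limit)}
""".strip()


# Placeholder table name, for illustration only.
print(top_pages_query("my-project.analytics_123456789.events_20210305", limit=5))
```

If your implementation renames the page view event, change the event_name filter accordingly.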