Take a quick demo tour of Connect to see the added benefits you’ll have when you partner with Adswerve. Get access to hundreds of training videos, the latest industry news, your account and billing information and proprietary tools and applications to help you get more done—plus, so much more!
One of BigQuery’s key differentiators is its nested data. Instead of a relational table architecture, often BigQuery tables are denormalized and multiple entity types end up in the same table. In the GA export, you will usually first notice this with hits and sessions stored in the same table. Typically, sessions would be expected to have a one-to-many relation to hits.
There are up to three levels of nesting within the GA export pertaining to 11 attributes.
Sessions (on the root level each row is a session)
Custom Dimensions (these are user and session-scoped custom dimensions only)
Hits (every pageview, event or other hit type belonging to the "parent" row i.e. a session)
Product (each hit can contain multiple products, either as impressions, part of a transaction, etc.)
Custom Dimensions (a product can have multiple custom dimensions)
Custom Metrics (a product can have multiple custom metrics; this is the deepest level of nesting)
Promotion
Experiment
Custom Variables
Custom Dimensions (hit scoped)
Custom Metrics (hit scoped)
Publisher Infos
There is a common confusion between records and nesting. Records are similar attributes grouped together and just like other data types, they can be nested. For example, traffic source data is all under the same record (trafficSource). However, the record itself is not nested since it only appears once per session. On the other hand, custom dimensions are a nested (appears multiple times per session, hit or product) attribute of a type record (includes attributes index and value). Even though all repeated (or nested) fields in the GA export are records, repeated fields could be of any type.
How to Unnest or Access Nested Values
Flatten the Table
As we mentioned earlier, nesting is similar to a one-to-many connection to another table. For this example, we could almost think about hits and sessions as two separate tables that are joined together with the unnest command.
[sourcecode language="sql" title="Flattening a Table"]SELECT * FROM `project-id.dataset-id.ga_sessions_20200214`,
unnest(hits) as hits[/sourcecode]
The query above would transform the following simplified Google Analytics export table, which has one row (with hits nested within that row).
Client Id
Visit Number
Medium
Hits.Number
Hits.Page
1234.1234
3
Organic
1
/
2
/blog
3
/blog/Luka
4
/blog/Luka/bq-howto-1
5
/blog/Luka/bq-howto-2
Into
Client Id
Visit Number
Medium
Hits.Number
Hits.Page
1234.1234
3
Organic
1
/
1234.1234
3
Organic
2
/blog
1234.1234
3
Organic
3
/blog/Luka
1234.1234
3
Organic
4
/blog/Luka/bq-howto-1
1234.1234
3
Organic
5
/blog/Luka/bq-howto-2
The "COUNT(*)" function for table 1 would return 1 (treating each row as a session), while the same command when using "unnest(hits)" would return 5. Each row is now a hit, and all the parent (session) attributes (client id, visit number, and medium) were inherited to each hit.
Access Individual Rows with a Subquery
Individual nested rows can be accessed as one of the attributes. This is particularly useful when trying to access specific custom dimensions or certain hits like landing or exit pages. Unlike the previous method, this does not change the scope of the table (number of rows stays the same).
[sourcecode language="sql" title="Accessing Custom Dimension 5"]SELECT
(SELECT value FROM unnest(customDimensions) WHERE index = 5)
FROM `project-id.dataset-id.ga_sessions_*`[/sourcecode]
[sourcecode language="sql" title="Retrieveing Session's Landing Page"]SELECT
(SELECT page.pagePath FROM unnest(hits) WHERE isEntrance)
FROM `project-id.dataset-id.ga_sessions_*`[/sourcecode]
[sourcecode language="sql" title="Retrieveing Session's Exit Page"]SELECT
(SELECT page.pagePath FROM unnest(hits) WHERE isExit)
FROM `project-id.dataset-id.ga_sessions_*`[/sourcecode]
Make sure that your condition (index=5, isExit, isEntrance) returns a single value within a subquery. If results return multiple values, they need to be aggregated.
Take advantage of indexing
Rows in repeated (nested) fields are ordered. Custom dimensions are ordered by their index ascending, hits by hit number, products by position, etc. This gives us the ability to access any row within a nested field using either offset (0-based) and ordinal (1-based) indexing.
Note that hits[offset(0)] is not the same as isEntrance = True, the difference is that isEntrance is true on the first pageview or screenview of the session. The first hit could also be of a hit type different than pageview or screenview, like, an event.
[sourcecode language="sql" title="First hit of a Session"]SELECT
hits[offset(0)].page.pagePath as firstHitOfSession,
hits[ordinal(1)].page.pagePath as firstHitOfSessionOrdinal
FROM `project-id.dataset-id.ga_sessions_*`[/sourcecode]
You may encounter issues when trying to access the second or third hit of the session. All sessions have at least one hit, but some may only have a single hit. So accessing a second hit of the session will generate an issue.
[sourcecode language="sql" title="Third hit of a Session"]SELECT
hits[offset(2)].page.pagePath as thirdHitOfSession,
hits[ordinal(3)].page.pagePath as thirdHitOfSessionOrdinal
FROM `project-id.dataset-id.ga_sessions_*`[/sourcecode]
To avoid this, use a prefix safe_ which will use a null value in place of the out of bounds results.
[sourcecode language="sql" title="Third hit of a Session (safe)"]SELECT
hits[safe_offset(2)].page.pagePath as thirdHitOfSession,
hits[safe_ordinal(3)].page.pagePath as thirdHitOfSessionOrdinal
FROM `project-id.dataset-id.ga_sessions_*`[/sourcecode]