The Google Analytics 4 BigQuery export is a powerhouse for unsampled, event-level data, making it a favorite tool for advanced analytics and data science. However, its power comes with complexity—especially when dealing with one of its most critical dimensions: traffic source data. For many users, understanding how this data is structured, where it resides and how to extract meaningful insights can feel like navigating a maze.
In this blog, we’ll break down the four key ways traffic source data appears in GA4 BigQuery exports. By the end, you’ll have a clearer understanding of how to leverage this information to uncover user behavior, campaign performance and more.
The Four Flavors of Traffic Source Data
The Google Analytics traffic source data in the BigQuery export of events shows up in four different locations:
- As a traffic_source record with name, medium and source fields
- As a session_traffic_source_last_click record with manual_campaign, google_ads_campaign, cross_channel_campaign, sa360_campaign, cm360_campaign, dv360_campaign records all of which further include basic dimensions (such as source, medium, campaign, etc.)
- As a collected_traffic_source record with campaign id, campaign name, source, medium, term, content, source_platform, creative_format, marketing_tactic, gclid, dclid and srsltid fields
- As string event parameter (event_params) values for the following keys: source, medium, campaign, term, gclid
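To see the four locations side by side, a minimal sketch along these lines may help. It reuses the placeholder table name from the examples in this post, so swap in your own dataset and date. The column aliases are illustrative only, and medium is used as the representative dimension.

```sql
-- Sketch: one medium value from each of the four locations, per event.
-- The table name is a placeholder; the aliases are made up for readability.
SELECT
  event_name,
  -- 1. user-scoped record (first/acquisition visit)
  traffic_source.medium AS first_user_medium,
  -- 2. session-scoped record (last non-direct)
  session_traffic_source_last_click.manual_campaign.medium AS session_medium,
  -- 3. event-scoped raw record
  collected_traffic_source.manual_medium AS collected_medium,
  -- 4. event parameter, which requires unnesting event_params
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS event_param_medium
FROM
  `analytics_123456789.events_20241205`
LIMIT 100
```

On most rows the event-scoped columns will likely be NULL, since only some events carry collected traffic source values.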
Logical Differences
Now that we know where to access all four traffic source attributes, let's look at how each of these offers a distinct look into dealing with traffic source dimensions.
1. traffic_source
We start with the traffic_source record, which holds user-scoped traffic source information about the user's first (acquisition) visit. It looks at the user's first interaction with your property, attaches the source, medium and campaign name to the user, and persists for as long as the user identifier (most commonly the GA cookie, user_pseudo_id) stays alive.
Scope: user (only set once for each user_pseudo_id)
Available: Since the start of the GA4 export (2019)
Present: on every event
GA4 UI: First user [source, medium, campaign]
SQL Example:
SELECT
traffic_source.medium,
COUNT(DISTINCT user_pseudo_id) AS users
FROM
`analytics_123456789.events_20241205`
GROUP BY 1
ORDER BY users DESC
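Because traffic_source is described as being set once per user_pseudo_id, one way to sanity-check that in your own export is to look for users with more than one source/medium combination over a date range. This is only a sketch: the table name and date range are placeholders, and the '(null)' label is an arbitrary choice for this example.

```sql
-- Sketch: list users whose traffic_source source/medium varies across events.
-- If the record is truly user-scoped, this should return few or no rows.
SELECT
  user_pseudo_id,
  COUNT(DISTINCT CONCAT(
    IFNULL(traffic_source.source, '(null)'), ' / ',
    IFNULL(traffic_source.medium, '(null)'))) AS distinct_source_medium
FROM
  `analytics_123456789.events_*`
WHERE
  -- limit the wildcard scan to a few daily tables
  _TABLE_SUFFIX BETWEEN '20241201' AND '20241205'
GROUP BY user_pseudo_id
HAVING distinct_source_medium > 1
ORDER BY distinct_source_medium DESC
```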
2. session_traffic_source_last_click
The session_traffic_source_last_click record mimics the session traffic source data from the UI, with two important rules. First, it takes the first event-based traffic information from the session and applies it to the whole session. Second, it follows the last non-direct attribution model: a direct visit looks for the most recent non-direct traffic information and uses it if it exists.
Examples:
- A user who visits directly after a previous organic visit will have the session_traffic_source_last_click medium set to organic.
- A user who starts a session via paid search and, before the session times out, also arrives via an organic medium will have the session medium set to paid search.
Scope: session
Available: Since mid-July 2024 for manual_campaign and google_ads_campaign data, since mid-October 2024 for cross_channel_campaign (which includes the default and primary channel data), sa360_campaign, cm360_campaign and dv360_campaign data
Present: on every event
GA4 UI: Session [source, medium, campaign]
SQL Example: Confirming that each session only has one distinct value for session_traffic_source_last_click.
SELECT
user_pseudo_id,
(SELECT value.int_value FROM unnest(event_params) WHERE key = "ga_session_id") as ga_session_id,
count(distinct session_traffic_source_last_click.cross_channel_campaign.medium) as distinct_med
FROM `adswerve-data.analytics_423652181.events_20241204`
GROUP BY ALL
ORDER BY 3 DESC
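Once each session is confirmed to carry a single value, the record can be used for a session-level report similar to the Session source/medium view in the UI. The following is a rough sketch, not a reproduction of how the UI computes its numbers: the table name is a placeholder, and a session is approximated here as a distinct user_pseudo_id plus ga_session_id pair.

```sql
-- Sketch: sessions by session-scoped source and medium.
-- Events without a ga_session_id yield a NULL key and are not counted.
SELECT
  session_traffic_source_last_click.cross_channel_campaign.source AS session_source,
  session_traffic_source_last_click.cross_channel_campaign.medium AS session_medium,
  COUNT(DISTINCT CONCAT(
    user_pseudo_id, '-',
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM
  `analytics_123456789.events_20241205`
GROUP BY session_source, session_medium
ORDER BY sessions DESC
```

Keep in mind that cross_channel_campaign is only populated from mid-October 2024 onward, so earlier tables would need manual_campaign instead.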
3. collected_traffic_source
The collected_traffic_source record holds event-level traffic source information. The values for this record are collected with the first event of the page, but since there can be multiple page views in a single session, a session can contain multiple collected traffic source values. Each time a user arrives at your domain within a session from a new outside referral or with new UTM parameters, this value updates. In the screenshot below (a specifically picked outlier) you can see how a user interacted with our website from four different source/medium combinations during a single session. As mentioned, page_view events store the traffic source info, which, for ease of use, also gets passed to the session_start event.
The great thing about collected_traffic_source is that the values are raw. Session scoping has not been applied, and neither has last non-direct attribution. This offers a low-level view into the actual traffic source information parsed from the browser and allows you to uncover sessions with multiple traffic sources, separate true non-direct sessions from "traffic source inherited" direct sessions and more. By contrast, session_traffic_source_last_click (described above) is the standard approach to dealing with session-scoped traffic source data.
Scope: event
Available: Since May 2023; however, it only became available on the session_start event in November 2023 (be careful when querying earlier dates)
Present: on every event
GA4 UI: source, medium, campaign name
SQL Example: Code that generated the screenshot above
SELECT
TIMESTAMP_MICROS(event_timestamp) as event_timestamp,
event_name,
collected_traffic_source.manual_medium,
collected_traffic_source.manual_source,
collected_traffic_source.manual_campaign_name,
FROM `adswerve-data.analytics_123456.events_20241120`
WHERE
user_pseudo_id = "1710500093.1730318649"
AND (SELECT value.int_value FROM unnest(event_params) WHERE key = "ga_session_id") = 1732130821
GROUP BY ALL
ORDER BY 1 ASC
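To find sessions like this one without knowing the user and session ID in advance, a sketch along the following lines could list every session that collected more than one source/medium combination. The table name is a placeholder, the CTE name and aliases are made up for this example, and '(not set)' is just a label for missing values.

```sql
-- Sketch: sessions with more than one raw (collected) source/medium combination.
WITH collected AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    CONCAT(
      IFNULL(collected_traffic_source.manual_source, '(not set)'), ' / ',
      IFNULL(collected_traffic_source.manual_medium, '(not set)')) AS source_medium
  FROM
    `analytics_123456789.events_20241205`
  WHERE
    -- keep only events that actually carry collected traffic source values
    collected_traffic_source.manual_source IS NOT NULL
    OR collected_traffic_source.manual_medium IS NOT NULL
)
SELECT
  user_pseudo_id,
  ga_session_id,
  COUNT(DISTINCT source_medium) AS distinct_source_mediums,
  ARRAY_AGG(DISTINCT source_medium) AS source_mediums
FROM collected
GROUP BY user_pseudo_id, ga_session_id
HAVING COUNT(DISTINCT source_medium) > 1
ORDER BY distinct_source_mediums DESC
```

Because page_view and session_start events can carry the same values, the query counts distinct combinations rather than events.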
4. Traffic source-related event_params
The event parameters that carry traffic source information are identical in value to collected_traffic_source but are harder to access. If event_params has to be unnested for this reason alone, they will also cost more to query.
Scope: event
Available: Since the start of the GA4 export (2019)
Present: on every event
GA4 UI: source, medium, campaign name
SQL Example: Proving that medium from collected_traffic_source and medium from event_params are present on the same events.
SELECT
collected_traffic_source.manual_medium as cts_medium,
(SELECT value.string_value FROM unnest(event_params) WHERE key = "medium") as ep_medium,
count(*) as sessions
FROM `adswerve-data.analytics_423652181.events_20241101`
WHERE
event_name = "session_start"
GROUP BY ALL
ORDER BY 1 ASC
Please follow our technical insights blog content as we continue to dive into Google Analytics 4 BigQuery export and feel free to reach out on LinkedIn to engage in a discussion about this article.