As mentioned in our previous post, “Google Analytics BigQuery Export Part One: Why Export Google Analytics Data?"
Google Analytics BigQuery Export Part One: Why Export Google Analytics Data?”,
beginning to work on GA data can be difficult as there are nuances to the way it’s stored. We’ve put together a list of recipes you can hopefully use to jumpstart even greater analysis from your GA BQ export.
All the examples below require modifying the from clause as follows to your specific project and date range.
- <DataSetId> with the desired dataset id
- <ProjectId> with the project id the dataset belongs to in #1 (above)
- YYYYMMDD with desired year (YYYY), Month (MM) and Date (DD)
The ga_sessions_YYYYMMDD table has schema documented
here
where each and every column (field) is explained in detail.
Recipe: GA Basics
GA BQ export is not just for complex analysis. It’s also capable of reproducing any metrics and reports for your custom dashboards. For example, the query below calculates metrics:
- Users
- Visits
- Pageviews
- Bounces
- Transactions
- Revenue
- Revenue per Visit
For the following dimensions:
- Channel (traffic medium)
- Device Category
- Date
During August 1 & 2, 2016 for the
sample Google Analytics dataset
provided by Google.
Go ahead and try it out using the SQL recipe below and compare to your current GA session reports. You may have to adjust the fixed date range from 1-aug-2016 to 2-aug-2018 to something else specific to your website or use the dynamic date range (commented out with ‘/ * …. */’. In case you find that the user metric totals are different, read this
post
.
#standardSQL
SELECT
channelGrouping AS traffic_type,
device.deviceCategory as device_type,
date AS Aug_2016_day,
_TABLE_SUFFIX AS Aug_2016_day2,
COUNT(DISTINCT fullVisitorId) AS Users,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.bounces) AS bounces,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue,
ROUND((SUM(totals.transactionRevenue)/1000000)/(SUM(totals.visits)),2) AS revenue_per_visit
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20160802'
/* another way to filter date range for a rolling date range, like past 30 days */
/*
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND
FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
*/
GROUP BY
channelGrouping, device_type, date, _TABLE_SUFFIX
ORDER BY
channelGrouping DESC, device_type desc, date ASC, _TABLE_SUFFIX ASC
The main benefits of exporting GA data to BQ is for more details and exhaustive queries to produce information not available in GA or through the many data connectors (APIs, Super Metrics, Tableau data connectors, etc.) that only retrieve aggregated data and not hit-level records (page views, events, social, products, etc.).
Every row in either the ga_sessions_yyyymmdd or the ga_sessions_intraday_yyyymmdd tables represents a session. A session can often contain multiple pageviews and events or other interactions. These interactions are stored as hits within the session. Hits contain a lot of information like page path, title, custom dimensions, etc. and come in the form of a record.
Since multiple hits are contained within a session, the hit record is defined as repeated. So in order to see every hit as a single row in your query results, that record must be unnested using the UNNEST function. Take note of both the hits and customDimensions record in screenshot of the schema below. They both have the word REPEATED. Repeated fields must be unnested to have them appear in each and every row and not contained (nested) within a row.
Recipe: Promotion Performance
The following query unnests the hit repeated record and then promotion repeated record within each hit. The result calculates and reports promotion metrics for each promotion and the position it’s in. The time range is August 2016.
SELECT
promo.promoName AS Promotion_Name,
promo.promoCreative AS Promotion_Type,
promo.promoPosition as Promotion_Position,
COUNT(hit.promotionActionInfo.promoIsView) AS Promotion_Views,
COUNT(hit.promotionActionInfo.promoIsClick) AS Promotion_Clicks,
ROUND((COUNT(hit.promotionActionInfo.promoIsClick)
/COUNT(hit.promotionActionInfo.promoIsView))*100,2) AS Promotion_CTR
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS hit,
UNNEST(hit.promotion) AS promo
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20160831' -- date range
GROUP BY
Promotion_Name, Promotion_Type, Promotion_Position
HAVING
Promotion_Views > Promotion_Clicks -- longtail cleanup
AND Promotion_Views > 100 --threshold
ORDER BY
Promotion_CTR DESC
Recipe: Hit Scoped Custom Dimensions
The following query will retrieve all hit-scoped custom dimensions for each page path in a session in long table orientation for the past 30 days.
#standardSQL
SELECT
CONCAT(fullVisitorId, ".", CAST(visitId AS STRING)) AS Session_ID,
hit.page.pagePath AS Page,
cd.index AS cd_index,
cd.value AS cd_value
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*` session,
UNNEST(session.hits) AS hit,
UNNEST(hit.customDimensions) AS cd
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
Recipe: Reproducing goals
If there is any downside to GA data exported to BQ it is that goals are not exported. That means you need to replicate the logic used to configure goals in GA within your BQ query.
The query below looks for specific events to count the number of leads a website receives. You’ll have to change the REGEXP_CONTAINS parameter values as per your event hierarchy to extract the goals you’re interested in.
SELECT
CASE
WHEN REGEXP_CONTAINS(hits.eventInfo.eventCategory, r'lead') AND REGEXP_CONTAINS(hits.eventInfo.eventAction, r'trial_signup|demo_request|contact_us') THEN "Web Leads"
ELSE "null"
END AS goal2,
date,
count (DISTINCT CONCAT(fullVisitorId, ".", CAST(visitId AS STRING))) AS Unique_sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) AS hits
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY
goal2, date
ORDER BY
goal2
Recipe: Dates and Times
There are many date and times available in the GA BQ dataset. These can be used for lookback or window based analytical function reporting, similar to segments in GA but far more scalable and open. Below is a query showcasing the different date/times and time-zones available in GA exports to BQ. The query has a hard-coded time range of April 1-4, 2017.
#standardSQL
SELECT
_TABLE_SUFFIX AS table_suffix_date,
visitStartTime AS Session_Start_Time_in_epoch_UTC, --number of seconds since Jan 1/1970
TIMESTAMP_SECONDS(visitStartTime) AS Session_Start_Time_in_UTC,
TIMESTAMP_SECONDS(visitStartTime + CAST(hit.time/1000 AS int64)) AS Hit_Time_in_UTC,
DATETIME(TIMESTAMP_SECONDS(visitStartTime + CAST(hit.time/1000 AS int64)), "PST8PDT" ) AS Hit_Time_in_GAViewTimezone, --also use MST7MDT, CST6CDT, EST5EDT as desired
hit.time AS hit_seconds_since_first_hit,
hit.hour AS hour_hit_occured_in_GAView_TimeZone,
hit.minute AS minute_hit_occured,
hit.hitNumber AS hit_number_of_session,
hit.type AS hit_type
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`, /* change this to be client specifc <project>.<dataset_id>.<table_name> */
UNNEST(hits) AS hit
WHERE
_TABLE_SUFFIX BETWEEN '20170401' AND '20170404'
ORDER BY
table_suffix_date ASC,
visitStartTime ASC,
hit_number_of_session ASC
What's Next?
Exporting session and hit data from a GA 360 account to BQ allows for limitless possibilities. The raw data logs is what you now have stored in a data warehouse and using SQL-like syntax to query it. When you export data to BQ, you own that data and can do with it what you please, including integrating with other data sets.
Hopefully, this gives you a springboard to begin more advanced analysis on your hit-based data.
We guarantee the data is there to produce the desired information, but
please post in the comments any ways that you are using these queries or trouble you are having. And please feel free to
contact us
with any questions.