In this post, I’ll walk you through a few Ads Data Hub (ADH) queries that will help you get started quickly. If you’re not familiar with ADH yet, here’s a quick intro to the tool.
Attribution Modelling (with Paths)
When it comes to attribution, ADH offers two approaches. You can either use “raw” impressions, clicks, and activities tables which will require a bit more complex query, including multiple joins on a user/device identifier – or you can utilize the pre-built paths* tables that include click, view and floodlight events for each user over a 30-day window.
Below is an example of a simple query that examines the most popular combination of two actions before a conversion (FLOODLIGHT event type).
SELECT event_type, event_type_back1, event_type_back2, count(*) as frequency FROM( SELECT event_type, lag(event_type) OVER (PARTITION BY user_id ORDER BY event_time) event_type_back1, lag(event_type,2) OVER (PARTITION BY user_id ORDER BY event_time) event_type_back2 FROM adh.cm_paths as paths, unnest(path.events) as events) GROUP BY 1,2,3 ORDER BY 4 DESC
You may notice that I’m not filtering by event_type = “FLOODLIGHT”. As with other ADH queries, a good practice is to do additional filtering on the aggregated data in BigQuery (BQ) itself. In our case, filtering for event_type = “FLOODLIGHT” can be done in BQ. Having additional data also allows us to run analysis such as conversion rate per event_type combination.
SELECT event_type_back1, event_type_back2, sum(frequency) frequency, sum(IF(event_type = "FLOODLIGHT", frequency, 0)) as conversions, round(sum(IF(event_type = "FLOODLIGHT", frequency, 0))/sum(frequency)*100, 2) as convRate FROM `ap-adh.full_circle_shared.pathsblog` GROUP BY 1,2 ORDER BY 3 DESC
*Here’s a paths table schema
Often our clients are interested in campaign synergy, like how frequently conversions are a result of users seeing an ad from campaigns A and B, and comparing that performance to the two campaigns individually.
As a part of data privacy measurements, specific logical expressions in countif, such as campaignA.user_id is not null, will result in an error. That’s why we resorted to some basic Venn diagram calculations (A∩B = A + B – AUC).
To make things even easier, I added a flag to each of the three joining tables which ultimately allowed me to utilize countif.
SELECT *, round(conversionsUsersWhoSawA/usersSawA*100, 2) conversionRateA, round(conversionsUsersWhoSawB/usersSawB*100, 2) conversionRateB, round(conversionsUsersWhoSawAandB/usersSawAandB*100, 2) conversionRateAandB FROM( SELECT count(campaignA.user_id) usersSawA, count(campaignB.user_id) as usersSawB, count(campaignA.user_id )+count(campaignB.user_id)-count(*) usersSawAandB, countif(campaignAFlag=1 and conversionFlag=1) as conversionsUsersWhoSawA, countif(campaignBFlag=1 and conversionFlag=1) as conversionsUsersWhoSawB, countif(campaignAFlag=1 and campaignBFlag=1 and conversionFlag=1) as conversionsUsersWhoSawAandB, FROM( SELECT DISTINCT user_id, 1 campaignAFlag FROM adh.cm_dt_impressions WHERE event.campaign_id = 37 ) as campaignA FULL OUTER JOIN (SELECT DISTINCT user_id, 1 campaignBFlag FROM adh.cm_dt_impressions WHERE event.campaign_id = 2) as campaignB using(user_id) LEFT OUTER JOIN (SELECT DISTINCT user_id, 1 conversionFlag FROM adh.cm_dt_activities) as conversion using(user_id) )
In the case of sandbox data, the conversion rate by users who saw ads from campaign A and campaign B outperforms all users who saw A or B by about 15 percent. This is a pretty common observation. An analysis like this one can give you a better perspective on the synergy of campaigns.
Calculating the number of rows dropped
Due to privacy restrictions, more granular rows with less than 50 users (for impressions) are “bucketed” into a filtered row summary. This query will help you identify the number of rows that did not meet the privacy requirements.
We start by assigning each row a constant 1.
SELECT event.state as state, count(distinct user_id) as users, count(*) as impressions, 1 as numRows FROM adh.cm_dt_impressions GROUP BY 1
Along with the query make sure to always set up the “Filtered row summary” settings. In this case, we’re summing the numRows attribute.
As you can see below, the results of our query showed that 27 states did not meet the minimum privacy requirements and were bucketed together accounting for 4,778 impressions.
Here’s a quick overview of the process that happens in the background. ADH results in red are grouped together into a “filtered” row that’s then exported to BQ.
*Note that “rows” is a reserved keyword. To avoid any issues, we’re using numRows as the attribute
Adding/Joining First-Party User Data with u-variables
If your website uses a first-party user identifier, it is a good practice to send that value in a floodlight tag as a custom variable. In the query below, we’re joining our conversions table with a custom variable 10. Once our first-party data is connected with Campaign Manager conversions, user_id can be used to further connect our users with clicks and impressions.
WITH conversions AS ( SELECT user_id, revenue FROM `project_id.dataset_id.table_id` ) SELECT imp.account_id, imp.placement_id, conversions.revenue FROM ( SELECT events.placement_id, (REGEXP_EXTRACT(event.other_data, 'u10=([^;]+)') AS user_id, FROM adh.cm_dt_activities ) as imp INNER JOIN conversions ON conversions.user_id = imp.account_id
*Cookie matching is a more native way to tackle the problem of connecting first-party user information with data available in ADH. It requires sending a cookie value via a matching pixel. Match tables in ADH can then be accessed with a suffix “_match”. Match tables include an additional field (external_cookie).
Adding/Joining First-Party Device Id Data
Just like cookie values, resettable device ids can also be utilized to match ADH data with first-party collected values. Since device ids in ADH are hashed using MD5, internally collected RDIDs must first be hashed as well (see row 3). Same as with cookie matching, accessing device_id cannot be done through default tables and require adding a “_rdid” suffix. (Shout out to Adswerve Sr. Data Scientist Anže Kravanja and Data Engineer Pawel Kapuscinski for suggesting and contributing “join” queries!)
WITH conversions AS ( SELECT UPPER(TO_HEX(MD5(UPPER(device_id)))) as device_id_md5 FROM `project_id.dataset_id.table_id` ) SELECT smm.segment, COUNT(*) AS num_impressions, COUNT(DISTINCT imp.device_id_md5) AS unique_matched_maids FROM adh.cm_dt_impressions_rdid as imp INNER JOIN conversions ON conversions.device_id_md5 = imp.device_id_md5
And again, due to privacy restrictions, the results of the queries can only be seen in aggregate, when at least 50 unique users are present in a result row.
If you have any questions about using ADH, we can help! Contact us to learn more.