Five Ads Data Hub Queries to Get You Started


July 16, 2020

In this post, I’ll walk you through a few Ads Data Hub (ADH) queries that will help you get started quickly. If you’re not familiar with ADH yet, here’s a quick intro to the tool.

Attribution Modelling (with Paths)

When it comes to attribution, ADH offers two approaches. You can either use “raw” impressions, clicks, and activities tables which will require a bit more complex query, including multiple joins on a user/device identifier – or you can utilize the pre-built paths* tables that include click, view and floodlight events for each user over a 30-day window.

Below is an example of a simple query that examines the most popular combination of two actions before a conversion (FLOODLIGHT event type).

SELECT event_type, event_type_back1, event_type_back2, count(*) as frequency FROM(
  SELECT 
      event_type, 
      lag(event_type) OVER (PARTITION BY user_id ORDER BY event_time) event_type_back1, 
      lag(event_type,2) OVER (PARTITION BY user_id ORDER BY event_time)  event_type_back2 
 FROM adh.cm_paths as paths, unnest(path.events) as events)
GROUP BY 1,2,3
ORDER BY 4 DESC

You may notice that I’m not filtering by event_type = “FLOODLIGHT”. As with other ADH queries, a good practice is to do additional filtering on the aggregated data in BigQuery (BQ) itself. In our case, filtering for event_type = “FLOODLIGHT” can be done in BQ. Having additional data also allows us to run analysis such as conversion rate per event_type combination.


SELECT
  event_type_back1,
  event_type_back2,
  sum(frequency) frequency,
  sum(IF(event_type = "FLOODLIGHT", frequency, 0)) as conversions,
  round(sum(IF(event_type = "FLOODLIGHT", frequency, 0))/sum(frequency)*100, 2) as convRate
FROM `ap-adh.full_circle_shared.pathsblog`
GROUP BY 1,2
ORDER BY 3 DESC

*Here’s a paths table schema

Campaign Overlap

Often our clients are interested in campaign synergy, like how frequently conversions are a result of users seeing an ad from campaigns A and B, and comparing that performance to the two campaigns individually.

As a part of data privacy measurements, specific logical expressions in countif, such as campaignA.user_id is not null, will result in an error. That’s why we resorted to some basic Venn diagram calculations (A∩B = A + B – AUC).

To make things even easier, I added a flag to each of the three joining tables which ultimately allowed me to utilize countif.

SELECT *,
  round(conversionsUsersWhoSawA/usersSawA*100, 2) conversionRateA,
  round(conversionsUsersWhoSawB/usersSawB*100, 2) conversionRateB,
  round(conversionsUsersWhoSawAandB/usersSawAandB*100, 2) conversionRateAandB

  FROM(
    SELECT 
      count(campaignA.user_id) usersSawA, 
      count(campaignB.user_id) as usersSawB, 
      count(campaignA.user_id )+count(campaignB.user_id)-count(*) usersSawAandB,
      countif(campaignAFlag=1 and conversionFlag=1) as conversionsUsersWhoSawA,
      countif(campaignBFlag=1 and conversionFlag=1) as conversionsUsersWhoSawB,
      countif(campaignAFlag=1 and campaignBFlag=1 and conversionFlag=1) as conversionsUsersWhoSawAandB,
    FROM(
      SELECT DISTINCT
        user_id,
        1 campaignAFlag
      FROM adh.cm_dt_impressions
      WHERE event.campaign_id = 37
    ) as campaignA
    FULL OUTER JOIN
    (SELECT DISTINCT
      user_id,
      1 campaignBFlag
    FROM adh.cm_dt_impressions
    WHERE event.campaign_id = 2) as campaignB
    using(user_id)
    LEFT OUTER JOIN
    (SELECT DISTINCT  
      user_id,      
      1 conversionFlag FROM adh.cm_dt_activities) as conversion
    using(user_id)
)

In the case of sandbox data, the conversion rate by users who saw ads from campaign A and campaign B outperforms all users who saw A or B by about 15 percent. This is a pretty common observation. An analysis like this one can give you a better perspective on the synergy of campaigns.

Calculating the number of rows dropped

Due to privacy restrictions, more granular rows with less than 50 users (for impressions) are “bucketed” into a filtered row summary. This query will help you identify the number of rows that did not meet the privacy requirements.

We start by assigning each row a constant 1.


SELECT
event.state as state,
count(distinct user_id) as users,
count(*) as impressions, 1 as numRows
FROM adh.cm_dt_impressions
GROUP BY 1

Along with the query make sure to always set up the “Filtered row summary” settings. In this case, we’re summing the numRows attribute.

As you can see below, the results of our query showed that 27 states did not meet the minimum privacy requirements and were bucketed together accounting for 4,778 impressions.

Here’s a quick overview of the process that happens in the background. ADH results in red are grouped together into a “filtered” row that’s then exported to BQ.

*Note that “rows” is a reserved keyword. To avoid any issues, we’re using numRows as the attribute

Adding/Joining First-Party User Data with u-variables

If your website uses a first-party user identifier, it is a good practice to send that value in a floodlight tag as a custom variable. In the query below, we’re joining our conversions table with a custom variable 10. Once our first-party data is connected with Campaign Manager conversions, user_id can be used to further connect our users with clicks and impressions.


WITH conversions AS (
  SELECT user_id, revenue
  FROM `project_id.dataset_id.table_id`
)

SELECT
  imp.account_id,
  imp.placement_id,
  conversions.revenue
FROM (
SELECT
  events.placement_id,
  (REGEXP_EXTRACT(event.other_data, 'u10=([^;]+)') AS user_id,
FROM
  adh.cm_dt_activities
) as imp
INNER JOIN conversions
ON conversions.user_id = imp.account_id

*Cookie matching is a more native way to tackle the problem of connecting first-party user information with data available in ADH. It requires sending a cookie value via a matching pixel. Match tables in ADH can then be accessed with a suffix “_match”.  Match tables include an additional field (external_cookie).

Adding/Joining First-Party Device Id Data

Just like cookie values, resettable device ids can also be utilized to match ADH data with first-party collected values. Since device ids in ADH are hashed using MD5, internally collected RDIDs must first be hashed as well (see row 3). Same as with cookie matching, accessing device_id cannot be done through default tables and require adding a “_rdid” suffix. (Shout out to Adswerve Sr. Data Scientist Anže Kravanja and Data Engineer Pawel Kapuscinski for suggesting and contributing “join” queries!)


WITH conversions AS (
  SELECT 
    UPPER(TO_HEX(MD5(UPPER(device_id)))) as device_id_md5
  FROM `project_id.dataset_id.table_id`
)

SELECT
  smm.segment,
  COUNT(*) AS num_impressions,
  COUNT(DISTINCT imp.device_id_md5) AS unique_matched_maids
FROM
  adh.cm_dt_impressions_rdid as imp
INNER JOIN conversions
ON conversions.device_id_md5 = imp.device_id_md5

And again, due to privacy restrictions, the results of the queries can only be seen in aggregate, when at least 50 unique users are present in a result row.

If you have any questions about using ADH, we can help! Contact us to learn more.