Ready to Flex Your BigQuery Muscles?
Welcome to Adswerve's 31 Days of BigQuery, where each day in May brings a new mini challenge to sharpen your skills, stretch your brain, and maybe even spark a little friendly competition. Whether you’re a BigQuery beginner or a seasoned SQL wizard, there’s something here for you.
These challenges were thoughtfully crafted (and solved!) by Adswerve’s Technical Services team—the behind-the-scenes problem solvers who build, architect and scale data solutions for our clients every day. Now, we're sharing that expertise with you.
Each day in May, we will post a new 10-minute BigQuery challenge for you to solve, with the solution provided the following day. The challenges will cover a wide variety of BigQuery features and will become more difficult as the month progresses. Feel free to reach out via our social channels with any questions.
As with many data problems, most of these challenges have more than one valid solution; we would be excited to hear if yours differs from ours.
Day 1
Challenge:
For this first challenge, set up your own Google Cloud project, enable BigQuery, and query a public dataset. BigQuery is available in a free "sandbox" environment, which limits you to 10 GB of storage and 1 TB of query processing per month.
When ready, use standard SQL in BigQuery Studio to retrieve the top 5 most popular names in the US between 2010 and 2019 using the publicly available table `bigquery-public-data.usa_names.usa_1910_current`.
Tip
BigQuery comes with a large amount of publicly available data. I encourage you to explore these datasets; many have come in handy in the past.
Resources
https://cloud.google.com/bigquery/public-data
https://cloud.google.com/bigquery/docs/sandbox
https://console.cloud.google.com/marketplace/product/social-security-administration/us-names
https://www.w3schools.com/sql/
Solution
The 5 most popular names in the US during the period of 2010 to 2019 are Emma, Olivia, Noah, Sophia and Liam.
A possible SQL approach:
#1 Challenge
SELECT name, SUM(number) AS number
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE year BETWEEN 2010 AND 2019
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
Day 2
On day two, we will explore an underutilized (personal opinion) feature hidden in the query output. While BigQuery allows you to open your query results in Looker Studio, Python notebooks, and data canvas, sometimes the quickest way to visualize them is only a couple of clicks away in the "Chart" tab.
Challenge
In this challenge, build a chart using the public dataset from the first challenge (`bigquery-public-data.usa_names.usa_1910_current`) that visualizes the popularity of your name and a few other names throughout the years. See the screenshot below for an example of the expected output.
Tip
You will notice that while the Chart feature is easy to use, it comes with limitations: it allows only a single dimension, though up to 5 measures.

Solution
Because the built-in charting solution only allows a single dimension and multiple measures, I added each of the observed names as its own SUM(IF(...)) measure. Make sure to order by year for the visualization to be rendered as expected.
#2 Challenge
SELECT
  year,
  SUM(IF(name = "Luka", number, 0)) AS Luka,
  SUM(IF(name = "Peter", number, 0)) AS Peter,
  SUM(IF(name = "Luca", number, 0)) AS Luca
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY ALL
ORDER BY year ASC
Day 3
Understanding query costs is crucial for managing your overall BigQuery expenses. While the primary BigQuery charges are for storage and analysis (querying), advanced features like streaming inserts, BI Engine, the Storage API, notebook runtimes, and BigQuery reservations can also contribute to costs as your usage evolves.
Challenge
Estimate the cost of the queries you wrote on days 1 and 2. In the on-demand pricing model (the most common), cost is determined by the amount of data scanned by a query. For valid queries written in the UI, the query validator below the input window shows an estimate of how much data will be scanned (see the screenshot).
Queries are charged at $6.25 per TiB scanned, or roughly $0.0061 per GiB.
Explore more
See how the price of queries changes with more or fewer columns selected, and see what happens when you use filters and limits (a comparison sketch follows below). Think about how BigQuery’s columnar storage impacts the price of these queries.
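For instance, you can paste these two variants into the editor and watch the validator's estimate change (a sketch using the Day 1 public table; any wide table works):

# Scans only the name column, so the estimate stays small.
SELECT name
FROM `bigquery-public-data.usa_names.usa_1910_current`
LIMIT 10;

# Selects every column with a LIMIT; compare this estimate to the one above.
SELECT *
FROM `bigquery-public-data.usa_names.usa_1910_current`
LIMIT 10;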
Resources:
https://cloud.google.com/bigquery/docs/storage_overview
https://cloud.google.com/bigquery/pricing

Solution
The queries used in the previous challenges both scan 143.71 MB of data, which (treating the estimate as decimal terabytes for a rough figure) translates to:
143.71 MB => 0.00014371 TB
0.00014371 TB x $6.25/TB = $0.0008981875
Adding more columns increases the amount of data a query scans, while limits and filters do not impact its cost, because BigQuery scans each referenced column in full.
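If you want to check what a query actually cost after it runs, you can also look at BigQuery's job metadata (a sketch, assuming your datasets live in the US multi-region and the $6.25/TiB on-demand rate; adjust the region qualifier and rate if yours differ):

# Recent query jobs in this project with their approximate on-demand cost.
SELECT
  job_id,
  total_bytes_billed,
  total_bytes_billed / POW(1024, 4) * 6.25 AS approx_cost_usd
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC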
Day 4
BigQuery supports table sharding as a way to divide your data, often seen with daily exports like Google Analytics 4 (events_YYYYMMDD). While you can query these daily tables individually, BigQuery lets you query them collectively using a wildcard (*) combined with the _TABLE_SUFFIX pseudo-column to filter specific date ranges.
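In its generic form, the pattern looks like this (a sketch over a hypothetical sharded export; deliberately not the Day 4 solution):

# Count rows across all December 2020 shards of a hypothetical daily export;
# the * matches each shard's date suffix, which _TABLE_SUFFIX exposes for filtering.
SELECT COUNT(*) AS row_count
FROM `my-project.my_dataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20201231'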
Challenge
Using the public Google Analytics 4 sample dataset available at `bigquery-public-data.ga4_obfuscated_sample_ecommerce`, run a query that provides the number of sessions (session_start events) between December 1 and December 31, 2020.
Explore more
Keep in mind that while sharding is common in some exports, partitioning is generally the recommended approach in BigQuery for benefits like performance and easier management. We'll explore partitioning soon!
Resources
https://lnkd.in/d6tc5p8G
https://lnkd.in/daftABea
Solution
Check back on May 5th!