There are multiple factors that contribute to BigQuery cost. Generally, storage and processing (querying) tend to be the largest items on your bill at the end of the month. In this article, I will equip you with an easy way to analyze jobs (queries) within your organization.
Historically, analyzing processes in your BigQuery required several steps. This had to be done either via billing export, logging or the BigQuery API. With the introduction of jobs information schema, this has become a far easier task.
Depending on your permission levels, you will have access to different information schema scopes by:
Fields that I will use in my "quick-start" queries are:
The queries are written on a "project" scope. To access the information schema view you will have to reference it with the region (`region-us`) followed by the dataset "INFORMATION_SCHEMA", followed by the view, which depending on the scope can be USER, PROJECT, FOLDER or ORGANIZATION.
Query Cost by User Over the Last 30 Days
[sourcecode lang="sql"] SELECT user_email, ROUND(SUM(total_bytes_billed)/1e12*5, 2) costInDollars FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP() AND job_type = "QUERY" GROUP BY 1 ORDER BY 2 DESC[/sourcecode]
Note that sometimes users authenticate access to BigQuery in tools such as Tableau, Data Studio, local scripts, etc. You should be able to identify those by sometimes very specific SQL language (different job id format), however, the user running those queries/reports may not be the same as the user who authenticated BigQuery as a data source.
Size and SQL Code of the Most Expensive Query in the Last 30 Days
[sourcecode lang="sql"] SELECT user_email, query, total_bytes_billed FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP() AND job_type = "QUERY" ORDER BY total_bytes_billed DESC LIMIT 1 [/sourcecode]
Queries That ran the Longest
[sourcecode lang="sql"] SELECT query, TIMESTAMP_DIFF(end_time, start_time, SECOND) AS run_time_in_seconds FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY) AND CURRENT_TIMESTAMP() AND job_type = "QUERY" ORDER BY run_time DESC LIMIT 5 [/sourcecode]
Views that provide access to job history are not the only available information schema in BigQuery. Here is the full list:
Type | View Reference |
Datasets |
INFORMATION_SCHEMA.SCHEMATA INFORMATION_SCHEMA.SCHEMATA_OPTIONS |
Jobs |
INFORMATION_SCHEMA.JOBS_BY_USER INFORMATION_SCHEMA.JOBS_BY_PROJECT INFORMATION_SCHEMA.JOBS_BY_FOLDER INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION |
Jobs Timeline | INFORMATION_SCHEMA.JOBS_TIMELINE_BY_USER INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT INFORMATION_SCHEMA.JOBS_TIMELINE_BY_FOLDER INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION |
Reservations | INFORMATION_SCHEMA.RESERVATION_CHANGES_BY_PROJECT INFORMATION_SCHEMA.RESERVATIONS_BY_PROJECT INFORMATION_SCHEMA.CAPACITY_COMMITMENT_CHANGES_BY_PROJECT INFORMATION_SCHEMA.CAPACITY_COMMITMENTS_BY_PROJECT INFORMATION_SCHEMA.ASSIGNMENT_CHANGES_BY_PROJECT INFORMATION_SCHEMA.ASSIGNMENTS_BY_PROJECT |
Routines | INFORMATION_SCHEMA.ROUTINES |
Streaming | INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_FOLDER INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION |
Tables | INFORMATION_SCHEMA.TABLES |
Views | INFORMATION_SCHEMA.VIEWS |
For more information, review Jobs Information Schema Documentation or
Information Schema Intro. And remember you can contact us anytime.