There are multiple factors that contribute to BigQuery cost. Generally, storage and processing (querying) tend to be the largest items on your bill at the end of the month. In this article, I will equip you with an easy way to analyze jobs (queries) within your organization.
Historically, analyzing processes in your BigQuery required several steps. This had to be done either via billing export, logging or the BigQuery API. With the introduction of jobs information schema, this has become a far easier task.
Depending on your permission levels, you will have access to different information schema scopes by:
Fields that I will use in my "quick-start" queries are:
The queries are written on a "project" scope. To access the information schema view you will have to reference it with the region (`region-us`) followed by the dataset "INFORMATION_SCHEMA", followed by the view, which depending on the scope can be USER, PROJECT, FOLDER or ORGANIZATION.
Query Cost by User Over the Last 30 Days
SELECT
user_email,
ROUND(SUM(total_bytes_billed)/1e12*5, 2) costInDollars
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
AND CURRENT_TIMESTAMP()
AND job_type = "QUERY"
GROUP BY
1
ORDER BY
2 DESC
Note that sometimes users authenticate access to BigQuery in tools such as Tableau, Data Studio, local scripts, etc. You should be able to identify those by sometimes very specific SQL language (different job id format), however, the user running those queries/reports may not be the same as the user who authenticated BigQuery as a data source.
Size and SQL Code of the Most Expensive Query in the Last 30 Days
SELECT
user_email,
query,
total_bytes_billed
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
AND CURRENT_TIMESTAMP()
AND job_type = "QUERY"
ORDER BY
total_bytes_billed DESC
LIMIT
1
Queries That ran the Longest
SELECT
query,
TIMESTAMP_DIFF(end_time, start_time, SECOND) AS run_time_in_seconds
FROM
`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
AND CURRENT_TIMESTAMP()
AND job_type = "QUERY"
ORDER BY
run_time DESC
LIMIT
5
Views that provide access to job history are not the only available information schema in BigQuery. Here is the full list:
Type | View Reference |
Datasets |
INFORMATION_SCHEMA.SCHEMATA INFORMATION_SCHEMA.SCHEMATA_OPTIONS |
Jobs |
INFORMATION_SCHEMA.JOBS_BY_USER INFORMATION_SCHEMA.JOBS_BY_PROJECT INFORMATION_SCHEMA.JOBS_BY_FOLDER INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION |
Jobs Timeline | INFORMATION_SCHEMA.JOBS_TIMELINE_BY_USER INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT INFORMATION_SCHEMA.JOBS_TIMELINE_BY_FOLDER INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION |
Reservations | INFORMATION_SCHEMA.RESERVATION_CHANGES_BY_PROJECT INFORMATION_SCHEMA.RESERVATIONS_BY_PROJECT INFORMATION_SCHEMA.CAPACITY_COMMITMENT_CHANGES_BY_PROJECT INFORMATION_SCHEMA.CAPACITY_COMMITMENTS_BY_PROJECT INFORMATION_SCHEMA.ASSIGNMENT_CHANGES_BY_PROJECT INFORMATION_SCHEMA.ASSIGNMENTS_BY_PROJECT |
Routines | INFORMATION_SCHEMA.ROUTINES |
Streaming | INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_FOLDER INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION |
Tables | INFORMATION_SCHEMA.TABLES |
Views | INFORMATION_SCHEMA.VIEWS |
For more information, review Jobs Information Schema Documentation or
Information Schema Intro. And remember you can contact us anytime.