Technical Insights

All About Jobs Information Schema and BiqQuery Processing Costs

Written by Luka Cempre | Mar 16, 2021 6:00:00 AM

There are multiple factors that contribute to BigQuery cost. Generally, storage and processing (querying) tend to be the largest items on your bill at the end of the month. In this article, I will equip you with an easy way to analyze jobs (queries) within your organization.

Jobs Information Schema

Historically, analyzing processes in your BigQuery required several steps. This had to be done either via billing export, logging or the BigQuery API. With the introduction of jobs information schema, this has become a far easier task.

Depending on your permission levels, you will have access to different information schema scopes by:

  • user (view your own jobs)
  • project (view all the jobs in a project)
  • folder (view all the jobs for the projects in the folder)
  • organization (view all the jobs in the organization)

    No matter the level of access, the schema between those views will stay the same and can be reviewed here. Note that some columns are either partitioned (creation_time) or clustered (project_id, user_email). Take advantage of those to make your queries more efficient.

Fields of Interest

Fields that I will use in my "quick-start" queries are:

  • creation_time (creation time of the job/query)
  • user_email (email of the user who ran the job/query)
  • job_type (a type of a job, to differentiate between queries, loads, copies, etc.)
  • start_time and end_time (start and end time of a job, to calculate the query run time)
  • query (input text of a query, to identify which query exactly ran the longest, cost the most, etc.)
  • total_bytes_billed (bytes that were billed to calculate the cost of a query*)

    * In BigQuery, the processing cost is calculated based on the amount of data scanned (unless slot reservation is used). After the initial free tier, the cost of querying is $5 per TB

BQ Processing Cost Queries

The queries are written on a "project" scope. To access the information schema view you will have to reference it with the region (`region-us`) followed by the dataset "INFORMATION_SCHEMA", followed by the view, which depending on the scope can be USER, PROJECT, FOLDER or ORGANIZATION.

Query Cost by User Over the Last 30 Days

SELECT
  user_email,
  ROUND(SUM(total_bytes_billed)/1e12*5, 2) costInDollars
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
GROUP BY
  1
ORDER BY
  2 DESC  

 

Note that sometimes users authenticate access to BigQuery in tools such as Tableau, Data Studio, local scripts, etc. You should be able to identify those by sometimes very specific SQL language (different job id format), however, the user running those queries/reports may not be the same as the user who authenticated BigQuery as a data source.

Size and SQL Code of the Most Expensive Query in the Last 30 Days

SELECT
  user_email,
  query,
  total_bytes_billed
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
ORDER BY
  total_bytes_billed DESC
LIMIT
  1

 

 

Queries That ran the Longest

SELECT
  query,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS run_time_in_seconds
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  AND CURRENT_TIMESTAMP()
  AND job_type = "QUERY"
ORDER BY
  run_time DESC
LIMIT
  5

 

Other Information Schema Views

Views that provide access to job history are not the only available information schema in BigQuery. Here is the full list:

Type View Reference
Datasets
INFORMATION_SCHEMA.SCHEMATA
INFORMATION_SCHEMA.SCHEMATA_OPTIONS
Jobs
INFORMATION_SCHEMA.JOBS_BY_USER
INFORMATION_SCHEMA.JOBS_BY_PROJECT
INFORMATION_SCHEMA.JOBS_BY_FOLDER
INFORMATION_SCHEMA.JOBS_BY_ORGANIZATION
Jobs Timeline INFORMATION_SCHEMA.JOBS_TIMELINE_BY_USER
INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT
INFORMATION_SCHEMA.JOBS_TIMELINE_BY_FOLDER
INFORMATION_SCHEMA.JOBS_TIMELINE_BY_ORGANIZATION
Reservations INFORMATION_SCHEMA.RESERVATION_CHANGES_BY_PROJECT
INFORMATION_SCHEMA.RESERVATIONS_BY_PROJECT
INFORMATION_SCHEMA.CAPACITY_COMMITMENT_CHANGES_BY_PROJECT
INFORMATION_SCHEMA.CAPACITY_COMMITMENTS_BY_PROJECT
INFORMATION_SCHEMA.ASSIGNMENT_CHANGES_BY_PROJECT
INFORMATION_SCHEMA.ASSIGNMENTS_BY_PROJECT
Routines INFORMATION_SCHEMA.ROUTINES
Streaming INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_PROJECT
INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_FOLDER
INFORMATION_SCHEMA.STREAMING_TIMELINE_BY_ORGANIZATION
Tables INFORMATION_SCHEMA.TABLES
Views INFORMATION_SCHEMA.VIEWS

 

Reference

For more information, review Jobs Information Schema Documentation or
Information Schema Intro. And remember you can contact us anytime.