Saving money is essential to almost everything we do. So when it comes to BigQuery, the data warehouse that is part of the Google Cloud Platform, it's not surprising that saving money on storage can be a game-changer.
As you may already know, BigQuery’s potential to turn big data into calculated business decisions is seemingly limitless. It allows you to load your data from various platforms into one centralized data warehouse, where you can then discover significant insights through in-depth data analysis.
While BigQuery projects let you query massive datasets in seconds and uncover user behavior patterns that are hard to spot in standard reporting, they also accumulate a lot of data in storage over time. And while the cost of storage per GB stays consistent (and inexpensive), the total BigQuery storage bill can still grow. To help keep those costs down, here are seven tips for saving money on BigQuery storage.
Tip 1: Estimate
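Before you can save, it helps to know what you're spending. As a starting point, the sketch below (an illustration only, using the google-cloud-bigquery Python client; "my-project" and "my_dataset" are placeholder names) sums each table's size from the dataset's __TABLES__ metadata and brackets the monthly bill between the active and long-term rates discussed later in this post. It ignores the 10 GB monthly free tier, so treat the output as a rough estimate.

```python
# Rough storage-cost estimate per table, using the dataset's __TABLES__
# metadata view. "my-project" and "my_dataset" are placeholders; the $0.02
# (active) and $0.01 (long-term) per-GB monthly rates match the pricing cited
# later in this post, and the 10 GB monthly free tier is ignored here.
from google.cloud import bigquery

ACTIVE_RATE = 0.02      # USD per GB per month (active storage)
LONG_TERM_RATE = 0.01   # USD per GB per month (long-term storage)

client = bigquery.Client(project="my-project")

sql = """
SELECT
  dataset_id,
  table_id,
  size_bytes / POW(1024, 3) AS size_gb,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `my-project.my_dataset.__TABLES__`
ORDER BY size_bytes DESC
"""

total_gb = 0.0
for row in client.query(sql).result():
    total_gb += row.size_gb
    print(f"{row.dataset_id}.{row.table_id}: {row.size_gb:,.2f} GB "
          f"(last modified {row.last_modified:%Y-%m-%d})")

print(f"Worst case, all active:   ${total_gb * ACTIVE_RATE:.2f}/month")
print(f"Best case, all long-term: ${total_gb * LONG_TERM_RATE:.2f}/month")
```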
Tip 2: Clean
Do some “spring cleaning” in the project and datasets to find any tables or partitions you can delete or archive. Basically, don’t be a digital hoarder. For example, delete temporary tables you created for EDA (exploratory data analysis) if you don’t need them anymore.
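If your temporary tables follow a naming convention, this cleanup can even be scripted. Here is a minimal sketch with the Python client; the tmp_ prefix and the 30-day cutoff are assumptions, so adjust both to your own conventions:

```python
# "Spring cleaning" sketch: delete tables whose names start with a temporary
# prefix and that have not been modified in 30 days. The "tmp_" prefix and
# the 30-day threshold are assumptions; adjust them to your own conventions.
from datetime import datetime, timedelta, timezone

from google.cloud import bigquery

client = bigquery.Client(project="my-project")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

for table_item in client.list_tables("my-project.my_dataset"):
    table = client.get_table(table_item.reference)  # fetch full metadata
    if table.table_id.startswith("tmp_") and table.modified < cutoff:
        print(f"Deleting {table.full_table_id}")
        client.delete_table(table.reference, not_found_ok=True)
```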
Tip 3: Keep and partition
We recommend that you keep your data in BigQuery and partition your tables.
A partitioned table is a special table divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions (see the sketch after this list), you can:
Improve query performance
Control query costs by reducing the number of bytes read by a query, and
Save on BigQuery storage cost
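Here is a minimal sketch of creating a partitioned table with the Python client. The project, dataset, table name (events_partitioned) and schema are placeholders, and day partitioning on an event_date column is just one common choice:

```python
# Create a day-partitioned table with the Python client. All names and the
# schema are placeholders; partitioning is on the event_date DATE column.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

table = bigquery.Table(
    "my-project.my_dataset.events_partitioned",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("user_id", "STRING"),
        bigquery.SchemaField("event_name", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)

table = client.create_table(table)
print(f"Created {table.full_table_id}, "
      f"partitioned on {table.time_partitioning.field}")
```

Queries that filter on event_date scan only the matching partitions, and each partition ages into long-term pricing on its own 90-day clock, as described next.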
If a table or partition has not been edited for 90 consecutive days, the price of BigQuery storage for it automatically drops by 50 percent, to $0.01 per GB per month, under BigQuery's long-term storage pricing. In addition to the savings, long-term storage ensures that:
There is no degradation of performance, durability, availability or any other functionality when a table or partition is considered long-term storage.
Querying the table data, along with a few other actions, does not reset the 90-day timer, so the data continues to be billed at the long-term storage rate.
If you have 100 GB of data in BigQuery active storage, it will typically cost you $1.80 per month (the first 10 GB each month fall under the free tier, and the remaining 90 GB are billed at the active-storage rate of $0.02 per GB).
If your tables are partitioned, you can save on storage.
If you haven’t modified 2/3 of your data for 90 days, you will save 43% on BigQuery storage.
If you haven’t modified 4/5 (or 80%) of your data for 90 days, you will save 50%.
Example A: 67% of BigQuery data is in long-term storage, for 43% savings.
Example B: 80% of BigQuery data is in long-term storage, for 50% savings.
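Those percentages follow from the 100 GB example above, assuming, as these examples appear to, that the 10 GB monthly free tier is applied to active and long-term storage separately. A quick sketch of the arithmetic:

```python
# Reproduce the savings figures above from this post's pricing: $0.02/GB/month
# active, $0.01/GB/month long-term, with the 10 GB free tier assumed to apply
# to each storage class separately (check current pricing before relying on it).
ACTIVE_RATE, LONG_TERM_RATE, FREE_GB = 0.02, 0.01, 10


def monthly_cost(active_gb: float, long_term_gb: float) -> float:
    """Monthly storage cost in USD after the free tier."""
    return (max(active_gb - FREE_GB, 0) * ACTIVE_RATE
            + max(long_term_gb - FREE_GB, 0) * LONG_TERM_RATE)


baseline = monthly_cost(100, 0)    # all active: $1.80
example_a = monthly_cost(33, 67)   # 67% long-term: $1.03
example_b = monthly_cost(20, 80)   # 80% long-term: $0.90

for name, cost in [("A", example_a), ("B", example_b)]:
    print(f"Example {name}: ${cost:.2f}/month, "
          f"{1 - cost / baseline:.0%} savings vs. ${baseline:.2f}")
```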
Taking advantage of BigQuery’s long-term storage should give you significant savings. If you want to take it further, see the following Google Cloud Storage (GCS) archiving steps.
Tip 4: Expire
Use the expiration settings to remove unneeded tables and partitions.
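Both kinds of expiration can be set with the Python client. In the sketch below, the dataset and table names are the placeholders from the earlier examples, and the 60-day and 400-day windows are arbitrary; pick values that match your own retention policy.

```python
# Expiration settings sketch: a default table expiration on a dataset and a
# partition expiration on an existing partitioned table. Names and retention
# windows are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# Default expiration for new tables in a dataset (here: 60 days).
dataset = client.get_dataset("my-project.my_dataset")
dataset.default_table_expiration_ms = 60 * 24 * 60 * 60 * 1000
client.update_dataset(dataset, ["default_table_expiration_ms"])

# Partition expiration on an existing partitioned table (here: 400 days).
table = client.get_table("my-project.my_dataset.events_partitioned")
table.time_partitioning = bigquery.TimePartitioning(
    type_=table.time_partitioning.type_,
    field=table.time_partitioning.field,
    expiration_ms=400 * 24 * 60 * 60 * 1000,
)
client.update_table(table, ["time_partitioning"])
```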
Tip 5: Query externally
You can also query data in GCS as an external data source, without loading it into BigQuery.
An external data source is a data source that you can query directly even though the data is not stored in BigQuery. Instead of loading the data, you create a table that references the external data source.
BigQuery offers support for querying data directly from Cloud Storage.
Supported formats include CSV and JSON.
You can also create and query externally partitioned (hive-partitioned) tables, which let you retain the partition layout of your external GCS data and keep query costs lower.
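Here is a minimal sketch of defining an external table over Gzipped CSV archives in GCS with the Python client. The bucket, paths and schema are placeholders, and plain CSV is used instead of hive partitioning to keep the example short.

```python
# Define an external table over Gzipped CSV files in GCS. Bucket, paths and
# schema are placeholders for illustration only.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-archive-bucket/events/*.csv.gz"]
external_config.compression = "GZIP"
external_config.schema = [
    bigquery.SchemaField("event_date", "DATE"),
    bigquery.SchemaField("user_id", "STRING"),
    bigquery.SchemaField("event_name", "STRING"),
]
external_config.options.skip_leading_rows = 1  # CSV header row

table = bigquery.Table("my-project.my_dataset.events_archive")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# The archived data can now be queried in place; it is billed as GCS storage
# rather than BigQuery storage (query costs still apply).
rows = client.query(
    "SELECT event_name, COUNT(*) AS n "
    "FROM `my-project.my_dataset.events_archive` "
    "GROUP BY event_name ORDER BY n DESC"
).result()
```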
Tip 6: Archive
Below is the design of a solution that archives your BigQuery data to GCS:
Cloud Scheduler sends a daily Pub/Sub message.
Pub/Sub message triggers a Cloud Function.
The Cloud Function copies yesterday's partition to GCS as Gzipped CSV or JSON (see the sketch after these steps).
It can also backfill data from before yesterday.
Once the BigQuery partition is archived in GCS, it expires in BigQuery.
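Here is a sketch of the Cloud Function step, assuming the day-partitioned events_partitioned table and archive bucket from the earlier examples (both placeholders) and a Pub/Sub-triggered function:

```python
# Sketch of the Cloud Function step: export yesterday's partition of a
# day-partitioned table to GCS as Gzipped CSV. Triggered by the Pub/Sub
# message from Cloud Scheduler; table and bucket names are placeholders.
from datetime import date, timedelta

from google.cloud import bigquery

PROJECT = "my-project"
TABLE = "my_dataset.events_partitioned"
BUCKET = "my-archive-bucket"


def archive_yesterday(event, context):
    """Pub/Sub-triggered entry point."""
    client = bigquery.Client(project=PROJECT)
    day = date.today() - timedelta(days=1)
    partition = day.strftime("%Y%m%d")

    # "table$YYYYMMDD" addresses a single partition of a day-partitioned table.
    source = f"{PROJECT}.{TABLE}${partition}"
    destination = f"gs://{BUCKET}/events/{day.isoformat()}/part-*.csv.gz"

    job_config = bigquery.ExtractJobConfig(
        destination_format=bigquery.DestinationFormat.CSV,
        compression=bigquery.Compression.GZIP,
    )
    client.extract_table(source, destination, job_config=job_config).result()
    # With a matching partition expiration (Tip 4), the archived partition
    # will age out of BigQuery on its own.
```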
Tip 7: Monitor
We suggest using the Cloud Billing Console to monitor how your storage and access costs change after archiving. This will also allow you to adjust your savings strategy as needed.
Learn more:
Check out these helpful resources for more information about how you can cut your BigQuery costs:
Interested in getting started in BigQuery? Adswerve is offering a $500/month credit to any of our existing or new Google Analytics 360 subscribers interested in leveraging the Google Cloud and Google Marketing Platforms together to meet their marketing goals. Contact us to learn more.