BigQuery is an excellent data warehouse solution until your monthly bill reaches thousands of dollars. Once you hit that point, you start wondering how to decrease BigQuery costs.
However, this isn’t so much a BigQuery drawback as a push for customers to optimise queries and follow best practices for tables and SQL operations. After all, running better-optimised queries usually means making your analytics greener (fewer CO2 emissions) and more GDPR-compliant.
That’s why I collected the solutions available on the internet for decreasing BigQuery costs in 2023, so you can apply them today to reduce expenses tomorrow. Let’s start with the first one.
Check Query Cost Before Execution
The first tip is an essential part of any analytics work with a data warehouse: you should monitor how much data your SQL queries process.
Since BigQuery is one of the leading solutions, Google made it easy to see how much data a query will read before you execute it. You can check the estimated amount in the top right corner of the console window.
And when you start tweaking your queries to read less data, you will see that reducing the processed bytes comes down to the following four core techniques.
Avoid Select *
Firstly, you should avoid using “SELECT * FROM <table>”, because BigQuery then reads every column in the table. If your table has 10+ or even 25+ columns, you pay for the data in every one of them, even the columns you don’t need.
Therefore, whenever you query the data, try to use as few columns as possible. If you need to aggregate the data using COUNT(), SUM() or similar functions, try to reuse the column you group by in the calculation; for instance, if you are interested in the number of visits per country and one row represents one visit, you can use “country” both to group and to count visits.
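A minimal sketch of the difference, assuming a hypothetical `analytics.visits` table where one row represents one visit:

```sql
-- Expensive: BigQuery bills you for every column in the table.
SELECT * FROM `my_project.analytics.visits`;

-- Cheaper: only the "country" column is scanned, and it serves
-- both the grouping and the counting.
SELECT country, COUNT(country) AS visits
FROM `my_project.analytics.visits`
GROUP BY country;
```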
Forget Limit 100
As mentioned above, the columns you read are what drive your BigQuery bill, not the number of rows you return.
Therefore, you can forget the “LIMIT 100” that BigQuery adds by default. It doesn’t change your billing: whether you return 100 rows or 1,000,000, BigQuery still scans the same data, so the bill stays the same.
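To illustrate, again assuming the hypothetical `analytics.visits` table, both of these queries are billed for the same number of scanned bytes; LIMIT only trims the result set after the scan:

```sql
SELECT country FROM `my_project.analytics.visits`;
SELECT country FROM `my_project.analytics.visits` LIMIT 100;
```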
Look at my screenshot below and check your BigQuery console; you will find the same.
Preview Table Instead of Select * FROM <table>
Thirdly, avoid running “SELECT * FROM <table>” just to preview the data; every time you do this, you pay Google for it. Instead, open the table and click “Preview” there: you can see the data, all the columns and their names free of charge.
If you are new to BigQuery, switching to “Preview” alone can bring a significant decrease in your BigQuery bill.
Use INT64 columns for ORDER BY or JOIN
And fourthly, whenever you JOIN or ORDER BY tables in BigQuery, try to use INT64 columns. Almost every data warehouse spends less execution time when the column is INT64.
That is why Microsoft SQL Server has a numeric ID in every table: it’s much more efficient to join tables this way, and the same holds for BigQuery.
For instance, you can have two tables, one with the purchases users made and another with user profiles (full name, email address, country, etc.). Even though the email address may exist in both tables, you shouldn’t use it in the join clause; use a numeric user ID in this context, and BigQuery will reward you with a cheaper query.
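A sketch of such a join, with hypothetical `shop.purchases` and `shop.users` tables:

```sql
-- Joining on an INT64 key is faster and cheaper than joining on a
-- STRING column such as the email address.
SELECT p.purchase_id, u.country
FROM `my_project.shop.purchases` AS p
JOIN `my_project.shop.users` AS u
  ON p.user_id = u.user_id;  -- INT64 = INT64, not p.email = u.email
```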
Pre-aggregate the table for reports and dashboards
Because analysts usually use BigQuery to prepare reports or datasets for new dashboards, it’s also worth mentioning that it’s much better to pre-aggregate the data or create a materialised view (read further to learn more about it) than to connect multiple raw tables directly to Tableau or Google Looker Studio.
Therefore, pre-aggregate the data using SQL statements before using it for the dashboard. If you aggregate it beforehand, you will pay less afterwards, since your data visualisation solution won’t operate on 1 or 100 GB tables but on 1 or 10 MB ones.
Returning to the purchases-by-country example I mentioned earlier, you can aggregate the table using SQL instead of joining in Tableau or blending in Looker Studio.
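As a sketch, with the hypothetical table names from above, a scheduled query could maintain a small summary table that the dashboard reads instead of the raw data:

```sql
-- The dashboard then scans this tiny table, not the raw purchases.
CREATE OR REPLACE TABLE `my_project.reports.purchases_by_country` AS
SELECT country, COUNT(*) AS purchases, SUM(amount) AS revenue
FROM `my_project.shop.purchases`
GROUP BY country;
```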
Also, as part of that, always keep in mind how frequently the dashboard actually needs to be updated. Many people are obsessed with real-time analytics even though their stakeholders check the dashboard once per day or even once per week.
If your dashboard represents weekly data, it can make sense to update it only once a week and remove the incomplete current week from the dashboard; this will decrease your BigQuery bill.
Use Incremental Table Update
So, you aggregate a table in BigQuery and even use scheduled queries to update it regularly, but that solution drops the table and creates a new one every time.
It’s a waste of resources! Instead of rebuilding the table from scratch, you can insert only the new data into it. This is called an incremental table update, and it requires far less BigQuery capacity than overwriting the entire dataset.
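A minimal sketch, assuming the hypothetical `shop.purchases` table has a `created_at` timestamp and the summary table is refreshed once a day:

```sql
-- Append only yesterday's rows instead of rebuilding the whole table.
INSERT INTO `my_project.reports.daily_purchases` (day, purchases)
SELECT DATE(created_at) AS day, COUNT(*) AS purchases
FROM `my_project.shop.purchases`
WHERE DATE(created_at) = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
GROUP BY day;
```

For tables where late or updated rows can arrive, a MERGE statement serves the same purpose.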
Build Materialised Views
Though you can use incremental table updates for many tables, I encourage you to look at materialised views in BigQuery. They work like an incremental table update and can significantly decrease your bill.
The difference between materialised views and incremental tables is small, but they are worth mentioning here since Google recommends them for the following reasons:
- Zero maintenance. Materialized views are recomputed in the background when the base tables change. Any incremental data changes from the base tables are automatically added to the materialized views, with no user action required.
- Fresh data. Materialized views return fresh data. If changes to base tables might invalidate the materialized view, then data is read directly from the base tables. If the changes to the base tables do not invalidate the materialized view, then the rest of the data is read from the materialized view and only the changes are read from the base tables.
- Smart tuning. If any part of a query against the source table can be resolved by querying the materialized view, then BigQuery reroutes the query to use the materialized view for better performance and efficiency.
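Creating one is a single DDL statement. As a sketch with the hypothetical tables from above (note that materialised views support only a limited set of aggregate functions, such as COUNT and SUM):

```sql
CREATE MATERIALIZED VIEW `my_project.reports.mv_purchases_by_country` AS
SELECT country, COUNT(*) AS purchases
FROM `my_project.shop.purchases`
GROUP BY country;
```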
Set up Table Expiration
If you only use the last 90 days of data, it doesn’t make sense to pay for an entire dataset that covers 5 years (even though Google charges you less for the old part via long-term logical storage).
You can minimise your billing by reducing unused data and setting up table expiration in BigQuery. Table expiration removes old data automatically, without you spending any time on it.
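As a sketch, assuming a hypothetical date-partitioned `analytics.events` table, expiration can be set with standard DDL options:

```sql
-- Automatically delete partitions older than 90 days.
ALTER TABLE `my_project.analytics.events`
SET OPTIONS (partition_expiration_days = 90);

-- Or let a temporary table expire at a fixed point in time.
ALTER TABLE `my_project.analytics.tmp_export`
SET OPTIONS (expiration_timestamp = TIMESTAMP '2025-01-01 00:00:00 UTC');
```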
However, you may have legal reasons to keep 5 years of data; in that case, table expiration doesn’t suit you, and that’s okay. Instead, you should use Partitioning & Clustering in BigQuery.
Use Partitioning & Clustering
Unlike Microsoft SQL Server and some other data warehouse solutions, Google BigQuery doesn’t use indexes on tables. Therefore, a query can require more resources.
BigQuery has an excellent feature that serves the same idea: partitioning and clustering. When you partition a table, you split it into buckets, usually by a date or another column.
Once the table is partitioned, you can use a WHERE clause to query only the part of the dataset that interests you, for instance a single date. BigQuery won’t go through the whole dataset but only through that small bucket.
You can also make the WHERE clause on the partitioning column mandatory (the require_partition_filter table option). This way, whenever someone from your team queries the table, they will read only the data they need, not the entire dataset.
If you use Google Analytics 4, you will find that the GA4 export dataset follows this logic: it is divided by date, so you can query the data for a specific date and pay only for that date.
Clustering works similarly but allows you to break the dataset into more granular buckets (clusters). For instance, you can split the table by date, country, city, or another column. To make clustering successful, you should know which columns are used most often as filters.
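The ideas above can be sketched in one DDL statement, again with hypothetical table and column names:

```sql
-- Partition by day, cluster by the most common filter columns, and
-- force every query to filter on the partition column.
CREATE TABLE `my_project.analytics.events_partitioned`
PARTITION BY DATE(event_timestamp)
CLUSTER BY country, city
OPTIONS (require_partition_filter = TRUE) AS
SELECT * FROM `my_project.analytics.events`;

-- This query now scans a single day's bucket, not the whole table.
SELECT COUNT(*)
FROM `my_project.analytics.events_partitioned`
WHERE DATE(event_timestamp) = '2023-06-01' AND country = 'DE';
```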
Follow GDPR Data Minimisation
Though it’s not something you can do within BigQuery itself to reduce costs, it’s an important change to your analytics process. You may find it helpful to allocate some time to review what kind of data you store in BigQuery and whether this data is essential for your OKRs.
If you see data that you used only once during the last 12 months, or did not use at all, why not remove it from BigQuery entirely?
The GDPR data-minimisation principle states that we should store only the data we actually need. If you don’t use data that takes up a lot of storage, drop that table or the relevant part of it. This can noticeably optimise your BigQuery costs.
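In practice this is a couple of statements; the table and column names below are, again, hypothetical:

```sql
-- Drop a table that hasn't been used in the last 12 months.
DROP TABLE IF EXISTS `my_project.legacy.old_events`;

-- Or delete only the rows that fall outside the retention window.
DELETE FROM `my_project.analytics.events`
WHERE event_date < DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR);
```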
Check Google Cloud Billing regularly
Obviously, costs can climb, but if you notice the increase early, you can address it faster; and if you resolve it early, you pay less as a result.
Therefore, it’s vital to check Google Cloud Billing once per week or even more frequently. You can find BigQuery costs in Google Cloud Navigation Menu -> Billing -> Reports.
Of course, checking this page manually all the time isn’t ideal; that’s why Google introduced budget alerts.
Set up budget alerts in Google Cloud Billing
I do this myself, and I recommend you set up budget alerts too. It’s easy to do, and when spending reaches 50%, 90% or 100% of the budget, Google will notify you by email.
To set up alerts, go to Google Cloud Navigation Menu -> Billing -> Budgets & Alerts, click “Create budget”, and follow the instructions.
Select the Right Pricing Model
Though it’s not the most common solution when your company doesn’t store a massive amount of data, once you start storing a lot of data in BigQuery, you can consider switching from on-demand pricing to flat-rate pricing.
The main benefit is cost control: you can be sure that a single query won’t blow your budget through the roof.
But if you are only starting with BigQuery, you can look at the Sandbox, which is free.
Use BigQuery Sandbox (for newbies, GA4 users)
If you are new to the BigQuery world and only learning how to use SQL, or want to practise your SQL skills on free datasets, you can create a sandbox and use a limited version of BigQuery for free.
It’s also a good solution if you have started using Google Analytics 4 and connected it to BigQuery just in case.
The sandbox allows you to store data for 60 days free of charge.
Though BigQuery is one of the leading data warehouses and is considered one of the cheapest, you can end up with a huge bill if you don’t follow the best practices mentioned above.
To summarise, if you want to reduce the costs of BigQuery, you should follow these steps:
- Check Query Cost Before Execution
- Avoid Select *
- Forget Limit 100
- Preview Table Instead of Select * FROM table
- Use INT64 columns for ORDER BY or JOIN
- Pre-aggregate the table for reports and dashboards
- Use Incremental Table Update
- Build Materialised Views
- Set up Table Expiration
- Use Partitioning & Clustering
- Follow GDPR Data Minimisation
- Check Google Cloud Billing regularly
- Set up budget alerts in Google Cloud Billing
- Select the Right Pricing Model
- Use BigQuery Sandbox (for newbies, GA4 users)
If you have other ideas on optimising BigQuery costs, please let me know by commenting below.
Frequently Asked Questions
How much does a BigQuery query cost?
It depends on the query you execute, but you can check how much data it will process in the top right corner of the BigQuery SQL console before executing it.
How can I reduce BigQuery costs?
There are many ways to reduce BigQuery costs; some of them are:
1. Check Query Cost Before Execution
2. Avoid Select *
3. Forget Limit 100
4. Preview Table Instead of Select * FROM table
5. Use INT64 columns for ORDER BY or JOIN
6. Pre-aggregate the table for reports and dashboards
7. Use Incremental Table Update
8. Build Materialised Views
9. Set up Table Expiration
10. Use Partitioning & Clustering
11. Follow GDPR Data Minimisation
12. Check Google Cloud Billing regularly
13. Set up budget alerts in Google Cloud Billing
14. Select the Right Pricing Model
15. Use BigQuery Sandbox (for newbies, GA4 users)
How is BigQuery usage billed?
BigQuery charges you based on the storage you use and the amount of data your queries process. Therefore, the final costs differ from case to case.
Is BigQuery free?
Yes, you can use the BigQuery sandbox, which is free.