Estimate and control costs

This page describes how to estimate costs and lists best practices for controlling costs in BigQuery. BigQuery offers two pricing models: on-demand and capacity-based. For information about pricing, see BigQuery pricing.

With BigQuery, you can estimate the cost of running a query, calculate the bytes processed by various queries, and get a monthly cost estimate based on your projected usage. To control costs, also follow the best practices for optimizing query computation and BigQuery storage. For cost-specific best practices, see Control query costs.

To monitor query costs and BigQuery usage, analyze BigQuery audit logs .

Estimate query costs

BigQuery provides various methods to estimate cost:

On-demand query size calculation

To calculate the number of bytes processed by the various query types under the on-demand billing model, see the following sections:

Query columnar formats on Cloud Storage

If your external data is stored in ORC or Parquet format, the number of bytes charged is limited to the columns that BigQuery reads. Because the query converts the data types from the external data source to BigQuery data types, the number of bytes read is computed based on the size of the BigQuery data types. For information about data type conversions, see the following pages:
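As an illustration of how bytes read depend on BigQuery data type sizes, the following sketch estimates a scan. The per-type sizes follow BigQuery's documented data size rules but should be treated as assumptions to verify against the current documentation, and the helper function itself is hypothetical:

```python
# Rough estimate of bytes processed for a column-projected scan.
# Per-type sizes follow BigQuery's data size calculation rules
# (assumed values; verify against the current documentation).
TYPE_SIZE_BYTES = {
    "INT64": 8,
    "FLOAT64": 8,
    "NUMERIC": 16,
    "BOOL": 1,
    "DATE": 8,
    "TIMESTAMP": 8,
}

def estimate_scanned_bytes(columns, row_count, avg_string_lengths=None):
    """Estimate bytes read for the given (name, type) columns.

    STRING columns are charged 2 bytes plus the UTF-8 encoded length,
    so they need an average length estimate per column.
    """
    avg_string_lengths = avg_string_lengths or {}
    total = 0
    for name, col_type in columns:
        if col_type == "STRING":
            per_row = 2 + avg_string_lengths.get(name, 0)
        else:
            per_row = TYPE_SIZE_BYTES[col_type]
        total += per_row * row_count
    return total

# Example: 1M rows, one INT64 column and one STRING column averaging 10 chars.
estimate = estimate_scanned_bytes(
    [("id", "INT64"), ("name", "STRING")], 1_000_000,
    avg_string_lengths={"name": 10},
)
print(estimate)  # → 20000000
```

Because only the columns the query reads are charged, dropping a column from the SELECT list removes its contribution entirely.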

Use the Google Cloud Pricing Calculator

The Google Cloud Pricing Calculator can help you create an overall monthly cost estimate for BigQuery based on projected usage.

On-demand

To estimate costs in the Google Cloud Pricing Calculator when using the on-demand pricing model, follow these steps:

  1. Open the Google Cloud Pricing Calculator .
  2. Click BigQuery.
  3. Click the On-Demand tab.
  4. For Storage Pricing, enter the estimated size of the table in the storage fields. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model .
  5. For Query Pricing, enter the estimated bytes read from your dry run or the query validator.
  6. Click Add To Estimate.
  7. The estimate appears to the right. Notice that you can save or email the estimate.

For more information, see on-demand pricing .

Editions

To estimate costs in the Google Cloud Pricing Calculator when using the capacity-based pricing model with BigQuery editions , follow these steps:

  1. Open the Google Cloud Pricing Calculator .
  2. Click BigQuery.
  3. Click the Editions tab.
  4. Choose the location where the slots are used.
  5. Choose your Edition.
  6. Choose the Maximum slots, Baseline slots, optional Commitment, and Estimated utilization of autoscaling.
  7. Choose the location where the data is stored.
  8. Enter your estimates of storage usage for Active storage, Long-term storage, Streaming inserts, and Streaming reads. You only need to estimate either physical storage or logical storage, depending on the dataset storage billing model.
  9. Click Add to Estimate.

For more information, see capacity-based pricing .
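The calculator's compute inputs combine as straightforward slot-hour arithmetic. A minimal sketch, assuming a hypothetical $0.06 per slot-hour rate (actual rates vary by edition and region) and a 730-hour month:

```python
# Back-of-the-envelope monthly compute cost for BigQuery editions.
# The per-slot-hour rate is an illustrative assumption; look up the
# actual rate for your edition and region.
PRICE_PER_SLOT_HOUR = 0.06  # assumed rate

def monthly_compute_cost(baseline_slots, autoscale_max_slots,
                         autoscale_utilization, hours_per_month=730):
    """Baseline slots bill continuously; autoscaled slots bill for the
    fraction of time (estimated utilization) they are in use."""
    baseline_hours = baseline_slots * hours_per_month
    autoscale_hours = ((autoscale_max_slots - baseline_slots)
                       * autoscale_utilization * hours_per_month)
    return (baseline_hours + autoscale_hours) * PRICE_PER_SLOT_HOUR

# 100 baseline slots, autoscaling up to 300 slots at 25% utilization.
print(round(monthly_compute_cost(100, 300, 0.25), 2))  # → 6570.0
```

This mirrors the calculator's Baseline slots, Maximum slots, and Estimated utilization of autoscaling fields; any slot commitment discount is not modeled here.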

Control query costs

To optimize query costs, ensure that you have optimized storage and query computation. For additional methods to control query costs, see the following sections:

Check query costs before running queries

Best practice: Before running queries, preview them to estimate costs.

Queries are billed according to the number of bytes read. To estimate costs before you run a query, use one of the following methods:

Use the query validator

When you enter a query in the Google Cloud console, the query validator verifies the query syntax and provides an estimate of the number of bytes read. You can use this estimate to calculate query cost in the pricing calculator.

  • If your query is not valid, then the query validator displays an error message. For example:

    Not found: Table myProject:myDataset.myTable was not found in location US

  • If your query is valid, then the query validator provides an estimate of the number of bytes required to process the query. For example:

    This query will process 623.1 KiB when run.
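A byte estimate like this translates into on-demand cost with simple arithmetic. A minimal sketch, assuming an illustrative $6.25 per TiB on-demand rate and a simplified 10 MB per-query minimum (BigQuery actually applies the minimum per table referenced, and the first 1 TiB per month is free; check current pricing for your region):

```python
# Convert a dry-run or validator byte estimate into an approximate
# on-demand cost. The $6.25/TiB rate is an illustrative assumption.
PRICE_PER_TIB = 6.25
TIB = 2**40

def estimate_query_cost(bytes_processed):
    """On-demand queries bill by bytes read, with a 10 MB minimum
    (simplified here to a per-query minimum)."""
    billed = max(bytes_processed, 10 * 1000 * 1000)
    return billed / TIB * PRICE_PER_TIB

# The 623.1 KiB estimate from the validator example above:
print(f"${estimate_query_cost(623.1 * 1024):.6f}")  # prints $0.000057
```

Small queries like this one round up to the minimum, so the cost is dominated by the 10 MB floor rather than the actual bytes scanned.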

Perform a dry run

To perform a dry run, do the following:

Console

  1. Go to the BigQuery page.

    Go to BigQuery

  2. Enter your query in the query editor.

    If the query is valid, then a check mark automatically appears along with the amount of data that the query will process. If the query is invalid, then an exclamation point appears along with an error message.

bq

Enter a query like the following using the --dry_run flag:

```sh
bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT
   COUNTRY,
   AIRPORT,
   IATA
 FROM
   `project_id`.dataset.airports
 LIMIT
   1000'
```

For a valid query, the command produces the following response:

```
Query successfully validated. Assuming the tables are not modified,
running this query will process 10918 bytes of data.
```

API

To perform a dry run by using the API, submit a query job with dryRun set to true in the JobConfiguration type.
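For example, a jobs.insert request body with a dry run enabled might look like the following minimal sketch (field names follow the JobConfiguration and JobConfigurationQuery REST types):

```json
{
  "configuration": {
    "dryRun": true,
    "query": {
      "query": "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013` LIMIT 100",
      "useLegacySql": false
    }
  }
}
```

When dryRun is true, BigQuery validates the query and returns statistics such as totalBytesProcessed without running the job.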

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Go API reference documentation .

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

```go
import (
	"context"
	"fmt"
	"io"

	"cloud.google.com/go/bigquery"
)

// queryDryRun demonstrates issuing a dry run query to validate query structure and
// provide an estimate of the bytes scanned.
func queryDryRun(w io.Writer, projectID string) error {
	// projectID := "my-project-id"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	q := client.Query(`
	SELECT
		name,
		COUNT(*) as name_count
	FROM ` + "`bigquery-public-data.usa_names.usa_1910_2013`" + `
	WHERE state = 'WA'
	GROUP BY name`)
	q.DryRun = true
	// Location must match that of the dataset(s) referenced in the query.
	q.Location = "US"

	job, err := q.Run(ctx)
	if err != nil {
		return err
	}
	// Dry run is not asynchronous, so get the latest status and statistics.
	status := job.LastStatus()
	if err := status.Err(); err != nil {
		return err
	}
	fmt.Fprintf(w, "This query will process %d bytes\n", status.Statistics.TotalBytesProcessed)
	return nil
}
```

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Java API reference documentation .

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

```java
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.JobStatistics;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to run a dry run query on the table
public class QueryDryRun {

  public static void runQueryDryRun() {
    String query =
        "SELECT name, COUNT(*) as name_count "
            + "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
            + "WHERE state = 'WA' "
            + "GROUP BY name";
    queryDryRun(query);
  }

  public static void queryDryRun(String query) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration queryConfig =
          QueryJobConfiguration.newBuilder(query).setDryRun(true).setUseQueryCache(false).build();

      Job job = bigquery.create(JobInfo.of(queryConfig));
      JobStatistics.QueryStatistics statistics = job.getStatistics();

      System.out.println(
          "Query dry run performed successfully." + statistics.getTotalBytesProcessed());
    } catch (BigQueryException e) {
      System.out.println("Query not performed \n" + e.toString());
    }
  }
}
```

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Node.js API reference documentation .

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

```javascript
// Import the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function queryDryRun() {
  // Runs a dry run query of the U.S. given names dataset for the state of Texas.
  const query = `SELECT name
    FROM \`bigquery-public-data.usa_names.usa_1910_2013\`
    WHERE state = 'TX'
    LIMIT 100`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
    // Location must match that of the dataset(s) referenced in the query.
    location: 'US',
    dryRun: true,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);

  // Print the status and statistics
  console.log('Status:');
  console.log(job.metadata.status);
  console.log('\nJob Statistics:');
  console.log(job.metadata.statistics);
}
```