Batch loading data

You can load data into BigQuery from Cloud Storage or from a local file as a batch operation. The source data can be in any of the following formats:

Avro
Comma-separated values (CSV)
JSON (newline-delimited)
ORC
Parquet
Datastore exports stored in Cloud Storage
Firestore exports stored in Cloud Storage

You can also use BigQuery Data Transfer Service to set up recurring loads from Cloud Storage into BigQuery.

Try it for yourself

If you're new to Google Cloud, create an account to evaluate how BigQuery performs in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Try BigQuery free

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

roles/bigquery.dataEditor
roles/bigquery.dataOwner
roles/bigquery.admin (includes the bigquery.jobs.create permission)
bigquery.user (includes the bigquery.jobs.create permission)
bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions .

Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin ( roles/storage.admin ) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations .

This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissionssection:

Required permissions

The following permissions are required to load data from a Cloud Storage bucket:

storage.buckets.get
storage.objects.get
storage.objects.list (required if you are using a URI wildcard )

You might also be able to get these permissions with custom roles or other predefined roles .

Create a dataset

Create a BigQuery dataset to store your data.

Loading data from Cloud Storage

BigQuery supports loading data from any of the following Cloud Storage storage classes :

Standard
Nearline
Coldline
Archive

To learn how to load data into BigQuery, see the page for your data format:

To learn how to configure a recurring load from Cloud Storage into BigQuery, see Cloud Storage transfers .

Location considerations

You cannot change the location of a dataset after it is created, but you can make a copy of the dataset or manually move it. For more information, see:

Retrieving the Cloud Storage URI

To load data from a Cloud Storage data source, you must provide the Cloud Storage URI.

The Cloud Storage resource path contains your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv , the resource path would be gs://mybucket/myfile.csv .

BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following resource path, though valid in Cloud Storage, does not work in BigQuery: gs:// bucket /my//object//name .

To retrieve the Cloud Storage resource path:

Open the Cloud Storage console.

Cloud Storage console
Browse to the location of the object (file) that contains the source data.
Click on the name of the object.

The Object detailspage opens.
Copy the value provided in the gsutil URIfield, which begins with gs:// .

For Google Datastore exports, only one URI can be specified, and it must end with .backup_info or .export_metadata .

Wildcard support for Cloud Storage URIs

If your data is separated into multiple files, you can use an asterisk (*) wildcard to select multiple files. Use of the asterisk wildcard must follow these rules:

The asterisk can appear inside the object name or at the end of the object name.
Using multiple asterisks is unsupported. For example, the path gs://mybucket/fed-*/temp/*.csv is invalid.
Using an asterisk with the bucket name is unsupported.

Examples:

The following example shows how to select all of the files in all the folders which start with the prefix gs://mybucket/fed-samples/fed-sample :
```
 gs://mybucket/fed-samples/fed-sample* 
```
The following example shows how to select only files with a .csv extension in the folder named fed-samples and any subfolders of fed-samples :
```
 gs://mybucket/fed-samples/*.csv 
```
The following example shows how to select files with a naming pattern of fed-sample*.csv in the folder named fed-samples . This example doesn't select files in subfolders of fed-samples .
```
 gs://mybucket/fed-samples/fed-sample*.csv 
```

When using the bq command-line tool, you might need to escape the asterisk on some platforms.

You can't use an asterisk wildcard when you load Datastore or Firestore export data from Cloud Storage.

Limitations

You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:

BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
BigQuery doesn't support Cloud Storage object versioning . If you include a generation number in the Cloud Storage URI, then the load job fails.

Depending on the format of your Cloud Storage source data, there may be additional limitations. For more information, see:

Loading data from local files

You can load data from a readable data source (such as your local machine) by using one of the following:

The Google Cloud console
The bq command-line tool's bq load command
The API
The client libraries

When you load data using the Google Cloud console or the bq command-line tool, a load job is automatically created.

To load data from a local data source:

Console

Open the BigQuery page in the Google Cloud console.

Go to the BigQuery page
In the Explorerpanel, expand your project and select a dataset.
Expand the Actionsoption and click Open.
In the details panel, click Create table .
On the Create tablepage, in the Sourcesection:
- For Create table from, select Upload.
- For Select file, click Browse.
- Browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
- For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
On the Create tablepage, in the Destinationsection:
- For Project, choose the appropriate project.
- For Dataset, choose the appropriate dataset.
- In the Tablefield, enter the name of the table you're creating in BigQuery.
- Verify that Table typeis set to Native table.
In the Schemasection, enter the schema definition.
- For CSV and JSON files, you can check the Auto-detectoption to enable schema auto-detect . Schema information is self-described in the source data for other supported file types.
- You can also enter schema information manually by:
  - Clicking Edit as textand entering the table schema as a JSON array:
    
    Note: You can view the schema of an existing table in JSON format by entering the following command: bq show --format=prettyjson dataset.table .
  - Using Add Fieldto manually input the schema.
Select applicable items in the Advanced optionssection. For information on the available options, see CSV options and JSON options .
Optional: In the Advanced optionschoose the write disposition:
- Write if empty: Write the data only if the table is empty.
- Append to table: Append the data to the end of the table. This setting is the default.
- Overwrite table: Erase all existing data in the table before writing the new data.
Click Create Table.

bq

Use the bq load command, specify the source_format , and include the path to the local file.

(Optional) Supply the --location flag and set the value to your location .

If you are loading data in a project other than your default project, add the project ID to the dataset in the following format: PROJECT_ID:DATASET .

bq  
--location = 
 LOCATION 
  
load  
 \ 
--source_format = 
 FORMAT 
  
 \ 
 PROJECT_ID:DATASET.TABLE 
  
 \ 
 PATH_TO_SOURCE 
  
 \ 
 SCHEMA

Replace the following:

LOCATION : your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1 . You can set a default value for the location by using the .bigqueryrc file .
FORMAT : CSV , AVRO , PARQUET , ORC , or NEWLINE_DELIMITED_JSON .
project_id : your project ID.
dataset : an existing dataset.
table : the name of the table into which you're loading data.
path_to_source : the path to the local file.
schema : a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

In addition, you can add flags for options that let you control how BigQuery parses your data. For example, you can use the --skip_leading_rows flag to ignore header rows in a CSV file. For more information, see CSV options and JSON options .

Examples:

The following command loads a local newline-delimited JSON file ( mydata.json ) into a table named mytable in mydataset in your default project. The schema is defined in a local schema file named myschema.json .

   
 bq 
  
 load 
  
\  
 -- 
 source_format 
 = 
 NEWLINE_DELIMITED_JSON 
  
\  
 mydataset 
 . 
 mytable 
  
\  
 ./ 
 mydata 
 . 
 json 
  
\  
 ./ 
 myschema 
 . 
 json

The following command loads a local CSV file ( mydata.csv ) into a table named mytable in mydataset in myotherproject . The schema is defined inline in the format FIELD:DATA_TYPE , FIELD:DATA_TYPE .

   
 bq 
  
 load 
  
\  
 -- 
 source_format 
 = 
 CSV 
  
\  
 myotherproject 
 : 
 mydataset 
 . 
 mytable 
  
\  
 ./ 
 mydata 
 . 
 csv 
  
\  
 qtr 
 : 
 STRING 
 , 
 sales 
 : 
 FLOAT 
 , 
 year 
 : 
 STRING

The following command loads a local CSV file ( mydata.csv ) into a table named mytable in mydataset in your default project. The schema is defined using schema auto-detection .

   
 bq 
  
 load 
  
\  
 -- 
 autodetect 
  
\  
 -- 
 source_format 
 = 
 CSV 
  
\  
 mydataset 
 . 
 mytable 
  
\  
 ./ 
 mydata 
 . 
 csv

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery C# API reference documentation .

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, use the update options class for the appropriate format from the JobCreationOptions base class instead of UploadCsvOptions .

  using 
  
  Google.Cloud.BigQuery.V2 
 
 ; 
 using 
  
 System 
 ; 
 using 
  
 System.IO 
 ; 
 public 
  
 class 
  
 BigQueryLoadFromFile 
 { 
  
 public 
  
 void 
  
 LoadFromFile 
 ( 
  
 string 
  
 projectId 
  
 = 
  
 "your-project-id" 
 , 
  
 string 
  
 datasetId 
  
 = 
  
 "your_dataset_id" 
 , 
  
 string 
  
 tableId 
  
 = 
  
 "your_table_id" 
 , 
  
 string 
  
 filePath 
  
 = 
  
 "path/to/file.csv" 
  
 ) 
  
 { 
  
  BigQueryClient 
 
  
 client 
  
 = 
  
  BigQueryClient 
 
 . 
  Create 
 
 ( 
 projectId 
 ); 
  
 // Create job configuration 
  
 var 
  
 uploadCsvOptions 
  
 = 
  
 new 
  
  UploadCsvOptions 
 
 () 
  
 { 
  
 SkipLeadingRows 
  
 = 
  
 1 
 , 
  
 // Skips the file headers 
  
 Autodetect 
  
 = 
  
 true 
  
 }; 
  
 using 
  
 ( 
 FileStream 
  
 stream 
  
 = 
  
 File 
 . 
 Open 
 ( 
 filePath 
 , 
  
 FileMode 
 . 
 Open 
 )) 
  
 { 
  
 // Create and run job 
  
 // Note that there are methods available for formats other than CSV 
  
  BigQueryJob 
 
  
 job 
  
 = 
  
 client 
 . 
  UploadCsv 
 
 ( 
  
 datasetId 
 , 
  
 tableId 
 , 
  
 null 
 , 
  
 stream 
 , 
  
 uploadCsvOptions 
 ); 
  
 job 
  
 = 
  
 job 
 . 
  PollUntilCompleted 
 
 (). 
 ThrowOnAnyError 
 (); 
  
 // Waits for the job to complete. 
  
 // Display the number of rows uploaded 
  
  BigQueryTable 
 
  
 table 
  
 = 
  
 client 
 . 
  GetTable 
 
 ( 
 datasetId 
 , 
  
 tableId 
 ); 
  
 Console 
 . 
 WriteLine 
 ( 
  
 $"Loaded {table. Resource 
.NumRows} rows to {table. FullyQualifiedId 
}" 
 ); 
  
 } 
  
 } 
 }

Console option	bq tool flag	BigQuery API property	Description
Write if empty	None	WRITE_EMPTY	Writes the data only if the table is empty.
Append to table	`--noreplace` or `--replace=false` ; if `--replace` is unspecified, the default is append	WRITE_APPEND	(Default) Appends the data to the end of the table.
Overwrite table	`--replace` or `--replace=true`	WRITE_TRUNCATE	Erases all existing data in a table before writing the new data.

Batch loading data

Try it for yourself

Before you begin

Required permissions

Permissions to load data into BigQuery

Permissions to load data from Cloud Storage

Required permissions

Create a dataset

Loading data from Cloud Storage

Location considerations

Retrieving the Cloud Storage URI

Wildcard support for Cloud Storage URIs

Limitations

Loading data from local files

Console

bq

C#

Go

Java

Node.js

PHP

Python

Ruby

Limitations

Load job capacity

Loading compressed and uncompressed data

Appending to or overwriting a table

Quota policy

View current quota usage

Pricing

Example use case