Loading CSV data from Cloud Storage
When you load CSV data from Cloud Storage, you can load the data into a new
table or partition, or you can append to or overwrite an existing table or
partition. When your data is loaded into BigQuery, it is
converted into columnar format for Capacitor (BigQuery's storage format).
When you load data from Cloud Storage into a BigQuery table,
the dataset that contains the table must be in the same regional or
multi-regional location as the Cloud Storage bucket.
If you're new to Google Cloud, create an account to evaluate how
BigQuery performs in real-world
scenarios. New customers also get $300 in free credits to run, test, and
deploy workloads.
You are subject to the following limitations when you load data into
BigQuery from a Cloud Storage bucket:
BigQuery does not guarantee data consistency for external data
sources. Changes to the underlying data while a query is running can result in
unexpected behavior.
BigQuery doesn't support Cloud Storage object versioning. If you
include a generation number in the Cloud Storage URI, then the load job
fails.
When you load CSV files into BigQuery, note the following:
CSV files don't support nested or repeated data.
Remove byte order mark (BOM) characters. They might cause unexpected issues.
If you use gzip compression,
BigQuery cannot read the data in parallel. Loading compressed
CSV data into BigQuery is slower than loading uncompressed data.
See Loading compressed and uncompressed data.
You cannot include both compressed and uncompressed files in the same load
job.
The maximum size for a gzip file is 4 GB.
Loading CSV data using schema autodetection does not automatically detect
headers if all of the columns are string types. In this case, add a
numerical column to the input or declare the schema explicitly.
When you load CSV or JSON data, values in DATE columns must use the dash
(-) separator, and the date must be in the following format: YYYY-MM-DD
(year-month-day).
When you load JSON or CSV data, values in TIMESTAMP columns must use a
dash (-) or slash (/) separator for the date portion of the timestamp,
and the date must be in one of the following formats: YYYY-MM-DD
(year-month-day) or YYYY/MM/DD (year/month/day). The hh:mm:ss
(hour-minute-second) portion of the timestamp must use a colon (:)
separator.
Your files must meet the CSV file size limits described in the load jobs limits.
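Outside of BigQuery, you can sanity-check values against the DATE rule above with standard library date parsing. The helper below is a hypothetical sketch, not part of any Google tooling:

```python
from datetime import datetime

def is_valid_date(value):
    """Return True if value matches the required DATE format YYYY-MM-DD."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False

print(is_valid_date("2018-08-19"))  # True
print(is_valid_date("2018/08/19"))  # False: DATE columns don't allow slashes
```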
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary
permissions to perform each task in this document, and create a dataset
to store your data.
Required permissions
To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.
Permissions to load data into BigQuery
To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:
bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create
Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:
To get the permissions that
you need to load data from a Cloud Storage bucket,
ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains
the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are
required, expand the Required permissions section:
Required permissions
The following permissions are required to load data from a Cloud Storage bucket:
storage.buckets.get
storage.objects.get
storage.objects.list (required if you are using a URI wildcard)
You can use the gzip utility to compress CSV files. Note that gzip
performs full file compression, unlike the file content compression
performed by compression codecs for other file formats, such as Avro.
Using gzip to compress your CSV files might have a performance impact;
for more information about the trade-offs, see Loading compressed and
uncompressed data.
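For instance, compressing a small CSV with Python's gzip module shows the whole file becoming a single compressed stream, which is what prevents BigQuery from splitting the input for parallel reads (illustrative sketch only):

```python
import csv
import gzip
import io

# Illustrative sketch: write a tiny CSV, then gzip it. gzip produces one
# undivided compressed stream, so a reader must decompress serially from
# the start -- there is no split point for parallel workers.
rows = [["name", "post_abbr"], ["Alabama", "AL"], ["Alaska", "AK"]]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
csv_bytes = buf.getvalue().encode("utf-8")

compressed = gzip.compress(csv_bytes)
print(gzip.decompress(compressed) == csv_bytes)  # True: lossless round trip
```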
Loading CSV data into a table
To load CSV data from Cloud Storage into a new BigQuery
table, select one of the following options:
import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importCSVExplicitSchema demonstrates loading CSV data from Cloud Storage into a BigQuery
// table and providing an explicit schema for the data.
func importCSVExplicitSchema(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.csv")
	gcsRef.SkipLeadingRows = 1
	gcsRef.Schema = bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
	}
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;

// Sample to load CSV data from Cloud Storage into a new BigQuery table
public class LoadCsvFromGcs {

  public static void runLoadCsvFromGcs() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    loadCsvFromGcs(datasetName, tableName, sourceUri, schema);
  }

  public static void loadCsvFromGcs(
      String datasetName, String tableName, String sourceUri, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Skip header row in the file.
      CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri, csvOptions).setSchema(schema).build();

      // Load data from a GCS CSV file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("CSV from GCS successfully added during load append job");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}
// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

async function loadCSVFromGCS() {
  // Imports a GCS file into a table with manually defined schema.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
      ],
    },
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}
use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table('us_states');

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv';
$schema = [
    'fields' => [
        ['name' => 'name', 'type' => 'string'],
        ['name' => 'post_abbr', 'type' => 'string']
    ]
];
$loadConfig = $table->loadFromStorage($gcsUri)->schema($schema)->skipLeadingRows(1);
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))
require "google/cloud/bigquery"

def load_table_gcs_csv
  dataset_id = "your_dataset_id"
  bigquery = Google::Cloud::Bigquery.new
  dataset = bigquery.dataset dataset_id
  gcs_uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
  table_id = "us_states"

  load_job = dataset.load_job table_id, gcs_uri, skip_leading: 1 do |schema|
    schema.string "name"
    schema.string "post_abbr"
  end
  puts "Starting job #{load_job.job_id}"

  load_job.wait_until_done! # Waits for table load to complete.
  puts "Job finished."

  table = dataset.table table_id
  puts "Loaded #{table.rows_count} rows to table #{table.id}"
end
Loading CSV data into a table that uses column-based time partitioning
To load CSV data from Cloud Storage into a BigQuery table
that uses column-based time partitioning:
import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// importPartitionedTable demonstrates specifying time partitioning for a BigQuery table when loading
// CSV data from Cloud Storage.
func importPartitionedTable(projectID, destDatasetID, destTableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv")
	gcsRef.SkipLeadingRows = 1
	gcsRef.Schema = bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
		{Name: "date", Type: bigquery.DateFieldType},
	}
	loader := client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)
	loader.TimePartitioning = &bigquery.TimePartitioning{
		Field:      "date",
		Expiration: 90 * 24 * time.Hour,
	}
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TimePartitioning;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

public class LoadPartitionedTable {

  public static void runLoadPartitionedTable() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "/path/to/file.csv";
    loadPartitionedTable(datasetName, tableName, sourceUri);
  }

  public static void loadPartitionedTable(String datasetName, String tableName, String sourceUri)
      throws Exception {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      Schema schema =
          Schema.of(
              Field.of("name", StandardSQLTypeName.STRING),
              Field.of("post_abbr", StandardSQLTypeName.STRING),
              Field.of("date", StandardSQLTypeName.DATE));

      // Configure time partitioning. For full list of options, see:
      // https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TimePartitioning
      TimePartitioning partitioning =
          TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
              .setField("date")
              .setExpirationMs(Duration.of(90, ChronoUnit.DAYS).toMillis())
              .build();

      LoadJobConfiguration loadJobConfig =
          LoadJobConfiguration.builder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.csv())
              .setSchema(schema)
              .setTimePartitioning(partitioning)
              .build();

      // Create a job ID so that we can safely retry.
      JobId jobId = JobId.of(UUID.randomUUID().toString());
      Job loadJob = bigquery.create(JobInfo.newBuilder(loadJobConfig).setJobId(jobId).build());

      // Load data from a GCS CSV file into the table.
      // Blocks until this load table job completes its execution, either failing or succeeding.
      Job completedJob = loadJob.waitFor();

      // Check for errors
      if (completedJob == null) {
        throw new Exception("Job not executed since it no longer exists.");
      } else if (completedJob.getStatus().getError() != null) {
        // You can also look at queryJob.getStatus().getExecutionErrors() for all
        // errors, not just the latest one.
        throw new Exception(
            "BigQuery was unable to load into the table due to an error: \n"
                + loadJob.getStatus().getError());
      }
      System.out.println("Data successfully loaded into time partitioned table during load job");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println(
          "Data not loaded into time partitioned table during load job \n" + e.toString());
    }
  }
}
// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states-by-date.csv';

async function loadTablePartitioned() {
  // Load data into a table that uses column-based time partitioning.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_new_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const partitionConfig = {
    type: 'DAY',
    expirationMs: '7776000000', // 90 days
    field: 'date',
  };

  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
        {name: 'date', type: 'DATE'},
      ],
    },
    location: 'US',
    timePartitioning: partitionConfig,
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);
}
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
        bigquery.SchemaField("date", "DATE"),
    ],
    skip_leading_rows=1,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Wait for the job to complete.

table = client.get_table(table_id)
print("Loaded {} rows to table {}".format(table.num_rows, table_id))
Appending to or overwriting a table with CSV data
You can load additional data into a table either from source files or by
appending query results.
In the Google Cloud console, use the Write preference option to specify
what action to take when you load data from a source file or from a query
result.
You have the following options when you load additional data into a table:
Console option | bq tool flag | BigQuery API property | Description
Write if empty | Not supported | WRITE_EMPTY | Writes the data only if the table is empty.
Append to table | --noreplace or --replace=false; if --[no]replace is unspecified, the default is append | WRITE_APPEND (Default) | Appends the data to the end of the table.
Overwrite table | --replace or --replace=true | WRITE_TRUNCATE | Erases all existing data in a table before writing the new data. This action also deletes the table schema and row-level security, and removes any Cloud KMS key.
If you load data into an existing table, the load job can append the data or
overwrite the table.
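The three dispositions can be sketched with plain Python, using a list as a stand-in for an existing table (this models the semantics only; it is not the BigQuery client library):

```python
# Semantics-only model of the three write dispositions; a Python list
# stands in for an existing table. This is not the BigQuery client library.
def apply_load(table_rows, new_rows, disposition):
    if disposition == "WRITE_EMPTY":
        if table_rows:
            raise ValueError("table is not empty")
        return list(new_rows)
    if disposition == "WRITE_APPEND":  # the default
        return table_rows + list(new_rows)
    if disposition == "WRITE_TRUNCATE":  # replaces all existing rows
        return list(new_rows)
    raise ValueError("unknown disposition: " + disposition)

existing = [("Alabama", "AL")]
print(apply_load(existing, [("Alaska", "AK")], "WRITE_APPEND"))
```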
BigQuery supports loading hive-partitioned CSV data stored in
Cloud Storage and populates the hive-partitioning columns as columns in
the destination BigQuery managed table. For more information, see
Loading externally partitioned data from Cloud Storage.
Details of loading CSV data
This section describes how BigQuery handles various CSV formatting
options.
Encoding
BigQuery expects CSV data to be UTF-8 encoded. If you have
CSV files with other supported encoding types, you should explicitly specify the
encoding so that BigQuery can properly convert the data to UTF-8.
BigQuery supports the following encoding types for CSV files:
UTF-8
ISO-8859-1
UTF-16BE (UTF-16 Big Endian)
UTF-16LE (UTF-16 Little Endian)
UTF-32BE (UTF-32 Big Endian)
UTF-32LE (UTF-32 Little Endian)
If you don't specify an encoding, or if you specify UTF-8 encoding when the CSV
file is not UTF-8 encoded, BigQuery attempts to convert the data
to UTF-8. Generally, if the CSV file is ISO-8859-1 encoded, your data will be
loaded successfully, but it may not exactly match what you expect. If the CSV
file is UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE encoded, the load might fail.
To avoid unexpected failures, specify the correct encoding by using the
--encoding flag.
If BigQuery can't convert a character other than the ASCII 0
character, BigQuery converts the character to the standard
Unicode replacement character: �.
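Python's errors="replace" decode mode substitutes the same U+FFFD replacement character, which makes the behavior easy to see locally:

```python
# "café" encoded as ISO-8859-1 is not valid UTF-8; decoding it as UTF-8
# with errors="replace" substitutes U+FFFD, the same replacement character
# BigQuery uses for bytes it can't convert.
raw = b"caf\xe9"
print(raw.decode("utf-8", errors="replace"))  # caf�
print(raw.decode("iso-8859-1"))               # café
```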
Field delimiters
Delimiters in CSV files can be any single-byte character. If the source file
uses ISO-8859-1 encoding, any character can be a delimiter. If the source file
uses UTF-8 encoding, any character in the decimal range 1-127 (U+0001-U+007F)
can be used without modification. You can insert an ISO-8859-1 character outside
of this range as a delimiter, and BigQuery will interpret it
correctly. However, if you use a multibyte character as a delimiter, some of the
bytes will be interpreted incorrectly as part of the field value.
Generally, it's a best practice to use a standard delimiter, such as a tab,
pipe, or comma. The default is a comma.
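For reference, parsing pipe-delimited data with a single-byte delimiter looks like this with Python's csv module; the fieldDelimiter load-job option plays the analogous role in BigQuery:

```python
import csv
import io

# Sketch: pipe-delimited CSV parsed with a custom single-byte delimiter.
data = "name|post_abbr\nAlabama|AL\nAlaska|AK\n"
rows = list(csv.reader(io.StringIO(data), delimiter="|"))
print(rows[1])  # ['Alabama', 'AL']
```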
Data types
Boolean. BigQuery can parse any of the following pairs for
Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case
insensitive). Schema autodetection automatically detects any of these
except 0 and 1.
Bytes. Columns with BYTES types must be encoded as Base64.
Date. Columns with DATE types must be in the format YYYY-MM-DD.
Datetime. Columns with DATETIME types must be in the format
YYYY-MM-DD HH:MM:SS[.SSSSSS].
Geography. Columns with GEOGRAPHY types must contain strings in one of the
following formats:
Interval. Columns with INTERVAL types must be in the format
Y-M D H:M:S[.F], where:
Y = Year. Supported range is 0-10,000.
M = Month. Supported range is 1-12.
D = Day. Supported range is 1-[last day of the indicated month].
H = Hour.
M = Minute.
S = Second.
[.F] = Fractions of a second up to six digits, with microsecond precision.
You can indicate a negative value by prepending a dash (-).
The following list shows examples of valid data:
10-6 0 0:0:0
0-0 -5 0:0:0
0-0 0 0:0:1.25
To load INTERVAL data, you must use the bq load command and use the
--schema flag to specify a schema. You can't upload INTERVAL data by using
the console.
JSON. Quotes are escaped by using the two-character sequence "". For more
information, see an example of loading JSON data from a CSV file.
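A rough validator for this interval layout can be written as a regular expression (a hypothetical check; BigQuery's own parser is authoritative):

```python
import re

# Hypothetical validator for the Y-M D H:M:S[.F] layout described above;
# BigQuery's own parser is authoritative. A leading dash on a component
# marks a negative value.
INTERVAL_RE = re.compile(r"^-?\d+-\d+ -?\d+ -?\d+:\d+:\d+(\.\d{1,6})?$")

for value in ["10-6 0 0:0:0", "0-0 -5 0:0:0", "0-0 0 0:0:1.25"]:
    print(value, bool(INTERVAL_RE.match(value)))  # all True
```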
Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].
Timestamp. BigQuery accepts various timestamp formats.
The timestamp must include a date portion and a time portion.
The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.
The time portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and
fractions of seconds are optional).
The date and time must be separated by a space or 'T'.
Optionally, the date and time can be followed by a UTC offset or the UTC zone
designator (Z). For more information, see Time zones.
For example, any of the following are valid timestamp values:
2018-08-19T12:11
2018-08-19T12:11:35
2018-08-19T12:11:35.22
2018/08/19T12:11
2018-07-05T12:54:00 UTC
2018-08-19T07:11:35.220 -05:00
2018-08-19T12:11:35.220Z
If you provide a schema, BigQuery also accepts Unix epoch time for
timestamp values. However, schema autodetection doesn't detect this case, and
treats the value as a numeric or string type instead.
Examples of Unix epoch timestamp values:
1534680695
1.534680695e12
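The first epoch example converts to one of the timestamp examples shown earlier, which you can confirm with the standard library:

```python
from datetime import datetime, timezone

# The epoch value 1534680695 (seconds) is the same instant as the
# 2018-08-19T12:11:35 example above.
ts = datetime.fromtimestamp(1534680695, tz=timezone.utc)
print(ts.isoformat())  # 2018-08-19T12:11:35+00:00
```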
RANGE. Represented in CSV files in the format [LOWER_BOUND, UPPER_BOUND),
where LOWER_BOUND and UPPER_BOUND are valid DATE, DATETIME, or TIMESTAMP
strings. NULL and UNBOUNDED represent unbounded start or end values.
The following are examples of CSV values for RANGE<DATE>:
BigQuery infers headers by comparing the first row of the file
with other rows in the file. If the first line contains only strings, and the
other lines contain other data types, BigQuery assumes that the
first row is a header row. BigQuery assigns column names based on the
field names in the header row. The names might be modified to meet the
naming rules for columns in BigQuery. For example, spaces will be replaced
with underscores.
Otherwise, BigQuery assumes that the first row is a data row and assigns
generic column names such as string_field_1. Note that after a table is
created, the column names can't be updated in the schema automatically,
although you can change the names manually after the table is created.
Another option is to provide an explicit schema instead of using
autodetection.
You might have a CSV file with a header row, where all of the data fields
are strings. In that case, BigQuery does not automatically detect that the
first row is a header. Use the --skip_leading_rows option to skip the
header row. Otherwise, the header is imported as data. Also consider
providing an explicit schema in this case, so that you can assign column
names.
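The heuristic above can be approximated in a few lines of Python (a deliberate simplification, not BigQuery's actual implementation):

```python
# Illustrative simplification of the header heuristic described above:
# the first row is treated as a header only if it is all strings while
# some later row contains a value of another type.
def looks_numeric(value):
    try:
        float(value)
        return True
    except ValueError:
        return False

def first_row_is_header(rows):
    header, rest = rows[0], rows[1:]
    if any(looks_numeric(v) for v in header):
        return False
    return any(looks_numeric(v) for row in rest for v in row)

with_numbers = [["state", "population"], ["Alabama", "5024279"]]
all_strings = [["state", "abbr"], ["Alabama", "AL"]]
print(first_row_is_header(with_numbers))  # True
print(first_row_is_header(all_strings))   # False
```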
CSV quoted new lines
BigQuery detects quoted new line characters within a CSV field
and does not interpret the quoted new line character as a row boundary.
Troubleshoot parsing errors
If there's a problem parsing your CSV files, then the
load job'serrorsresource is
populated with the error details.
Generally, these errors identify the start of the problematic line with a byte
offset. For uncompressed files, you can use gcloud storage with the
--recursive argument to access the relevant line.
For example, you run the bq load command and receive an error:
The error in the output is similar to the following:
Waiting on bqjob_r5268069f5f49c9bf_0000018632e903d7_1 ... (0s)
Current status: DONE
BigQuery error in load operation: Error processing job
'myproject:bqjob_r5268069f5f49c9bf_0000018632e903d7_1': Error while reading
data, error message: Error detected while parsing row starting at position: 1405.
Error: Data between close quote character (") and field separator.
File: gs://my-bucket/mytable.csv
Failure details:
- gs://my-bucket/mytable.csv: Error while reading data,
  error message: Error detected while parsing row starting at
  position: 1405. Error: Data between close quote character (") and
  field separator. File: gs://my-bucket/mytable.csv
- Error while reading data, error message: CSV processing encountered
  too many errors, giving up. Rows: 22; errors: 1; max bad: 0; error
  percent: 0
Based on the preceding error, there's a format error in the file.
To view the file's content, run the gcloud storage cat command:
Based on the output of the file, the problem is a misplaced quote in
"April 15, "1865.
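Once you have the byte offset from the error, you can seek straight to the offending row. The sketch below simulates this on an in-memory buffer with made-up data modeled on the example; for a real file you would first download it or stream it from Cloud Storage:

```python
import io

# Simulated example: "data" and the quote error are modeled on the
# troubleshooting example above; a real load error reports the offset for
# a file in Cloud Storage.
data = b'name,date\nLincoln,"April 15, "1865\n'
offset = data.find(b"Lincoln")  # stand-in for the reported byte offset

buf = io.BytesIO(data)
buf.seek(offset)
print(buf.readline().decode("utf-8"))  # the problematic row
```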
Compressed CSV files
Debugging parsing errors is more challenging for compressed CSV files, since
the reported byte offset refers to the location in theuncompressedfile.
The following gcloud storage cat command streams the file from
Cloud Storage, decompresses it, identifies the appropriate byte offset,
and prints the line with the format error:
(Optional) The separator for fields in a CSV file. The separator can be
any ISO-8859-1 single-byte character. BigQuery
converts the string to ISO-8859-1 encoding, and uses the first byte of
the encoded string to split the data in its raw, binary state.
BigQuery also supports the escape sequence "\t" to specify a
tab separator. The default value is a comma (`,`).
(Optional) This
controls the strategy used to match loaded columns to the schema.
Supported values include:
POSITION: matches by position. This option assumes that
the columns are ordered the same way as the schema.
NAME: matches by name. This option reads the header row
as column names and reorders columns to match the field names in the
schema. Column names are read from the last skipped row based on the
skipLeadingRows property.
If this value is unspecified, then the default is based on how the schema
is provided. If autodetect is enabled, then the default behavior is to
match columns by name. Otherwise, the default is to match columns by
position. This is done to keep the behavior backward-compatible.
(Optional) The maximum number of bad records that BigQuery
can ignore when running the job. If the number of bad records exceeds this
value, an invalid error is returned in the job result. The default value
is 0, which requires that all records are valid.
(Optional) Specifies a string that represents a null value in a CSV file.
For example, if you specify "\N", BigQuery interprets "\N"
as a null value when loading a CSV file. The default value is the empty
string. If you set this property to a custom value, BigQuery
throws an error if an empty string is present for all data types except
for STRING and BYTE. For STRING and BYTE columns, BigQuery
interprets the empty string as an empty value.
(Optional) Accept rows that are missing trailing optional columns. The
missing values are treated as nulls. If false, records with missing
trailing columns are treated as bad records, and if there are too many bad
records, an invalid error is returned in the job result. The default value
is false. Only applicable to CSV, ignored for other formats.
(Optional) Indicates if BigQuery should allow extra values
that are not represented in the table schema. If true, the extra values
are ignored. If false, records with extra columns are treated as bad
records, and if there are too many bad records, an invalid error is
returned in the job result. The default value is false. The sourceFormat
property determines what BigQuery treats as an extra value:
CSV: Trailing columns
JSON: Named values that don't match any column names
Quote
Quote character: Double quote, Single quote, None, Custom
(Optional) The value that is used to quote data sections in a CSV file.
BigQuery converts the string to ISO-8859-1 encoding, and
then uses the first byte of the encoded string to split the data in its
raw, binary state. The default value is a double-quote ('"'). If your data
does not contain quoted sections, set the property value to an empty
string. If your data contains quoted newline characters, you must also set
the allowQuotedNewlines property to true. To
include the specific quote character within a quoted value, precede it
with an additional matching quote character. For example, if you want to
escape the default character ' " ', use ' "" '.
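Python's csv module follows the same doubled-quote convention, which makes it easy to see how a quote inside a quoted field round-trips:

```python
import csv
import io

# Round trip of a field containing the quote character: the writer doubles
# it (the "" escape), and the reader collapses it back.
row = ['He said "hello"', "plain"]
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(row)
print(buf.getvalue().strip())  # "He said ""hello""","plain"

parsed = next(csv.reader(io.StringIO(buf.getvalue())))
print(parsed[0])  # He said "hello"
```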
(Optional) The character encoding of the data. The supported values are
UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE.
The default value is UTF-8. BigQuery
decodes the data after the raw, binary data has been split using the
values of the quote and fieldDelimiter properties.
ASCII control character | None | --preserve_ascii_control_characters | None
(Optional) If you want to allow ASCII 0 and other ASCII control
characters, then set --preserve_ascii_control_characters to true in your
load jobs.
(Optional)
Default time zone that applies when parsing timestamp values that have no
specific time zone. Check valid time zone names. If this value is not
present, timestamp values without a specific time zone are parsed using
the default time zone, UTC.
(Optional) Format elements that define how the DATE values are formatted
in the input files (for example, MM/DD/YYYY). If this value is present,
this format is the only compatible DATE format. Schema autodetection will
also decide the DATE column type based on this format instead of the
existing format. If this value is not present, the DATE field is parsed
with the default formats.
(Optional) Format elements that define how the DATETIME values are
formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If
this value is present, this format is the only compatible DATETIME format.
Schema autodetection will also decide the DATETIME column type based on
this format instead of the existing format. If this value is not present,
the DATETIME field is parsed with the default formats.
(Optional) Format elements that define how the TIME values are formatted
in the input files (for example, HH24:MI:SS.FF3). If this value is
present, this format is the only compatible TIME format. Schema
autodetection will also decide the TIME column type based on this format
instead of the existing format. If this value is not present, the TIME
field is parsed with the default formats.
(Optional) Format elements that define how the TIMESTAMP values are
formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If
this value is present, this format is the only compatible TIMESTAMP
format. Schema autodetection will also decide the TIMESTAMP column type
based on this format instead of the existing format. If this value is not
present, the TIMESTAMP field is parsed with the default formats.