Loading CSV data from Cloud Storage
When you load CSV data from Cloud Storage, you can load the data into a new
table or partition, or you can append to or overwrite an existing table or
partition. When your data is loaded into BigQuery, it is
converted into columnar format for Capacitor (BigQuery's storage format).
When you load data from Cloud Storage into a BigQuery table,
the dataset that contains the table must be in the same regional or
multi-regional location as the Cloud Storage bucket.
You are subject to the following limitations when you load data into
BigQuery from a Cloud Storage bucket:
BigQuery does not guarantee data consistency for external data
sources. Changes to the underlying data while a query is running can result in
unexpected behavior.
BigQuery doesn't support Cloud Storage object versioning. If you
include a generation number in the Cloud Storage URI, then the load job
fails.
When you load CSV files into BigQuery, note the following:
CSV files don't support nested or repeated data.
Remove byte order mark (BOM) characters. They might cause unexpected issues.
If you use gzip compression,
BigQuery cannot read the data in parallel. Loading compressed
CSV data into BigQuery is slower than loading uncompressed data.
See Loading compressed and uncompressed data.
You cannot include both compressed and uncompressed files in the same load
job.
The maximum size for a gzip file is 4 GB.
Loading CSV data using schema autodetection does not automatically detect headers if all of the columns are string types.
In this case, add a numerical column to the input or declare the schema
explicitly.
When you load CSV or JSON data, values in DATE columns must use the dash
(-) separator, and the date must be in the following format: YYYY-MM-DD (year-month-day).
When you load JSON or CSV data, values in TIMESTAMP columns
must use a dash (-) or slash (/) separator for the date portion of the
timestamp, and the date must be in one of the following formats: YYYY-MM-DD (year-month-day) or YYYY/MM/DD (year/month/day).
The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon
(:) separator.
Your files must meet the CSV file size limits described in the load jobs limits.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary
permissions to perform each task in this document, and create a dataset
to store your data.
Required permissions
To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.
Permissions to load data into BigQuery
To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:
bigquery.tables.create
bigquery.tables.updateData
bigquery.tables.update
bigquery.jobs.create
Several predefined IAM roles, including roles/bigquery.admin, include the permissions that you need to load data into a BigQuery table or partition.
To get the permissions that
you need to load data from a Cloud Storage bucket,
ask your administrator to grant you the Storage Admin (roles/storage.admin)
IAM role on the bucket.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains
the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are
required, expand the Required permissions section:
Required permissions
The following permissions are required to load data from a Cloud Storage bucket:
storage.buckets.get
storage.objects.get
storage.objects.list (required if you are using a URI wildcard)
You can use the gzip utility to compress CSV files. Note that gzip performs
full file compression, unlike the file content compression performed by
compression codecs for other file formats, such as Avro. Using gzip to
compress your CSV files might have a performance impact; for more information
about the trade-offs, see Loading compressed and uncompressed data.
Loading CSV data into a table
To load CSV data from Cloud Storage into a new BigQuery
table, select one of the following options:
Console
To follow step-by-step guidance for this task directly in the
Cloud Shell Editor, click Guide me:
In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:
In the Source section, select Google Cloud Storage in the Create table from list.
Then, do the following:
Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI.
You cannot include multiple URIs
in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same
location as the dataset that contains the table you want to create, append, or
overwrite.
For File format, select CSV.
In the Destination section, specify the following details:
For Dataset, select the dataset in which you want to create the
table.
In the Table field, enter the name of the table that you want to create.
Verify that the Table type field is set to Native table.
In the Schema section, enter the schema definition.
To enable the auto detection of a schema,
select Auto detect.
You can enter schema information manually by using one of
the following methods:
Option 1: Click Edit as text and paste the schema in the form of a
JSON array. When you use a JSON array, you generate the schema using the
same process as creating a JSON schema file.
You can view the schema of an existing table in JSON format by entering the following
command:
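A sketch of the bq command-line tool command that prints a table's schema as JSON (replace the dataset and table names with your own):

bq show --format=prettyjson dataset.table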
For Write preference, leave Write if empty selected. This
option creates a new table and loads your data into it.
For Number of errors allowed, accept the default value of 0 or
enter the maximum number of rows containing errors that can be ignored.
If the number of rows with errors exceeds this value, the job will
result in an invalid message and fail. This option applies only to CSV and
JSON files.
For Time zone, enter the default time zone to apply when parsing timestamp
values that have no specific time zone. See the list of valid time zone
names for more information. If this value is not present, timestamp values without a specific
time zone are parsed using the default time zone, UTC.
(Preview).
For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This
field expects SQL-style format elements (for example, MM/DD/YYYY). If this value is
present, this format is the only compatible DATE format. Schema autodetection also
decides the DATE column type based on this format instead of the existing format. If this
value is not present, the DATE field is parsed with the default formats.
(Preview).
For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files.
This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3).
If this value is present, this format is the only compatible DATETIME format. Schema autodetection also
decides the DATETIME column type based on this format instead of the existing format. If
this value is not present, the DATETIME field is parsed with the default formats.
(Preview).
For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This
field expects SQL-style format elements (for example, HH24:MI:SS.FF3). If this value is
present, this format is the only compatible TIME format. Schema autodetection also
decides the TIME column type based on this format instead of the existing format. If this
value is not present, the TIME field is parsed with the default formats.
(Preview).
For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files.
This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3).
If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also
decides the TIMESTAMP column type based on this format instead of the existing format. If
this value is not present, the TIMESTAMP field is parsed with the default formats.
(Preview).
If you want to ignore values in a row that are not present in the table's schema, then
select Unknown values.
For Field delimiter, choose the character that separates the cells
in your CSV file: Comma, Tab, Pipe, or Custom. If you
choose Custom, enter the delimiter in the Custom field delimiter box. The default value is Comma.
For Source column match, choose one of the following strategies
used to match the loaded columns to the schema (Preview).
Default: Default behavior is chosen based on how the schema is provided.
If autodetect is enabled, then the default behavior is to match columns by name.
Otherwise, the default is to match columns by position. This is done to keep the
behavior backward-compatible.
Position: Matches columns by position, assuming that the columns are
ordered the same way as the schema.
Name: Matches by name by reading the header row as the column names and
reordering columns to match the field names in the schema. Column names are read from
the last skipped row based on Header rows to skip.
For Header rows to skip, enter the number of header rows to skip
at the top of the CSV file. The default value is 0.
For Quoted newlines, check Allow quoted newlines to allow
quoted data sections that contain newline characters in a CSV file. The
default value is false.
For Jagged rows, check Allow jagged rows to accept rows in CSV
files that are missing trailing optional columns. The missing values are
treated as nulls. If unchecked, records with missing trailing columns
are treated as bad records, and if there are too many bad records, an
invalid error is returned in the job result. The default value is false.
For Null markers, enter a list of custom strings that represent NULL values
in CSV data. (Preview).
Use the bq load command, specify CSV using the --source_format flag, and include a Cloud Storage URI.
You can include a single URI, a comma-separated list of URIs, or a URI
containing a wildcard.
Supply the schema inline, in a schema definition file, or use schema auto-detect. If you don't specify a
schema, and --autodetect is false, and the destination
table exists, then the schema of the destination table is used.
(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:
--allow_jagged_rows: When specified, accepts rows in CSV files that are
missing trailing optional columns. The missing values are treated as nulls.
Otherwise, records with missing trailing columns are treated as bad
records, and if there are too many bad records, an invalid error is returned
in the job result. The default value is false.
--allow_quoted_newlines: When specified, allows quoted data sections
that contain newline characters in a CSV file. The default value is false.
--field_delimiter: The character that indicates the boundary between
columns in the data. Both \t and tab are allowed for tab delimiters.
The default value is ,.
--null_marker: An optional custom string that represents a NULL value in
CSV data.
--null_markers: (Preview) An optional
comma-separated list of custom strings that represent NULL values in CSV
data. This option cannot be used with the --null_marker flag.
--source_column_match: (Preview)
Specifies the strategy used to match loaded columns to the schema. You can
specify POSITION to match loaded columns by position, assuming that the
columns are ordered the same way as the schema. You can also specify NAME to match by name by reading the header row as the column names and
reordering columns to match the field names in the schema. If this value is
unspecified, then the default is based on how the schema is provided. If --autodetect is enabled, then the default behavior is to match columns by
name. Otherwise, the default is to match columns by position.
--skip_leading_rows: Specifies the number of header rows to skip
at the top of the CSV file. The default value is 0.
--quote: The quote character to use to enclose records. The default
value is ". To indicate no quote character, use an empty string.
--max_bad_records: An integer that specifies the maximum number of bad
records allowed before the entire job fails. The default value is 0. At
most, five errors of any type are returned regardless of the --max_bad_records value.
--ignore_unknown_values: When specified, allows and ignores extra,
unrecognized values in CSV or JSON data.
--time_zone: (Preview) An optional
default time zone that will apply when parsing timestamp values that have no
specific time zone in CSV or JSON data.
--date_format: (Preview) An optional
custom string that defines how the DATE values are formatted in CSV or JSON
data.
--datetime_format: (Preview) An
optional custom string that defines how the DATETIME values are formatted in
CSV or JSON data.
--time_format: (Preview) An optional
custom string that defines how the TIME values are formatted in CSV or JSON
data.
--timestamp_format: (Preview) An
optional custom string that defines how the TIMESTAMP values are formatted
in CSV or JSON data.
--autodetect: When specified, enable schema auto-detection for CSV and
JSON data.
--time_partitioning_type: Enables time-based partitioning on a table and
sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a
table partitioned on a DATE, DATETIME, or TIMESTAMP column. The
default partition type for time-based partitioning is DAY. You cannot
change the partitioning specification on an existing table.
--time_partitioning_expiration: An integer that specifies (in seconds)
when a time-based partition should be deleted. The expiration time evaluates
to the partition's UTC date plus the integer value.
--time_partitioning_field: The DATE or TIMESTAMP column used to
create a partitioned table. If time-based partitioning is enabled without
this value, an ingestion-time partitioned table is created.
--require_partition_filter: When enabled, this option requires users
to include a WHERE clause that specifies the partitions to query.
Requiring a partition filter may reduce cost and improve performance.
For more information, see Querying partitioned tables.
--clustering_fields: A comma-separated list of up to four column names
used to create a clustered table.
--destination_kms_key: The Cloud KMS key for encryption of the
table data.
--column_name_character_map: Defines the scope and handling of
characters in column names, with the option of enabling flexible column names.
Requires the --autodetect option for CSV files.
For more information, see load_option_list.
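The general form of the command is sketched below; the placeholders are described in the list that follows:

bq --location=location load \
    --source_format=format \
    dataset.table \
    path_to_source \
    schema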
location is your location. The --location flag is optional.
For example, if you are using BigQuery in the Tokyo region,
you can set the flag's value to asia-northeast1. You can set a default
value for the location by using the .bigqueryrc file.
format is CSV.
dataset is an existing dataset.
table is the name of the table into which you're loading data.
path_to_source is a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
schema is a valid schema. The schema can be a local JSON file,
or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
Examples:
The following command loads data from gs://mybucket/mydata.csv into a
table named mytable in mydataset. The schema is defined in a local
schema file named myschema.json.
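A sketch of the corresponding command (adjust paths and names to your environment):

bq load \
    --source_format=CSV \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    ./myschema.json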
The following command loads data from gs://mybucket/mydata.csv into a
table named mytable in mydataset. The schema is defined in a local
schema file named myschema.json. The CSV file includes two header rows.
If --skip_leading_rows is unspecified, the default behavior is to assume
the file does not contain headers.
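A sketch of the corresponding command, using --skip_leading_rows to skip the two header rows:

bq load \
    --source_format=CSV \
    --skip_leading_rows=2 \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    ./myschema.json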
The following command loads data from gs://mybucket/mydata.csv into an
ingestion-time partitioned table named mytable in mydataset. The schema
is defined in a local schema file named myschema.json.
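A sketch of the corresponding command, using --time_partitioning_type to create an ingestion-time partitioned table:

bq load \
    --source_format=CSV \
    --time_partitioning_type=DAY \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    ./myschema.json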
The following command loads data from gs://mybucket/mydata.csv into a new
partitioned table named mytable in mydataset. The table is partitioned
on the mytimestamp column. The schema is defined in a local schema file
named myschema.json.
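A sketch of the corresponding command, using --time_partitioning_field to partition on the mytimestamp column:

bq load \
    --source_format=CSV \
    --time_partitioning_field mytimestamp \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    ./myschema.json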
The following command loads data from gs://mybucket/mydata.csv into a
table named mytable in mydataset. The schema is defined inline in the
format field:data_type,field:data_type.
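A sketch of the corresponding command; the field names and types shown are hypothetical, so substitute your own columns:

bq load \
    --source_format=CSV \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    name:STRING,count:INTEGER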
The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a
wildcard. The schema is autodetected.
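A sketch of the corresponding command; the wildcard pattern shown is illustrative:

bq load \
    --autodetect \
    --source_format=CSV \
    mydataset.mytable \
    "gs://mybucket/mydata*.csv"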
The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a
comma-separated list of Cloud Storage URIs with wildcards. The schema is
defined in a local schema file named myschema.json.
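A sketch of the corresponding command; the two wildcard URIs are hypothetical examples:

bq load \
    --source_format=CSV \
    mydataset.mytable \
    "gs://mybucket/00/*.csv","gs://mybucket/01/*.csv" \
    ./myschema.json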
Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in
the location property in the jobReference section of the job resource.
The source URIs property must be fully qualified, in the format gs://bucket/object.
Each URI can contain one '*' wildcard character.
Specify the CSV data format by setting the sourceFormat property to CSV.
To check the job status, call jobs.get(job_id),
where job_id is the ID of the job returned by the initial
request.
If status.state = DONE, the job completed successfully.
If the status.errorResult property is present, the request failed,
and that object will include information describing what went wrong.
When a request fails, no table is created and no data is loaded.
If status.errorResult is absent, the job finished successfully,
although there might have been some nonfatal errors, such as problems
importing a few rows. Nonfatal errors are listed in the returned job
object's status.errors property.
API notes:
Load jobs are atomic and consistent; if a load job fails, none of the data
is available, and if a load job succeeds, all of the data is available.
As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a load job. This
approach is more robust to network failure because the client can poll or
retry on the known job ID.
Calling jobs.insert on a given job ID is idempotent. You can retry as
many times as you like on the same job ID, and at most one of those
operations will succeed.
using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryLoadTableGcsCsv
{
    public void LoadTableGcsCsv(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
        var dataset = client.GetDataset(datasetId);
        var schema = new TableSchemaBuilder
        {
            { "name", BigQueryDbType.String },
            { "post_abbr", BigQueryDbType.String }
        }.Build();
        var destinationTableRef = dataset.GetTableReference(tableId: "us_states");
        // Create job configuration
        var jobOptions = new CreateLoadJobOptions()
        {
            // The source format defaults to CSV; line below is optional.
            SourceFormat = FileFormat.Csv,
            SkipLeadingRows = 1
        };
        // Create and run job
        var loadJob = client.CreateLoadJob(
            sourceUri: gcsURI, destination: destinationTableRef,
            schema: schema, options: jobOptions);
        loadJob = loadJob.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.
        // Display the number of rows uploaded
        BigQueryTable table = client.GetTable(destinationTableRef);
        Console.WriteLine(
            $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
    }
}
import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importCSVExplicitSchema demonstrates loading CSV data from Cloud Storage into a BigQuery
// table and providing an explicit schema for the data.
func importCSVExplicitSchema(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.csv")
	gcsRef.SkipLeadingRows = 1
	gcsRef.Schema = bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
	}
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}
importcom.google.cloud.bigquery.BigQuery;importcom.google.cloud.bigquery.BigQueryException;importcom.google.cloud.bigquery.BigQueryOptions;importcom.google.cloud.bigquery.CsvOptions;importcom.google.cloud.bigquery.Field;importcom.google.cloud.bigquery.Job;importcom.google.cloud.bigquery.JobInfo;importcom.google.cloud.bigquery.LoadJobConfiguration;importcom.google.cloud.bigquery.Schema;importcom.google.cloud.bigquery.StandardSQLTypeName;importcom.google.cloud.bigquery.TableId;// Sample to load CSV data from Cloud Storage into a new BigQuery tablepublicclassLoadCsvFromGcs{publicstaticvoidrunLoadCsvFromGcs()throwsException{// TODO(developer): Replace these variables before running the sample.StringdatasetName="MY_DATASET_NAME";StringtableName="MY_TABLE_NAME";StringsourceUri="gs://cloud-samples-data/bigquery/us-states/us-states.csv";Schemaschema=Schema.of(Field.of("name",StandardSQLTypeName.STRING),Field.of("post_abbr",StandardSQLTypeName.STRING));loadCsvFromGcs(datasetName,tableName,sourceUri,schema);}publicstaticvoidloadCsvFromGcs(StringdatasetName,StringtableName,StringsourceUri,Schemaschema){try{// Initialize client that will be used to send requests. This client only needs to be created// once, and can be reused for multiple requests.BigQuerybigquery=BigQueryOptions.getDefaultInstance().getService();// Skip header row in the file.CsvOptionscsvOptions=CsvOptions.newBuilder().setSkipLeadingRows(1).build();TableIdtableId=TableId.of(datasetName,tableName);LoadJobConfigurationloadConfig=LoadJobConfiguration.newBuilder(tableId,sourceUri,csvOptions).setSchema(schema).build();// Load data from a GCS CSV file into the tableJobjob=bigquery.create(JobInfo.of(loadConfig));// Blocks until this load table job completes its execution, either failing or succeeding.job=job.waitFor();if(job.isDone()){System.out.println("CSV from GCS successfully added during load append job");}else{System.out.println("BigQuery was unable to load into the table due to an error:"+job.getStatus().getError());}}catch(BigQueryException|InterruptedExceptione){System.out.println("Column not added during load append \n"+e.toString());}}}
// Import the Google Cloud client librariesconst{BigQuery}=require('@google-cloud/bigquery');const{Storage}=require('@google-cloud/storage');// Instantiate clientsconstbigquery=newBigQuery();conststorage=newStorage();/*** This sample loads the CSV file at* https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv** TODO(developer): Replace the following lines with the path to your file.*/constbucketName='cloud-samples-data';constfilename='bigquery/us-states/us-states.csv';asyncfunctionloadCSVFromGCS(){// Imports a GCS file into a table with manually defined schema./*** TODO(developer): Uncomment the following lines before running the sample.*/// const datasetId = 'my_dataset';// const tableId = 'my_table';// Configure the load job. For full list of options, see:// https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoadconstmetadata={sourceFormat:'CSV',skipLeadingRows:1,schema:{fields:[{name:'name',type:'STRING'},{name:'post_abbr',type:'STRING'},],},location:'US',};// Load data from a Google Cloud Storage file into the tableconst[job]=awaitbigquery.dataset(datasetId).table(tableId).load(storage.bucket(bucketName).file(filename),metadata);// load() waits for the job to finishconsole.log(`Job${job.id}completed.`);// Check the job's status for errorsconsterrors=job.status.errors;if(errors&&errors.length>0){throwerrors;}}
use Google\Cloud\BigQuery\BigQueryClient;use Google\Cloud\Core\ExponentialBackoff;/** Uncomment and populate these variables in your code */// $projectId = 'The Google project ID';// $datasetId = 'The BigQuery dataset ID';// instantiate the bigquery table service$bigQuery = new BigQueryClient(['projectId' => $projectId,]);$dataset = $bigQuery->dataset($datasetId);$table = $dataset->table('us_states');// create the import job$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv';$schema = ['fields' => [['name' => 'name', 'type' => 'string'],['name' => 'post_abbr', 'type' => 'string']]];$loadConfig = $table->loadFromStorage($gcsUri)->schema($schema)->skipLeadingRows(1);$job = $table->runJob($loadConfig);// poll the job until it is complete$backoff = new ExponentialBackoff(10);$backoff->execute(function () use ($job) {print('Waiting for job to complete' . PHP_EOL);$job->reload();if (!$job->isComplete()) {throw new Exception('Job has not yet completed', 500);}});// check if the job has errorsif (isset($job->info()['status']['errorResult'])) {$error = $job->info()['status']['errorResult']['message'];printf('Error running job: %s' . PHP_EOL, $error);} else {print('Data imported successfully' . PHP_EOL);}
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))
require "google/cloud/bigquery"

def load_table_gcs_csv dataset_id = "your_dataset_id"
  bigquery = Google::Cloud::Bigquery.new
  dataset  = bigquery.dataset dataset_id
  gcs_uri  = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
  table_id = "us_states"

  load_job = dataset.load_job table_id, gcs_uri, skip_leading: 1 do |schema|
    schema.string "name"
    schema.string "post_abbr"
  end
  puts "Starting job #{load_job.job_id}"

  load_job.wait_until_done! # Waits for table load to complete.
  puts "Job finished."

  table = dataset.table table_id
  puts "Loaded #{table.rows_count} rows to table #{table.id}"
end
Loading CSV data into a table that uses column-based time partitioning
To load CSV data from Cloud Storage into a BigQuery table
that uses column-based time partitioning:
import (
	"context"
	"fmt"
	"time"

	"cloud.google.com/go/bigquery"
)

// importPartitionedTable demonstrates specifying time partitioning for a BigQuery table when loading
// CSV data from Cloud Storage.
func importPartitionedTable(projectID, destDatasetID, destTableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv")
	gcsRef.SkipLeadingRows = 1
	gcsRef.Schema = bigquery.Schema{
		{Name: "name", Type: bigquery.StringFieldType},
		{Name: "post_abbr", Type: bigquery.StringFieldType},
		{Name: "date", Type: bigquery.DateFieldType},
	}
	loader := client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)
	loader.TimePartitioning = &bigquery.TimePartitioning{
		Field:      "date",
		Expiration: 90 * 24 * time.Hour,
	}
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}
importcom.google.cloud.bigquery.BigQuery;importcom.google.cloud.bigquery.BigQueryException;importcom.google.cloud.bigquery.BigQueryOptions;importcom.google.cloud.bigquery.Field;importcom.google.cloud.bigquery.FormatOptions;importcom.google.cloud.bigquery.Job;importcom.google.cloud.bigquery.JobId;importcom.google.cloud.bigquery.JobInfo;importcom.google.cloud.bigquery.LoadJobConfiguration;importcom.google.cloud.bigquery.Schema;importcom.google.cloud.bigquery.StandardSQLTypeName;importcom.google.cloud.bigquery.TableId;importcom.google.cloud.bigquery.TimePartitioning;importjava.time.Duration;importjava.time.temporal.ChronoUnit;importjava.util.UUID;publicclassLoadPartitionedTable{publicstaticvoidrunLoadPartitionedTable()throwsException{// TODO(developer): Replace these variables before running the sample.StringdatasetName="MY_DATASET_NAME";StringtableName="MY_TABLE_NAME";StringsourceUri="/path/to/file.csv";loadPartitionedTable(datasetName,tableName,sourceUri);}publicstaticvoidloadPartitionedTable(StringdatasetName,StringtableName,StringsourceUri)throwsException{try{// Initialize client that will be used to send requests. This client only needs to be created// once, and can be reused for multiple requests.BigQuerybigquery=BigQueryOptions.getDefaultInstance().getService();TableIdtableId=TableId.of(datasetName,tableName);Schemaschema=Schema.of(Field.of("name",StandardSQLTypeName.STRING),Field.of("post_abbr",StandardSQLTypeName.STRING),Field.of("date",StandardSQLTypeName.DATE));// Configure time partitioning. For full list of options, see:// https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TimePartitioningTimePartitioningpartitioning=TimePartitioning.newBuilder(TimePartitioning.Type.DAY).setField("date").setExpirationMs(Duration.of(90,ChronoUnit.DAYS).toMillis()).build();LoadJobConfigurationloadJobConfig=LoadJobConfiguration.builder(tableId,sourceUri).setFormatOptions(FormatOptions.csv()).setSchema(schema).setTimePartitioning(partitioning).build();// Create a job ID so that we can safely retry.JobIdjobId=JobId.of(UUID.randomUUID().toString());JobloadJob=bigquery.create(JobInfo.newBuilder(loadJobConfig).setJobId(jobId).build());// Load data from a GCS parquet file into the table// Blocks until this load table job completes its execution, either failing or succeeding.JobcompletedJob=loadJob.waitFor();// Check for errorsif(completedJob==null){thrownewException("Job not executed since it no longer exists.");}elseif(completedJob.getStatus().getError()!=null){// You can also look at queryJob.getStatus().getExecutionErrors() for all// errors, not just the latest one.thrownewException("BigQuery was unable to load into the table due to an error: \n"+loadJob.getStatus().getError());}System.out.println("Data successfully loaded into time partitioned table during load job");}catch(BigQueryException|InterruptedExceptione){System.out.println("Data not loaded into time partitioned table during load job \n"+e.toString());}}}
// Import the Google Cloud client librariesconst{BigQuery}=require('@google-cloud/bigquery');const{Storage}=require('@google-cloud/storage');// Instantiate clientsconstbigquery=newBigQuery();conststorage=newStorage();/*** This sample loads the CSV file at* https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv** TODO(developer): Replace the following lines with the path to your file.*/constbucketName='cloud-samples-data';constfilename='bigquery/us-states/us-states-by-date.csv';asyncfunctionloadTablePartitioned(){// Load data into a table that uses column-based time partitioning./*** TODO(developer): Uncomment the following lines before running the sample.*/// const datasetId = 'my_dataset';// const tableId = 'my_new_table';// Configure the load job. For full list of options, see:// https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoadconstpartitionConfig={type:'DAY',expirationMs:'7776000000',// 90 daysfield:'date',};constmetadata={sourceFormat:'CSV',skipLeadingRows:1,schema:{fields:[{name:'name',type:'STRING'},{name:'post_abbr',type:'STRING'},{name:'date',type:'DATE'},],},location:'US',timePartitioning:partitionConfig,};// Load data from a Google Cloud Storage file into the tableconst[job]=awaitbigquery.dataset(datasetId).table(tableId).load(storage.bucket(bucketName).file(filename),metadata);// load() waits for the job to finishconsole.log(`Job${job.id}completed.`);}
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
        bigquery.SchemaField("date", "DATE"),
    ],
    skip_leading_rows=1,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Wait for the job to complete.

table = client.get_table(table_id)
print("Loaded {} rows to table {}".format(table.num_rows, table_id))
Appending to or overwriting a table with CSV data
You can load additional data into a table either from source files or by
appending query results.
In the Google Cloud console, use the Write preference option to specify
what action to take when you load data from a source file or from a query
result.
You have the following options when you load additional data into a table:
Write if empty: bq tool flag: not supported; BigQuery API property: WRITE_EMPTY.
Writes the data only if the table is empty.
Append to table: bq tool flag: --noreplace or --replace=false (if --[no]replace is unspecified, the default is append); BigQuery API property: WRITE_APPEND (default).
Appends the data to the end of the table.
Overwrite table: bq tool flag: --replace or --replace=true; BigQuery API property: WRITE_TRUNCATE.
Erases all existing data in a table before writing the new data.
This action also deletes the table schema and row-level security, and removes any
Cloud KMS key.
If you load data into an existing table, the load job can append the data or
overwrite the table.
Console
In the Google Cloud console, go to the BigQuery page.
In the Explorer pane, expand your project, and then select a dataset.
In the Dataset info section, click Create table.
In the Create table panel, specify the following details:
In the Source section, select Google Cloud Storage in the Create table from list.
Then, do the following:
Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI.
You cannot include multiple URIs
in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same
location as the dataset that contains the table you want to create, append, or
overwrite.
For File format, select CSV.
In the Destination section, specify the following details:
For Dataset, select the dataset in which you want to create the
table.
In the Table field, enter the name of the table that you want to create.
Verify that the Table type field is set to Native table.
In the Schema section, enter the schema definition.
To enable the auto detection of a schema,
select Auto detect.
You can enter schema information manually by using one of
the following methods:
Option 1: Click Edit as text and paste the schema in the form of a
JSON array. When you use a JSON array, you generate the schema using the
same process as creating a JSON schema file.
You can view the schema of an existing table in JSON format by entering the following
command:
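As in the earlier section, a sketch of the bq command-line tool command that prints a table's schema as JSON (replace the dataset and table names with your own):

bq show --format=prettyjson dataset.table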
Option 2: Click Add field and enter the table schema. Specify each field's Name, Type,
and Mode.
Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables.
You
cannot convert a table to a partitioned or clustered table by appending or
overwriting it. The Google Cloud console does not support
appending to or overwriting partitioned or clustered tables in a load job.
Click Advanced options and do the following:
For Write preference, choose Append to table or Overwrite
table.
For Number of errors allowed, accept the default value of 0 or
enter the maximum number of rows containing errors that can be ignored.
If the number of rows with errors exceeds this value, the job will
result in an invalid message and fail. This option applies only to CSV and
JSON files.
For Time zone, enter the default time zone to apply when parsing timestamp
values that have no specific time zone. See the list of valid time zone
names for more information. If this value is not present, timestamp values without a specific
time zone are parsed using the default time zone, UTC.
(Preview).
For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This
field expects SQL-style format elements (for example, MM/DD/YYYY). If this value is
present, this format is the only compatible DATE format. Schema autodetection also
decides the DATE column type based on this format instead of the existing format. If this
value is not present, the DATE field is parsed with the default formats.
(Preview).
For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files.
This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3).
If this value is present, this format is the only compatible DATETIME format. Schema autodetection also
decides the DATETIME column type based on this format instead of the existing format. If
this value is not present, the DATETIME field is parsed with the default formats.
(Preview).
For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This
field expects SQL-style format elements (for example, HH24:MI:SS.FF3). If this value is
present, this format is the only compatible TIME format. Schema autodetection also
decides the TIME column type based on this format instead of the existing format. If this
value is not present, the TIME field is parsed with the default formats.
(Preview).
For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files.
This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3).
If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also
decides the TIMESTAMP column type based on this format instead of the existing format. If
this value is not present, the TIMESTAMP field is parsed with the default formats.
(Preview).
If you want to ignore values in a row that are not present in the table's schema, then
select Unknown values.
For Field delimiter, choose the character that separates the cells
in your CSV file: Comma, Tab, Pipe, or Custom. If you
choose Custom, enter the delimiter in the Custom field delimiter box. The default value is Comma.
For Source column match, choose one of the following strategies
used to match the loaded columns to the schema (Preview).
Default: Default behavior is chosen based on how the schema is provided.
If autodetect is enabled, then the default behavior is to match columns by name.
Otherwise, the default is to match columns by position. This is done to keep the
behavior backward-compatible.
Position: Matches columns by position, assuming that the columns are
ordered the same way as the schema.
Name: Matches by name by reading the header row as the column names and
reordering columns to match the field names in the schema. Column names are read from
the last skipped row based on Header rows to skip.
For Header rows to skip, enter the number of header rows to skip
at the top of the CSV file. The default value is 0.
For Quoted newlines, check Allow quoted newlines to allow
quoted data sections that contain newline characters in a CSV file. The
default value is false.
For Jagged rows, check Allow jagged rows to accept rows in CSV
files that are missing trailing optional columns. The missing values are
treated as nulls. If unchecked, records with missing trailing columns
are treated as bad records, and if there are too many bad records, an
invalid error is returned in the job result. The default value is false.
For Null markers, enter a list of custom strings that represent NULL values
in CSV data. (Preview).
Use the bq load command, specify CSV using the --source_format flag, and include a Cloud Storage URI.
You can include a single URI, a comma-separated list of URIs, or a URI
containing a wildcard.
Supply the schema inline, in a schema definition file, or use schema auto-detect. If you don't specify a
schema, and --autodetect is false, and the destination
table exists, then the schema of the destination table is used.
Specify the --replace flag to overwrite the
table. Use the --noreplace flag to append data to the table. If no flag is
specified, the default is to append data.
It is possible to modify the table's schema when you append or
overwrite it. For more information on supported schema changes during a load
operation, see Modifying table schemas.
(Optional) Supply the --location flag and set the value to your location.
Other optional flags include:
--allow_jagged_rows: When specified, accepts rows in CSV files that are
missing trailing optional columns. The missing values are treated as nulls.
Otherwise, records with missing trailing columns are treated as bad
records, and if there are too many bad records, an invalid error is returned
in the job result. The default value is false.
--allow_quoted_newlines: When specified, allows quoted data sections
that contain newline characters in a CSV file. The default value is false.
--field_delimiter: The character that indicates the boundary between
columns in the data. Both \t and tab are allowed for tab delimiters.
The default value is ,.
--null_marker: An optional custom string that represents a NULL value in
CSV data.
--null_markers: (Preview) An optional
comma-separated list of custom strings that represent NULL values in CSV
data. This option cannot be used with the --null_marker flag.
--source_column_match: (Preview)
Specifies the strategy used to match loaded columns to the schema. You can
specify POSITION to match loaded columns by position, assuming that the
columns are ordered the same way as the schema. You can also specify NAME to match by name by reading the header row as the column names and
reordering columns to match the field names in the schema. If this value is
unspecified, then the default is based on how the schema is provided. If --autodetect is enabled, then the default behavior is to match columns by
name. Otherwise, the default is to match columns by position.
--skip_leading_rows: Specifies the number of header rows to skip
at the top of the CSV file. The default value is 0.
--quote: The quote character to use to enclose records. The default
value is ". To indicate no quote character, use an empty string.
--max_bad_records: An integer that specifies the maximum number of bad
records allowed before the entire job fails. The default value is 0. At
most, five errors of any type are returned regardless of the --max_bad_records value.
--ignore_unknown_values: When specified, allows and ignores extra,
unrecognized values in CSV or JSON data.
--time_zone: (Preview) An optional
default time zone that will apply when parsing timestamp values that have no
specific time zone in CSV or JSON data.
--date_format: (Preview) An optional
custom string that defines how the DATE values are formatted in CSV or JSON
data.
--datetime_format: (Preview) An
optional custom string that defines how the DATETIME values are formatted in
CSV or JSON data.
--time_format: (Preview) An optional
custom string that defines how the TIME values are formatted in CSV or JSON
data.
--timestamp_format: (Preview) An
optional custom string that defines how the TIMESTAMP values are formatted
in CSV or JSON data.
--autodetect: When specified, enable schema auto-detection for CSV and
JSON data.
--destination_kms_key: The Cloud KMS key for encryption of the
table data.
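The general form of the command is sketched below; include --replace or --noreplace depending on whether you want to overwrite or append, and see the list that follows for the placeholders:

bq --location=location load \
    --[no]replace \
    --source_format=format \
    dataset.table \
    path_to_source \
    schema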
location is your location.
The --location flag is optional. You can set a default value for the
location by using the .bigqueryrc file.
format is CSV.
dataset is an existing dataset.
table is the name of the table into which you're loading data.
path_to_source is a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
schema is a valid schema. The schema can be a local JSON file,
or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
Examples:
The following command loads data from gs://mybucket/mydata.csv and
overwrites a table named mytable in mydataset. The schema is defined
using schema auto-detection.
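A sketch of the corresponding command, using --replace to overwrite the table and --autodetect in place of a schema:

bq load \
    --autodetect \
    --replace \
    --source_format=CSV \
    mydataset.mytable \
    gs://mybucket/mydata.csv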
The following command loads data from gs://mybucket/mydata.csv and
appends data to a table named mytable in mydataset. The schema is
defined using a JSON schema file named myschema.json.
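A sketch of the corresponding command, using --noreplace to append to the table:

bq load \
    --noreplace \
    --source_format=CSV \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    ./myschema.json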
Create a load job that points to the source data in Cloud Storage.
(Optional) Specify your location in
the location property in the jobReference section of the job resource.
The source URIs property
must be fully qualified, in the format gs://bucket/object. You can
include multiple URIs as a comma-separated list. Note that wildcards are
also supported.
Specify the data format by setting the configuration.load.sourceFormat property to CSV.
Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.
import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importCSVTruncate demonstrates loading data from CSV data in Cloud Storage and overwriting/truncating
// data in the existing table.
func importCSVTruncate(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.csv")
	gcsRef.SourceFormat = bigquery.CSV
	gcsRef.AutoDetect = true
	gcsRef.SkipLeadingRows = 1
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteTruncate

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}
importcom.google.cloud.bigquery.BigQuery;importcom.google.cloud.bigquery.BigQueryException;importcom.google.cloud.bigquery.BigQueryOptions;importcom.google.cloud.bigquery.FormatOptions;importcom.google.cloud.bigquery.Job;importcom.google.cloud.bigquery.JobInfo;importcom.google.cloud.bigquery.JobInfo.WriteDisposition;importcom.google.cloud.bigquery.LoadJobConfiguration;importcom.google.cloud.bigquery.TableId;// Sample to overwrite the BigQuery table data by loading a CSV file from GCSpublicclassLoadCsvFromGcsTruncate{publicstaticvoidrunLoadCsvFromGcsTruncate()throwsException{// TODO(developer): Replace these variables before running the sample.StringdatasetName="MY_DATASET_NAME";StringtableName="MY_TABLE_NAME";StringsourceUri="gs://cloud-samples-data/bigquery/us-states/us-states.csv";loadCsvFromGcsTruncate(datasetName,tableName,sourceUri);}publicstaticvoidloadCsvFromGcsTruncate(StringdatasetName,StringtableName,StringsourceUri)throwsException{try{// Initialize client that will be used to send requests. This client only needs to be created// once, and can be reused for multiple requests.BigQuerybigquery=BigQueryOptions.getDefaultInstance().getService();TableIdtableId=TableId.of(datasetName,tableName);LoadJobConfigurationconfiguration=LoadJobConfiguration.builder(tableId,sourceUri).setFormatOptions(FormatOptions.csv())// Set the write disposition to overwrite existing table data.setWriteDisposition(WriteDisposition.WRITE_TRUNCATE).build();// For more information on Job see:// https://googleapis.dev/java/google-cloud-clients/latest/index.html?com/google/cloud/bigquery/package-summary.html// Load the tableJobloadJob=bigquery.create(JobInfo.of(configuration));// Load data from a GCS parquet file into the table// Blocks until this load table job completes its execution, either failing or succeeding.JobcompletedJob=loadJob.waitFor();// Check for errorsif(completedJob==null){thrownewException("Job not executed since it no longer exists.");}elseif(completedJob.getStatus().getError()!=null){// You can also look at queryJob.getStatus().getExecutionErrors() for all// errors, not just the latest one.thrownewException("BigQuery was unable to load into the table due to an error: \n"+loadJob.getStatus().getError());}System.out.println("Table is successfully overwritten by CSV file loaded from GCS");}catch(BigQueryException|InterruptedExceptione){System.out.println("Column not added during load append \n"+e.toString());}}}
To replace the rows in an existing table, set the writeDisposition value in the metadata parameter to 'WRITE_TRUNCATE'.
// Import the Google Cloud client librariesconst{BigQuery}=require('@google-cloud/bigquery');const{Storage}=require('@google-cloud/storage');// Instantiate clientsconstbigquery=newBigQuery();conststorage=newStorage();/*** This sample loads the CSV file at* https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv** TODO(developer): Replace the following lines with the path to your file.*/constbucketName='cloud-samples-data';constfilename='bigquery/us-states/us-states.csv';asyncfunctionloadCSVFromGCSTruncate(){/*** Imports a GCS file into a table and overwrites* table data if table already exists.*//*** TODO(developer): Uncomment the following lines before running the sample.*/// const datasetId = 'my_dataset';// const tableId = 'my_table';// Configure the load job. For full list of options, see:// https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoadconstmetadata={sourceFormat:'CSV',skipLeadingRows:1,schema:{fields:[{name:'name',type:'STRING'},{name:'post_abbr',type:'STRING'},],},// Set the write disposition to overwrite existing table data.writeDisposition:'WRITE_TRUNCATE',location:'US',};// Load data from a Google Cloud Storage file into the tableconst[job]=awaitbigquery.dataset(datasetId).table(tableId).load(storage.bucket(bucketName).file(filename),metadata);// load() waits for the job to finishconsole.log(`Job${job.id}completed.`);// Check the job's status for errorsconsterrors=job.status.errors;if(errors&&errors.length>0){throwerrors;}}
use Google\Cloud\BigQuery\BigQueryClient;use Google\Cloud\Core\ExponentialBackoff;/** Uncomment and populate these variables in your code */// $projectId = 'The Google project ID';// $datasetId = 'The BigQuery dataset ID';// $tableId = 'The BigQuery table ID';// instantiate the bigquery table service$bigQuery = new BigQueryClient(['projectId' => $projectId,]);$table = $bigQuery->dataset($datasetId)->table($tableId);// create the import job$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv';$loadConfig = $table->loadFromStorage($gcsUri)->skipLeadingRows(1)->writeDisposition('WRITE_TRUNCATE');$job = $table->runJob($loadConfig);// poll the job until it is complete$backoff = new ExponentialBackoff(10);$backoff->execute(function () use ($job) {print('Waiting for job to complete' . PHP_EOL);$job->reload();if (!$job->isComplete()) {throw new Exception('Job has not yet completed', 500);}});// check if the job has errorsif (isset($job->info()['status']['errorResult'])) {$error = $job->info()['status']['errorResult']['message'];printf('Error running job: %s' . PHP_EOL, $error);} else {print('Data imported successfully' . PHP_EOL);}
import six

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
)

body = six.BytesIO(b"Washington,WA")
client.load_table_from_file(body, table_id, job_config=job_config).result()
previous_rows = client.get_table(table_id).num_rows
assert previous_rows > 0

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
Loading hive-partitioned CSV data
BigQuery supports loading hive-partitioned CSV data stored on
Cloud Storage and will populate the hive partitioning columns as columns in
the destination BigQuery managed table. For more information, see Loading Externally Partitioned Data from Cloud Storage.
Details of loading CSV data
This section describes how BigQuery handles various CSV formatting
options.
Encoding
BigQuery expects CSV data to be UTF-8 encoded. If you have
CSV files with other supported encoding types, you should explicitly specify the
encoding so that BigQuery can properly convert the data to UTF-8.
BigQuery supports the following encoding types for CSV files:
UTF-8
ISO-8859-1
UTF-16BE (UTF-16 Big Endian)
UTF-16LE (UTF-16 Little Endian)
UTF-32BE (UTF-32 Big Endian)
UTF-32LE (UTF-32 Little Endian)
If you don't specify an encoding, or if you specify UTF-8 encoding when the CSV
file is not UTF-8 encoded, BigQuery attempts to convert the data
to UTF-8. Generally, if the CSV file is ISO-8859-1 encoded, your data will be
loaded successfully, but it may not exactly match what you expect. If the CSV
file is UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE encoded, the load might fail.
To avoid unexpected failures, specify the correct encoding by using the --encoding flag.
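For example, a sketch of a load command for an ISO-8859-1 encoded file; the dataset, table, and file names are illustrative:

bq load \
    --source_format=CSV \
    --encoding=ISO-8859-1 \
    mydataset.mytable \
    gs://mybucket/mydata.csv \
    ./myschema.json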
If BigQuery can't convert a character other than the ASCII 0 character, BigQuery converts the character to the standard
Unicode replacement character: �.
Field delimiters
Delimiters in CSV files can be any single-byte character. If the source file
uses ISO-8859-1 encoding, any character can be a delimiter. If the source file
uses UTF-8 encoding, any character in the decimal range 1-127 (U+0001-U+007F)
can be used without modification. You can insert an ISO-8859-1 character outside
of this range as a delimiter, and BigQuery will interpret it
correctly. However, if you use a multibyte character as a delimiter, some of the
bytes will be interpreted incorrectly as part of the field value.
Generally, it's a best practice to use a standard delimiter, such as a tab,
pipe, or comma. The default is a comma.
Data types
Boolean. BigQuery can parse any of the following pairs for
Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case
insensitive). Schema autodetection automatically detects any of these except 0 and 1.
Bytes. Columns with BYTES types must be encoded as Base64.
Date. Columns with DATE types must be in the format YYYY-MM-DD.
Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD
HH:MM:SS[.SSSSSS].
Geography. Columns with GEOGRAPHY types must contain strings in one of the
following formats:
Interval. Columns with INTERVAL types must be in the format Y-M D H:M:S[.F], where:
Y = Year. Supported range is 0-10,000.
M = Month. Supported range is 1-12.
D = Day. Supported range is 1-[last day of the indicated month].
H = Hour.
M = Minute.
S = Second.
[.F] = Fractions of a second up to six digits, with microsecond precision.
You can indicate a negative value by prepending a dash (-).
The following list shows examples of valid data:
10-6 0 0:0:0
0-0 -5 0:0:0
0-0 0 0:0:1.25
To load INTERVAL data, you must use the bq load command and use
the --schema flag to specify a schema. You can't upload INTERVAL data by using
the console.
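For example, a sketch of a load command with an inline schema that includes an INTERVAL column; the field names and file path here are hypothetical, and if the inline schema syntax rejects the INTERVAL type in your environment, use a JSON schema file instead:

bq load \
    --source_format=CSV \
    --schema=id:INTEGER,duration:INTERVAL \
    mydataset.mytable \
    gs://mybucket/intervals.csv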
JSON. Quotes are escaped by using the two-character sequence "". For more
information, see an example of loading JSON data from a CSV file.
Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].
Timestamp. BigQuery accepts various timestamp formats.
The timestamp must include a date portion and a time portion.
The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.
The timestamp portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and
fractions of seconds are optional).
The date and time must be separated by a space or 'T'.
Optionally, the date and time can be followed by a UTC offset or the UTC zone
designator (Z). For more information, see Time zones.
For example, any of the following are valid timestamp values:
2018-08-19 12:11
2018-08-19 12:11:35
2018-08-19 12:11:35.22
2018/08/19 12:11
2018-07-05 12:54:00 UTC
2018-08-19 07:11:35.220 -05:00
2018-08-19T12:11:35.220Z
If you provide a schema, BigQuery also accepts Unix epoch time for
timestamp values. However, schema autodetection doesn't detect this case, and
treats the value as a numeric or string type instead.
Examples of Unix epoch timestamp values:
1534680695
1.534680695e12
RANGE. Represented in CSV files in the format [LOWER_BOUND, UPPER_BOUND),
where LOWER_BOUND and UPPER_BOUND are valid DATE, DATETIME, or TIMESTAMP
strings. NULL and UNBOUNDED represent unbounded start or end values.
The following are examples of CSV values for RANGE<DATE>:
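"[2020-01-01, 2021-01-01)"
"[UNBOUNDED, 2021-01-01)"
"[2020-03-01, UNBOUNDED)"
"[UNBOUNDED, UNBOUNDED)"
These values are illustrative; because they contain a comma, they must be quoted in the CSV file.
As noted in the INTERVAL entry earlier in this list, INTERVAL columns can be loaded only with the bq load command and an explicit schema. A minimal sketch, assuming a placeholder dataset, table, bucket, and a schema file that declares a column of type INTERVAL:
# Hypothetical names; myschema.json declares an INTERVAL column.
bq load \
  --source_format=CSV \
  --schema=./myschema.json \
  mydataset.mytable \
  gs://my-bucket/intervals.csv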
BigQuery infers headers by comparing the first row of the file
with other rows in the file. If the first line contains only strings, and the
other lines contain other data types, BigQuery assumes that the
first row is a header row. BigQuery assigns column names based on the field
names in the header row. The names might be modified to meet the naming rules
for columns in BigQuery. For example, spaces are replaced with underscores.
Otherwise, BigQuery assumes the first row is a data row and assigns generic
column names such as string_field_1. Note that the column names can't be
updated in the schema after the table is created, although you can change the
names manually after the table is created. Another option is to provide an
explicit schema instead of using autodetection.
You might have a CSV file with a header row, where all of the data fields are
strings. In that case, BigQuery will not automatically detect that
the first row is a header. Use the --skip_leading_rows option to skip the
header row. Otherwise, the header will be imported as data. Also consider
providing an explicit schema in this case, so that you can assign column names.
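For example, a minimal sketch that skips one header row and supplies an explicit inline schema; the dataset, table, bucket, and column names are placeholders:
# Hypothetical names; --skip_leading_rows=1 skips the header row, and the
# inline schema assigns the column names.
bq load \
  --source_format=CSV \
  --skip_leading_rows=1 \
  mydataset.mytable \
  gs://my-bucket/mydata.csv \
  name:STRING,city:STRING,notes:STRING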
CSV quoted new lines
BigQuery detects quoted new line characters within a CSV field
and does not interpret the quoted new line character as a row boundary.
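If your files contain quoted newlines, also set the --allow_quoted_newlines flag on the load job; it corresponds to the allowQuotedNewlines property described later in this document. A minimal sketch with placeholder names:
# Hypothetical names; --allow_quoted_newlines lets quoted fields span lines.
bq load \
  --source_format=CSV \
  --allow_quoted_newlines \
  mydataset.mytable \
  gs://my-bucket/mydata.csv \
  ./myschema.json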
Troubleshoot parsing errors
If there's a problem parsing your CSV files, then the
load job's errors resource is populated with the error details.
Generally, these errors identify the start of the problematic line with a byte
offset. For uncompressed files, you can use gcloud storage with the --recursive
argument to access the relevant line.
For example, you run the bq load command and receive an error.
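The original command isn't reproduced in this page; a minimal sketch that is consistent with the error output, using a placeholder dataset name, might look like this:
# Hypothetical dataset name; the bucket and file names come from the error output.
bq load \
  --source_format=CSV \
  --autodetect \
  mydataset.mytable \
  gs://my-bucket/mytable.csv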
The error in the output is similar to the following:
Waiting on bqjob_r5268069f5f49c9bf_0000018632e903d7_1 ... (0s)
Current status: DONE
BigQuery error in load operation: Error processing job
'myproject:bqjob_r5268069f5f49c9bf_0000018632e903d7_1': Error while reading
data, error message: Error detected while parsing row starting at position: 1405.
Error: Data between close quote character (") and field separator.
File: gs://my-bucket/mytable.csv
Failure details:
- gs://my-bucket/mytable.csv: Error while reading data,
  error message: Error detected while parsing row starting at
  position: 1405. Error: Data between close quote character (") and
  field separator. File: gs://my-bucket/mytable.csv
- Error while reading data, error message: CSV processing encountered
  too many errors, giving up. Rows: 22; errors: 1; max bad: 0; error
  percent: 0
Based on the preceding error, there's a format error in the file.
To view the file's content, run the gcloud storage cat command:
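For example, using the file named in the error output:
# Print the object's contents to the terminal.
gcloud storage cat gs://my-bucket/mytable.csv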
Based on the output of the file, the problem is a misplaced quote in "April 15, "1865.
Compressed CSV files
Debugging parsing errors is more challenging for compressed CSV files, since
the reported byte offset refers to the location in the uncompressed file.
The following gcloud storage cat command streams the file from Cloud Storage, decompresses the file, identifies
the appropriate byte offset, and prints the line with the format error:
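The exact command isn't reproduced in this page; a sketch under the assumption that the object is a gzip-compressed copy of the earlier example file, using the byte offset 1405 reported in the error:
# Hypothetical compressed object name. tail -c +1406 starts output at the
# reported byte offset (tail byte positions are 1-based), and head -n 1
# prints the row that begins there.
gcloud storage cat gs://my-bucket/mytable.csv.gz | gunzip - | tail -c +1406 | head -n 1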
CSV options
Field delimiter
--field_delimiter
(Optional) The separator for fields in a CSV file. The separator can be
any ISO-8859-1 single-byte character. BigQuery
converts the string to ISO-8859-1 encoding, and uses the first byte of
the encoded string to split the data in its raw, binary state.
BigQuery also supports the escape sequence "\t" to specify a
tab separator. The default value is a comma (`,`).
Header rows to skip
--skip_leading_rows
(Optional) An integer indicating the number of header rows in the source
data.
Source column match
Source column match: Default, Position, Name
--source_column_match
None
(Preview) (Optional) This
controls the strategy used to match loaded columns to the schema.
Supported values include:
POSITION: matches by position. This option assumes that
the columns are ordered the same way as the schema.
NAME: matches by name. This option reads the header row
as column names and reorders columns to match the field names in the
schema. Column names are read from the last skipped row based on the skipLeadingRows property.
If this value is unspecified, then the default is based on how the schema
is provided. If autodetect is enabled, then the default behavior is to
match columns by name. Otherwise, the default is to match columns by
position. This is done to keep the behavior backward-compatible.
Number of bad records allowed
--max_bad_records
(Optional) The maximum number of bad records that BigQuery
can ignore when running the job. If the number of bad records exceeds this
value, an invalid error is returned in the job result. The default value
is 0, which requires that all records are valid.
Null marker
--null_marker
(Optional) Specifies a string that represents a null value in a CSV file.
For example, if you specify "\N", BigQuery interprets "\N"
as a null value when loading a CSV file. The default value is the empty
string. If you set this property to a custom value, BigQuery
throws an error if an empty string is present for all data types except
for STRING and BYTE. For STRING and BYTE columns, BigQuery
interprets the empty string as an empty value.
Allow jagged rows
--allow_jagged_rows
(Optional) Accept rows that are missing trailing optional columns. The
missing values are treated as nulls. If false, records with missing
trailing columns are treated as bad records, and if there are too many bad
records, an invalid error is returned in the job result. The default value
is false. Only applicable to CSV; ignored for other formats.
Unknown values
--ignore_unknown_values
(Optional) Indicates if BigQuery should allow extra values
that are not represented in the table schema. If true, the extra values
are ignored. If false, records with extra columns are treated as bad
records, and if there are too many bad records, an invalid error is
returned in the job result. The default value is false. The sourceFormat
property determines what BigQuery treats as an extra value:
CSV: Trailing columns
JSON: Named values that don't match any column names
Quote
Quote character: Double quote, Single quote, None, Custom
--quote
(Optional) The value that is used to quote data sections in a CSV file.
BigQuery converts the string to ISO-8859-1 encoding, and
then uses the first byte of the encoded string to split the data in its
raw, binary state. The default value is a double-quote ('"'). If your data
does not contain quoted sections, set the property value to an empty
string. If your data contains quoted newline characters, you must also set
the allowQuotedNewlines property to true. To
include the specific quote character within a quoted value, precede it
with an additional matching quote character. For example, if you want to
escape the default character ' " ', use ' "" '.
Encoding
--encoding
(Optional) The character encoding of the data. The supported values are
UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE.
The default value is UTF-8. BigQuery
decodes the data after the raw, binary data has been split using the
values of the quote and fieldDelimiter properties.
ASCII control character
None
--preserve_ascii_control_characters
None
(Optional) If you want to allow ASCII 0 and other ASCII control
characters, then set --preserve_ascii_control_characters to true in your load jobs.
Null Markers
Null Markers
--null_markers
None
(Preview) (Optional)
A list of custom strings that represent a NULL value in CSV
data. This option can't be used with the --null_marker option.
Time Zone
Time Zone
--time_zone
None
(Preview) (Optional)
The default time zone to apply when parsing timestamp values
that have no specific time zone. Check the list of valid time zone names.
If this value is not present, timestamp values without a specific time
zone are parsed using the default time zone, UTC.
Date Format
Date Format
--date_format
None
(Preview) (Optional) Format elements that define how the DATE values are
formatted in the input files (for example, MM/DD/YYYY). If this value is
present, this format is the only compatible DATE format. Schema autodetection
will also decide the DATE column type based on this format instead of the
existing format. If this value is not present, the DATE field is parsed
with the default formats.
Datetime Format
Datetime Format
--datetime_format
None
(Preview) (Optional) Format elements that define how the DATETIME values are
formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this
value is present, this format is the only compatible DATETIME format. Schema
autodetection will also decide the DATETIME column type based on this format
instead of the existing format. If this value is not present, the DATETIME
field is parsed with the default formats.
Time Format
Time Format
--time_format
None
(Preview) (Optional) Format elements that define how the TIME values are
formatted in the input files (for example, HH24:MI:SS.FF3). If this value is
present, this format is the only compatible TIME format. Schema autodetection
will also decide the TIME column type based on this format instead of the
existing format. If this value is not present, the TIME field is parsed
with the default formats.
Timestamp Format
Timestamp Format
--timestamp_format
None
(Preview) (Optional) Format elements that define how the TIMESTAMP values are
formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this
value is present, this format is the only compatible TIMESTAMP format. Schema
autodetection will also decide the TIMESTAMP column type based on this format
instead of the existing format. If this value is not present, the TIMESTAMP
field is parsed with the default formats.
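Many of the bq tool flags described in this section can be combined in a single load job. The following sketch uses placeholder names and shows a custom null marker, tolerance for jagged rows, and ignoring unknown trailing columns:
# Hypothetical names; each flag corresponds to an option described above.
bq load \
  --source_format=CSV \
  --null_marker="\N" \
  --allow_jagged_rows \
  --ignore_unknown_values \
  --max_bad_records=10 \
  mydataset.mytable \
  gs://my-bucket/mydata.csv \
  ./myschema.json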
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-03 UTC."],[[["\u003cp\u003eLoading CSV data from Cloud Storage into BigQuery requires the dataset and storage bucket to be in the same regional or multi-regional location, with considerations for data format, compression, and schema definition.\u003c/p\u003e\n"],["\u003cp\u003eBigQuery offers several methods for loading CSV data, including the Google Cloud console, SQL's \u003ccode\u003eLOAD DATA\u003c/code\u003e statement, the \u003ccode\u003ebq\u003c/code\u003e command-line tool, and the BigQuery API, each with unique options for schema, error handling, and data transformation.\u003c/p\u003e\n"],["\u003cp\u003ePermissions for loading data include specific BigQuery table and Cloud Storage permissions, allowing users to create, update, and manage data in tables, as well as access data in Cloud Storage buckets.\u003c/p\u003e\n"],["\u003cp\u003eData can be loaded into new or existing BigQuery tables with options to append, overwrite, or only write if the table is empty, with specific write preferences available through the console, SQL, \u003ccode\u003ebq\u003c/code\u003e, and API methods.\u003c/p\u003e\n"],["\u003cp\u003eVarious formatting considerations are necessary for CSV data, including encoding, delimiters, date/time formats, handling null values, and escaping quotes, each manageable through specific configurations in the loading process.\u003c/p\u003e\n"]]],[],null,[]]