Using schema auto-detection
Schema auto-detection
Schema auto-detection enables BigQuery to infer the schema for
CSV, JSON, or Google Sheets data. Schema auto-detection is available when you load data into BigQuery and when
you query an external data source.
When auto-detection is enabled, BigQuery infers the data type for
each column. BigQuery selects a random file in the data source
and scans up to the first 500 rows of data to use as a representative sample.
BigQuery then examines each field and attempts to assign a data
type to that field based on the values in the sample. If all of the rows in a
column are empty, auto-detection defaults to the STRING data type for the column.
If you don't enable schema auto-detection for CSV, JSON, or Google Sheets
data, then you must provide the schema manually when creating the table.
You don't need to enable schema auto-detection for Avro, Parquet, ORC, Firestore
export, or Datastore export files. These file formats are self-describing, so
BigQuery automatically infers the table schema from the source
data. For Parquet, Avro, and ORC files, you can optionally provide an explicit
schema to override the inferred schema.
You can see the detected schema for a table in the Google Cloud console or by using the bq command-line tool.
To enable schema auto-detection when loading data, use one of these approaches:
In the Google Cloud console, in the Schema section, for Auto detect,
check the Schema and input parameters option.
In the bq command-line tool, use the bq load command with the --autodetect parameter.
When schema auto-detection is enabled, BigQuery makes a
best-effort attempt to automatically infer the schema for CSV and JSON files.
The auto-detection logic infers the schema field types by reading up to the
first 500 rows of data. Leading lines are skipped if the --skip_leading_rows flag is present. The field types are based on the rows having the most fields.
Therefore, auto-detection should work as expected as long as there is at least
one row of data that has values in every column or field.
Schema auto-detection is not used with Avro files, Parquet files, ORC files,
Firestore export files, or Datastore export files. When you
load these files into BigQuery, the table schema is automatically
retrieved from the self-describing source data.
To use schema auto-detection when you load JSON or CSV data:
Console
In the Google Cloud console, go to the BigQuery page.
In the Explorer panel, expand your project and select a dataset.
Expand the Actions option and click Open.
In the details panel, click Create table.
On the Create table page, in the Source section:
For Create table from, select your desired source type.
In the source field, browse for the file or Cloud Storage bucket, or
enter the Cloud Storage URI. Note that you cannot
include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same
location as the dataset that contains the table you're creating.
For File format, select CSV or JSON.
On the Create table page, in the Destination section:
For Dataset name, choose the appropriate dataset.
In the Table name field, enter the name of the table you're creating.
Verify that Table type is set to Native table.
In the Schema section, for Auto detect, check the Schema and input parameters option.
Click Create table.
bq
Issue the bq load command with the --autodetect parameter.
(Optional) Supply the --location flag and set the value to your location.
The following command loads a file using schema auto-detect:
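bq --location=LOCATION load \
--autodetect \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE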
LOCATION: the name of your location. The --location flag is optional. For example, if you are using
BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location by using
the .bigqueryrc file.
FORMAT: either NEWLINE_DELIMITED_JSON or CSV.
DATASET: the dataset that contains the table
into which you're loading data.
TABLE: the name of the table into which you're
loading data.
PATH_TO_SOURCE: the location of the CSV or
JSON file.
Examples:
Enter the following command to load myfile.csv from your local
machine into a table named mytable that is stored in a dataset named mydataset.
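bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
./myfile.csv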
Create a load job that points to the source data. For information about
creating jobs, see Running BigQuery jobs programmatically.
Specify your location in the location property in the jobReference section.
Specify the data format by setting the sourceFormat property. To use
schema auto-detection, this value must be set to NEWLINE_DELIMITED_JSON or CSV.
Use the autodetect property to set schema auto-detection to true.
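For example, a minimal load job configuration that enables schema auto-detection for a newline-delimited JSON file might look similar to the following sketch; the bucket, file, project, dataset, and table names are placeholders:

{
  "jobReference": {
    "location": "US"
  },
  "configuration": {
    "load": {
      "sourceUris": ["gs://BUCKET/FILE.json"],
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "autodetect": true,
      "destinationTable": {
        "projectId": "PROJECT_ID",
        "datasetId": "DATASET",
        "tableId": "TABLE"
      }
    }
  }
}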
import("context""fmt""cloud.google.com/go/bigquery")// importJSONAutodetectSchema demonstrates loading data from newline-delimited JSON data in Cloud Storage// and using schema autodetection to identify the available columns.funcimportJSONAutodetectSchema(projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()gcsRef:=bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.json")gcsRef.SourceFormat=bigquery.JSONgcsRef.AutoDetect=trueloader:=client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)loader.WriteDisposition=bigquery.WriteEmptyjob,err:=loader.Run(ctx)iferr!=nil{returnerr}status,err:=job.Wait(ctx)iferr!=nil{returnerr}ifstatus.Err()!=nil{returnfmt.Errorf("job completed with error: %v",status.Err())}returnnil}
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Sample to load JSON data with autodetect schema from Cloud Storage into a new BigQuery table
public class LoadJsonFromGCSAutodetect {

  public static void runLoadJsonFromGCSAutodetect() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
    loadJsonFromGCSAutodetect(datasetName, tableName, sourceUri);
  }

  public static void loadJsonFromGCSAutodetect(String datasetName, String tableName, String sourceUri) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.json())
              .setAutodetect(true)
              .build();

      // Load data from a GCS JSON file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone() && job.getStatus().getError() == null) {
        System.out.println("Json Autodetect from GCS successfully loaded in a table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:" + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Load job was interrupted or failed: \n" + e.toString());
    }
  }
}
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Sample to load CSV data with autodetect schema from Cloud Storage into a new BigQuery table
public class LoadCsvFromGcsAutodetect {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    loadCsvFromGcsAutodetect(datasetName, tableName, sourceUri);
  }

  public static void loadCsvFromGcsAutodetect(String datasetName, String tableName, String sourceUri) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      // Skip header row in the file.
      CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();

      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(csvOptions)
              .setAutodetect(true)
              .build();

      // Load data from a GCS CSV file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone() && job.getStatus().getError() == null) {
        System.out.println("CSV Autodetect from GCS successfully loaded in a table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:" + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Load job was interrupted or failed: \n" + e.toString());
    }
  }
}
// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const datasetId = "my_dataset";
// const tableId = "my_table";

/**
 * This sample loads the JSON file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.json
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.json';

async function loadJSONFromGCSAutodetect() {
  // Imports a GCS file into a table with autodetected schema.

  // Instantiate clients
  const bigquery = new BigQuery();
  const storage = new Storage();

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'NEWLINE_DELIMITED_JSON',
    autodetect: true,
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);
  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}
loadJSONFromGCSAutodetect();
use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Imports data to the given table from json file present in GCS by auto
 * detecting options and schema.
 *
 * @param string $projectId The project Id of your Google Cloud Project.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId The BigQuery table ID.
 */
function import_from_storage_json_autodetect(
    string $projectId,
    string $datasetId,
    string $tableId = 'us_states'
): void {
    // instantiate the bigquery table service
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);

    // create the import job
    $gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.json';
    $loadConfig = $table->loadFromStorage($gcsUri)->autodetect(true)->sourceFormat('NEWLINE_DELIMITED_JSON');
    $job = $table->runJob($loadConfig);

    // check if the job is complete
    $job->reload();
    if (!$job->isComplete()) {
        throw new \Exception('Job has not yet completed', 500);
    }
    // check if the job has errors
    if (isset($job->info()['status']['errorResult'])) {
        $error = $job->info()['status']['errorResult']['message'];
        printf('Error running job: %s' . PHP_EOL, $error);
    } else {
        print('Data imported successfully' . PHP_EOL);
    }
}
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    autodetect=True, source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
require"google/cloud/bigquery"defload_table_gcs_json_autodetectdataset_id="your_dataset_id"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idgcs_uri="gs://cloud-samples-data/bigquery/us-states/us-states.json"table_id="us_states"load_job=dataset.load_jobtable_id,gcs_uri,format:"json",autodetect:trueputs"Starting job#{load_job.job_id}"load_job.wait_until_done!# Waits for table load to complete.puts"Job finished."table=dataset.tabletable_idputs"Loaded#{table.rows_count}rows to table#{table.id}"end
Schema auto-detection for external data sources
Schema auto-detection can be used with CSV, JSON, and Google Sheets external
data sources. When schema auto-detection is enabled, BigQuery
makes a best-effort attempt to automatically infer the schema from the source
data. If you don't enable schema auto-detection for these sources, then you
must provide an explicit schema.
You don't need to enable schema auto-detection when you query external Avro,
Parquet, ORC, Firestore export, or Datastore export files. These file formats
are self-describing, so BigQuery automatically infers the table
schema from the source data. For Parquet, Avro, and ORC files, you can
optionally provide an explicit schema to override the inferred schema.
Using the Google Cloud console, you can enable schema auto-detection by
checking the Schema and input parameters option for Auto detect.
Using the bq command-line tool, you can enable schema auto-detection when you
create a table definition file for CSV,
JSON, or Google Sheets data. When using the bq tool to create a
table definition file, pass the --autodetect flag to the mkdef command to
enable schema auto-detection, or pass the --noautodetect flag to disable
auto-detection.
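For example, the following command creates a table definition file for a CSV file in Cloud Storage with auto-detection enabled; the bucket, file, and output file names are placeholders:

bq mkdef --autodetect --source_format=CSV \
"gs://BUCKET/FILE.csv" > table_def.json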
When you use the --autodetect flag, the autodetect setting is set to true in the table definition file. When you use the --noautodetect flag, the autodetect setting is set to false. If you do not provide a schema
definition for the external data source when you create a table definition, and
you do not use the --noautodetect or --autodetect flag, the autodetect setting defaults to true.
When you create a table definition file by using the API, set the value of the autodetect property to true or false. Setting autodetect to true enables auto-detection. Setting autodetect to false disables auto-detection.
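For example, a table definition file for a CSV source with auto-detection enabled might look similar to the following sketch; the bucket and file names are placeholders:

{
  "autodetect": true,
  "sourceFormat": "CSV",
  "sourceUris": [
    "gs://BUCKET/FILE.csv"
  ]
}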
Auto-detection details
In addition to detecting schema details, auto-detection recognizes the
following:
Compression
BigQuery recognizes gzip-compatible file compression when opening
a file.
Date and time values
BigQuery detects date and time values based on the formatting of
the source data.
Values in DATE columns must be in the following format: YYYY-MM-DD.
Values in TIME columns must be in the following format: HH:MM:SS[.SSSSSS] (the fractional-second component is optional).
For TIMESTAMP columns, BigQuery detects a wide array of
timestamp formats, including, but not limited to:
YYYY-MM-DD HH:MM
YYYY-MM-DD HH:MM:SS
YYYY-MM-DD HH:MM:SS.SSSSSS
YYYY/MM/DD HH:MM
A timestamp can also contain a UTC offset or the UTC zone designator ('Z').
Here are some examples of values that BigQuery will automatically
detect as timestamp values:
2018-08-19 12:11
2018-08-19 12:11:35.22
2018/08/19 12:11
2018-08-19 07:11:35.220 -05:00
If auto-detection isn't enabled, and your value is in a format not present in the
preceding examples, then BigQuery can only load the column as a STRING data type. You can enable auto-detection to have BigQuery
recognize these columns as timestamps. For example, BigQuery
only loads 2025-06-16T16:55:22Z as a timestamp if you enable auto-detection.
Alternatively, you can preprocess the source data
before loading it. For example, if you are exporting CSV data from a
spreadsheet, set the date format to match one of the examples shown here.
You can also transform the data after loading it into
BigQuery.
Schema auto-detection for CSV data
CSV delimiter
BigQuery detects the following delimiters:
comma ( , )
pipe ( | )
tab ( \t )
CSV header
BigQuery infers headers by comparing the first row of the file
with other rows in the file. If the first line contains only strings, and the
other lines contain other data types, BigQuery assumes that the
first row is a header row. BigQuery assigns column names based on the field names in the header row. The names might be modified to meet the naming rules for columns in BigQuery. For example, spaces are replaced with underscores.
Otherwise, BigQuery assumes the first row is a data row, and
assigns generic column names such as string_field_1. Note that auto-detection can't update
the column names in the schema, although you can change the names
manually after the table is created. Another option is to provide an explicit schema
instead of using auto-detection.
You might have a CSV file with a header row, where all of the data fields are
strings. In that case, BigQuery will not automatically detect that
the first row is a header. Use the--skip_leading_rowsoption to skip the
header row. Otherwise, the header will be imported as data. Also consider
providing an explicit schema in this case, so that you can assign column names.
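For example, the following command skips the first row of the file and lets auto-detection infer types from the remaining rows; the dataset, table, and file names are placeholders:

bq load \
--autodetect \
--skip_leading_rows=1 \
--source_format=CSV \
mydataset.mytable \
./myfile.csv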
CSV quoted new lines
BigQuery detects quoted new line characters within a CSV field
and does not interpret the quoted new line character as a row boundary.
Schema auto-detection for JSON data
JSON nested and repeated fields
BigQuery infers nested and repeated fields in JSON files. If a
field value is a JSON object, then BigQuery loads the column as a RECORD type. If a field value is an array, then BigQuery loads
the column as a repeated column. For an example of JSON data with nested and
repeated data, see Loading nested and repeated JSON data.
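For instance, in an illustrative newline-delimited JSON record like the following, auto-detection infers address as a RECORD column and phone_numbers as a repeated column:

{"name": "Alice", "address": {"city": "Seattle", "zip": "98101"}, "phone_numbers": ["555-0100", "555-0101"]}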
String conversion
If you enable schema auto-detection, then BigQuery converts
strings into Boolean, numeric, or date/time types when possible. For example,
given JSON data like the following illustrative records, schema auto-detection converts the id field
to an INTEGER column:
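{"name": "Alice", "id": "12"}
{"name": "Bob", "id": "34"}
{"name": "Charlie", "id": "45"}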
Schema auto-detection for Google Sheets
For Sheets, BigQuery auto-detects whether the
first row is a header row, similar to auto-detection for CSV files. If the first
line is identified as a header, BigQuery assigns column names
based on the field names in the header row and skips the row. The names might be
modified to meet the naming rules for
columns in BigQuery. For example, spaces are replaced with underscores.
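For example, the following command creates a table definition file for a Sheets-backed external table with auto-detection enabled; the spreadsheet URL and output file name are placeholders:

bq mkdef --autodetect --source_format=GOOGLE_SHEETS \
"https://docs.google.com/spreadsheets/d/SPREADSHEET_ID" > table_def.json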