Specify nested and repeated columns in table schemas
This page describes how to define a table schema with nested and repeated columns in BigQuery. For an overview of table schemas, see Specifying a schema .
Define nested and repeated columns
To create a column with nested data, set the data type of the column to RECORD
in the schema. A RECORD
can be accessed as a STRUCT
type
in GoogleSQL. A STRUCT
is a container of ordered fields.
To create a column with repeated data, set the mode
of the column to REPEATED
in the schema.
A repeated field can be accessed as an ARRAY
type in
GoogleSQL.
A RECORD
column can have REPEATED
mode, which is represented as an array of STRUCT
types. Also, a field within a record can be repeated, which is
represented as a STRUCT
that contains an ARRAY
. An array cannot contain
another array directly. For more information, see Declaring an ARRAY
type
.
Limitations
Nested and repeated schemas are subject to the following limitations:
- A schema cannot contain more than 15 levels of nested
RECORD
types. - Columns of type
RECORD
can contain nestedRECORD
types, also called child records. The maximum nested depth limit is 15 levels. This limit is independent of whether theRECORD
s are scalar or array-based (repeated).
RECORD
type is incompatible with UNION
, INTERSECT
, EXCEPT DISTINCT
, and SELECT DISTINCT
.
Example schema
The following example shows sample nested and repeated data. This table contains information about people. It consists of the following fields:
-
id
-
first_name
-
last_name
-
dob
(date of birth) -
addresses
(a nested and repeated field)-
addresses.status
(current or previous) -
addresses.address
-
addresses.city
-
addresses.state
-
addresses.zip
-
addresses.numberOfYears
(years at the address)
-
The JSON data file would look like the following. Notice that the addresses
column contains an array of values (indicated by [ ]
). The multiple addresses
in the array are the repeated data. The multiple fields within each address are
the nested data.
{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]} {"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
The schema for this table looks like the following:
[ { "name" : "id" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "first_name" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "last_name" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "dob" , "type" : "DATE" , "mode" : "NULLABLE" }, { "name" : "addresses" , "type" : "RECORD" , "mode" : "REPEATED" , "fields" : [ { "name" : "status" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "address" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "city" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "state" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "zip" , "type" : "STRING" , "mode" : "NULLABLE" }, { "name" : "numberOfYears" , "type" : "STRING" , "mode" : "NULLABLE" } ] } ]
Specifying the nested and repeated columns in the example
To create a new table with the previous nested and repeated columns, select one of the following options:
Console
Specify the nested and repeated addresses
column:
-
In the Google Cloud console, open the BigQuery page.
-
In the Explorerpanel, expand your project and select a dataset.
-
In the details panel, click Create table.
-
On the Create tablepage, specify the following details:
- For Source, in the Create table fromfield, select Empty table.
-
In the Destinationsection, specify the following fields:
- For Dataset, select the dataset in which you want to create the table.
- For Table, enter the name of the table that you want to create.
-
For Schema, click Add fieldand enter the following table schema:
- For Field name, enter
addresses
. - For Type, select RECORD.
- For Mode, choose REPEATED.
-
Specify the following fields for a nested field:
- In the Field namefield, enter
status
. - For Type, choose STRING.
- For Mode, leave the value set to NULLABLE.
-
Click Add fieldto add the following fields:
Field name Type Mode address
STRING
NULLABLE
city
STRING
NULLABLE
state
STRING
NULLABLE
zip
STRING
NULLABLE
numberOfYears
STRING
NULLABLE
Alternatively, click Edit as textand specify the schema as a JSON array.
- In the Field namefield, enter
- For Field name, enter
SQL
Use the CREATE TABLE
statement
.
Specify the schema using the column
option:
-
In the Google Cloud console, go to the BigQuerypage.
-
In the query editor, enter the following statement:
CREATE TABLE IF NOT EXISTS mydataset . mytable ( id STRING , first_name STRING , last_name STRING , dob DATE , addresses ARRAY < STRUCT < status STRING , address STRING , city STRING , state STRING , zip STRING , numberOfYears STRING >> ) OPTIONS ( description = 'Example name and addresses table' );
-
Click Run.
For more information about how to run queries, see Run an interactive query .
bq
To specify the nested and repeated addresses
column in a JSON schema file,
use a text editor to create a new file. Paste in the example schema
definition shown above.
After you create your JSON schema file, you can provide it through the bq command-line tool. For more information, see Using a JSON schema file .
Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Go API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Java API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Node.js API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries . For more information, see the BigQuery Python API reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries .
Insert data in nested columns in the example
Use the following queries to insert nested data records into tables that have RECORD
data type columns.
Example 1
INSERT INTO mydataset . mytable ( id , first_name , last_name , dob , addresses ) values ( "1" , "Johnny" , "Dawn" , "1969-01-22" , ARRAY < STRUCT < status STRING , address STRING , city STRING , state STRING , zip STRING , numberOfYears STRING >> [ ( "current" , "123 First Avenue" , "Seattle" , "WA" , "11111" , "1" ) ] )
Example 2
INSERT INTO mydataset . mytable ( id , first_name , last_name , dob , addresses ) values ( "1" , "Johnny" , "Dawn" , "1969-01-22" , [ ( "current" , "123 First Avenue" , "Seattle" , "WA" , "11111" , "1" ) ] )
Query nested and repeated columns
To select the value of an ARRAY
at a specific position, use an array subscript
operator
.
To access elements in a STRUCT
, use the dot operator
.
The following example selects the first name, last name, and first address
listed in the addresses
field:
SELECT first_name , last_name , addresses [ offset ( 0 ) ] . address FROM mydataset . mytable ;
The result is the following:
+------------+-----------+------------------+ | first_name | last_name | address | +------------+-----------+------------------+ | John | Doe | 123 First Avenue | | Jane | Doe | 789 Any Avenue | +------------+-----------+------------------+
To extract all elements of an ARRAY
, use the UNNEST
operator
with a CROSS JOIN
.
The following example selects the first name, last name, address, and state for
all addresses not located in New York:
SELECT first_name , last_name , a . address , a . state FROM mydataset . mytable CROSS JOIN UNNEST ( addresses ) AS a WHERE a . state != 'NY' ;
The result is the following:
+------------+-----------+------------------+-------+ | first_name | last_name | address | state | +------------+-----------+------------------+-------+ | John | Doe | 123 First Avenue | WA | | John | Doe | 456 Main Street | OR | | Jane | Doe | 321 Main Street | NJ | +------------+-----------+------------------+-------+
Modify nested and repeated columns
After you add a nested column or a nested and repeated column to a table's schema definition, you can modify the column as you would any other type of column. BigQuery natively supports several schema changes such as adding a new nested field to a record or relaxing a nested field's mode. For more information, see Modifying table schemas .
When to use nested and repeated columns
BigQuery performs best when your data is denormalized. Rather than preserving a relational schema such as a star or snowflake schema, denormalize your data and take advantage of nested and repeated columns. Nested and repeated columns can maintain relationships without the performance impact of preserving a relational (normalized) schema.
For example, a relational database used to track library books would likely keep
all author information in a separate table. A key such as author_id
would be
used to link the book to the authors.
In BigQuery, you can preserve the relationship between book and author without creating a separate author table. Instead, you create an author column, and you nest fields within it such as the author's first name, last name, date of birth, and so on. If a book has multiple authors, you can make the nested author column repeated.
Suppose you have the following table mydataset.books
:
+------------------+------------+-----------+ | title | author_ids | num_pages | +------------------+------------+-----------+ | Example Book One | [123, 789] | 487 | | Example Book Two | [456] | 89 | +------------------+------------+-----------+
You also have the following table, mydataset.authors
, with complete
information for each author ID:
+-----------+-------------+---------------+ | author_id | author_name | date_of_birth | +-----------+-------------+---------------+ | 123 | Alex | 01-01-1960 | | 456 | Rosario | 01-01-1970 | | 789 | Kim | 01-01-1980 | +-----------+-------------+---------------+
If the tables are large, it might be resource intensive to join them regularly. Depending on your situation, it might be beneficial to create a single table that contains all the information:
CREATE TABLE mydataset . denormalized_books ( title STRING , authors ARRAY<STRUCT<id INT64 , name STRING , date_of_birth STRING >> , num_pages INT64 ) AS ( SELECT title , ARRAY_AGG ( STRUCT ( author_id , author_name , date_of_birth )) AS authors , ANY_VALUE ( num_pages ) FROM mydataset . books , UNNEST ( author_ids ) id JOIN mydataset . authors ON id = author_id GROUP BY title );
The resulting table looks like the following:
+------------------+-------------------------------+-----------+ | title | authors | num_pages | +------------------+-------------------------------+-----------+ | Example Book One | [{123, Alex, 01-01-1960}, | 487 | | | {789, Kim, 01-01-1980}] | | | Example Book Two | [{456, Rosario, 01-01-1970}] | 89 | +------------------+-------------------------------+-----------+
BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON files, Avro files, Firestore export files, and Datastore export files.
Deduplicate duplicate records in a table
The following query uses the row_number()
function to identify duplicate records that have the same values for last_name
and first_name
in the examples used and sorts them by dob
:
CREATE OR REPLACE TABLE mydataset . mytable AS ( SELECT * except ( row_num ) FROM ( SELECT * , row_number () over ( partition by last_name , first_name order by dob ) row_num FROM mydataset . mytable ) temp_table WHERE row_num = 1 )
Table security
To control access to tables in BigQuery, see Control access to resources with IAM .
What's next
- To insert and update rows with nested and repeated columns, see Data manipulation language syntax .