Working with JSON data in GoogleSQL

This document describes how to create a table with a JSON column, insert JSON data into a BigQuery table, and query JSON data.

BigQuery natively supports JSON data using the JSON data type.

JSON is a widely used format that allows for semi-structured data, because it does not require a schema. Applications can use a "schema-on-read" approach, where the application ingests the data and then queries based on assumptions about the schema of that data. This approach differs from the STRUCT type in BigQuery, which requires a fixed schema that is enforced for all values stored in a column of STRUCT type.

By using the JSON data type, you can load semi-structured JSON into BigQuery without providing a schema for the JSON data upfront. This lets you store and query data that doesn't always adhere to fixed schemas and data types. By ingesting JSON data as a JSON data type, BigQuery can encode and process each JSON field individually. You can then query the values of fields and array elements within the JSON data by using the field access operator, which makes JSON queries intuitive and cost efficient.

Limitations

  • If you use a batch load job to ingest JSON data into a table, the source data must be in CSV, Avro, or JSON format. Other batch load formats are not supported.
  • The JSON data type has a nesting limit of 500.
  • You can't use legacy SQL to query a table that contains JSON types.
  • Row-level access policies cannot be applied on JSON columns.

To learn about the properties of the JSON data type, see JSON type .

Create a table with a JSON column

You can create an empty table with a JSON column by using SQL or by using the bq command-line tool.

SQL

Use the CREATE TABLE statement and declare a column with the JSON type.

  1. In the Google Cloud console, go to the BigQuerypage.

    Go to BigQuery

  2. In the query editor, enter the following statement:

     CREATE 
      
     TABLE 
      
     mydataset 
     . 
     table1 
     ( 
      
     id 
      
     INT64 
     , 
      
     cart 
      
     JSON 
     ); 
    
  3. Click Run.

For more information about how to run queries, see Run an interactive query .

bq

Use the bq mk command and provide a table schema with a JSON data type.

bq  
mk  
--table  
mydataset.table1  
id:INT64,cart:JSON

You can't partition or cluster a table on JSON columns, because the equality and comparison operators are not defined on the JSON type.

Create JSON values

You can create JSON values in the following ways:

Create a JSON value

The following example inserts JSON values into a table:

 INSERT 
  
 INTO 
  
 mydataset 
 . 
 table1 
  
 VALUES 
 ( 
 1 
 , 
  
 JSON 
  
 '{"name": "Alice", "age": 30}' 
 ), 
 ( 
 2 
 , 
  
 JSON_ARRAY 
 ( 
 10 
 , 
  
 [ 
 'foo' 
 , 
  
 'bar' 
 ], 
  
 [ 
 20 
 , 
  
 30 
 ])), 
 ( 
 3 
 , 
  
 JSON_OBJECT 
 ( 
 'foo' 
 , 
  
 10 
 , 
  
 'bar' 
 , 
  
 [ 
 'a' 
 , 
  
 'b' 
 ])); 

Convert a STRING type to JSON type

The following example converts a JSON-formatted STRING value by using the PARSE_JSON function. The example converts a column from an existing table to a JSON type and stores the results to a new table.

 CREATE 
  
 OR 
  
  REPLACE 
 
  
 TABLE 
  
 mydataset 
 . 
 table_new 
 AS 
  
 ( 
  
 SELECT 
  
 id 
 , 
  
 SAFE 
 . 
 PARSE_JSON 
 ( 
 cart 
 ) 
  
 AS 
  
 cart_json 
  
 FROM 
  
 mydataset 
 . 
 old_table 
 ); 

The SAFE prefix used in this example ensures that any conversion errors are returned as NULL values.

Convert schematized data to JSON

The following example converts key-value pairs to JSON using the JSON_OBJECT function.

 WITH 
  
 Fruits 
  
 AS 
  
 ( 
 SELECT 
  
 0 
  
 AS 
  
 id 
 , 
  
 'color' 
  
 AS 
  
 k 
 , 
  
 'Red' 
  
 AS 
  
 v 
  
 UNION 
  
 ALL 
 SELECT 
  
 0 
 , 
  
 'fruit' 
 , 
  
 'apple' 
  
 UNION 
  
 ALL 
 SELECT 
  
 1 
 , 
  
 'fruit' 
 , 
 'banana' 
  
 UNION 
  
 ALL 
 SELECT 
  
 1 
 , 
  
 'ripe' 
 , 
  
 'true' 
 ) 
 SELECT 
  
 JSON_OBJECT 
 ( 
 ARRAY_AGG 
 ( 
 k 
 ), 
  
 ARRAY_AGG 
 ( 
 v 
 )) 
  
 AS 
  
 json_data 
 FROM 
  
 Fruits 
 GROUP 
  
 BY 
  
 id 

The result is the following:

+----------------------------------+
| json_data                        |
+----------------------------------+
| {"color":"Red","fruit":"apple"}  |
| {"fruit":"banana","ripe":"true"} |
+----------------------------------+

Convert a SQL type to JSON type

The following example converts a SQL STRUCT value to a JSON value by using the TO_JSON function:

 SELECT 
  
 TO_JSON 
 ( 
  STRUCT 
 
 ( 
 1 
  
 AS 
  
 id 
 , 
  
 [ 
 10 
 , 
 20 
 ] 
  
 AS 
  
 coordinates 
 )) 
  
 AS 
  
 pt 
 ; 

The result is the following:

+--------------------------------+
| pt                             |
+--------------------------------+
| {"coordinates":[10,20],"id":1} |
+--------------------------------+

Ingest JSON data

You can ingest JSON data into a BigQuery table in the following ways:

Load from CSV files

The following example assumes that you have a CSV file named file1.csv that contains the following records:

1,20
2,"""This is a string"""
3,"{""id"": 10, ""name"": ""Alice""}"

Note that the second column contains JSON data that is encoded as a string. This involves correctly escaping the quotes for the CSV format. In CSV format, quotes are escaped by using the two character sequence "" .

To load this file using the bq command-line tool, use the bq load command:

  bq 
  
 load 
  
 -- 
 source_format 
 = 
 CSV 
  
 mydataset 
 . 
 table1 
  
 file1 
 . 
 csv 
  
 id 
 : 
 INTEGER 
 , 
 json_data 
 : 
 JSON 
 bq 
  
 show 
  
 mydataset 
 . 
 table1 
 Last 
  
 modified 
  
 Schema 
  
 Total 
  
 Rows 
  
 Total 
  
 Bytes 
 



22 Dec 22 : 10 : 32 |- id : integer 3 63 |- json_data : json

Load from newline delimited JSON files

The following example assumes that you have a file named file1.jsonl that contains the following records:

{"id": 1, "json_data": 20}
{"id": 2, "json_data": "This is a string"}
{"id": 3, "json_data": {"id": 10, "name": "Alice"}}

To load this file using the bq command-line tool, use the bq load command:

  bq 
  
 load 
  
 -- 
 source_format 
 = 
 NEWLINE_DELIMITED_JSON 
  
 mydataset 
 . 
 table1 
  
 file1 
 . 
 jsonl 
  
 id 
 : 
 INTEGER 
 , 
 json_data 
 : 
 JSON 
 bq 
  
 show 
  
 mydataset 
 . 
 table1 
 Last 
  
 modified 
  
 Schema 
  
 Total 
  
 Rows 
  
 Total 
  
 Bytes 
 



22 Dec 22 : 10 : 32 |- id : integer 3 63 |- json_data : json

Use the Storage Write API

You can use the Storage Write API to ingest JSON data. The following example uses the Storage Write API Python client .

Define a protocol buffer to hold the serialized streaming data. The JSON data is encoded as a string. In the following example, the json_col field holds JSON data.

message SampleData {
  optional string string_col = 1;
  optional int64 int64_col = 2;
  optional string json_col = 3;
}

Format the JSON data for each row as a STRING value:

row.json_col = '{"a": 10, "b": "bar"}'
row.json_col = '"This is a string"' # The double-quoted string is the JSON value.
row.json_col = '10'

Append the rows to the write stream as shown in the code example . The client library handles serialization to protocol buffer format.

Use the legacy streaming API

The following example loads JSON data from a local file and streams it to BigQuery by using the legacy streaming API .

  from 
 google.cloud 
 import 
 bigquery 
 import 
 json 
 # TODO(developer): Replace these variables before running the sample. 
 project_id 
 = 
' MY_PROJECT_ID 
' table_id 
 = 
' MY_TABLE_ID 
' client 
 = 
 bigquery 
 . 
 Client 
 ( 
 project 
 = 
 project_id 
 ) 
 table_obj 
 = 
 client 
 . 
 get_table 
 ( 
 table_id 
 ) 
 # The column json_data is represented as a string. 
 rows_to_insert 
 = 
 [ 
 { 
" id 
" : 
 1 
 , 
" json_data 
" : 
 json 
 . 
 dumps 
 ( 
 20 
 )}, 
 { 
" id 
" : 
 2 
 , 
" json_data 
" : 
 json 
 . 
 dumps 
 ( 
" This 
 is 
 a 
 string 
" )}, 
 { 
" id 
" : 
 3 
 , 
" json_data 
" : 
 json 
 . 
 dumps 
 ({ 
" id 
" : 
 10 
 , 
" name 
" : 
" Alice 
" })} 
 ] 
 # Throw errors if encountered. 
 # https://cloud.google.com/python/docs/reference/bigquery/latest/google.cloud.bigquery.client.Client#google_cloud_bigquery_client_Client_insert_rows 
 errors 
 = 
 client 
 . 
 insert_rows 
 ( 
 table 
 = 
 table_obj 
 , 
 rows 
 = 
 rows_to_insert 
 ) 
 if 
 errors 
 == 
 []: 
 print 
 ( 
" New 
 rows 
 have 
 been 
 added 
 . 
" ) 
 else 
 : 
 print 
 ( 
" Encountered 
 errors 
 while 
 inserting 
 rows 
 : 
 {} 
" . 
 format 
 ( 
 errors 
 )) 
 

For more information, see Streaming data into BigQuery .

Query JSON data

This section describes how to use GoogleSQL to extract values from JSON. JSON is case-sensitive and supports UTF-8 in both fields and values.

The examples in this section use the following table:

 CREATE 
  
 OR 
  
  REPLACE 
 
  
 TABLE 
  
 mydataset 
 . 
 table1 
 ( 
 id 
  
 INT64 
 , 
  
 cart 
  
 JSON 
 ); 
 INSERT 
  
 INTO 
  
 mydataset 
 . 
 table1 
  
 VALUES 
 ( 
 1 
 , 
  
 JSON 
  
  """{ 
 " 
 name 
 ": " 
 Alice 
 ", 
 " 
 items 
 ": [ 
 {" 
 product 
 ": " 
 book 
 ", " 
 price 
 ": 10}, 
 {" 
 product 
 ": " 
 food 
 ", " 
 price 
 ": 5} 
 ] 
 }""" 
 
 ), 
 ( 
 2 
 , 
  
 JSON 
  
  """{ 
 " 
 name 
 ": " 
 Bob 
 ", 
 " 
 items 
 ": [ 
 {" 
 product 
 ": " 
 pen 
 ", " 
 price 
 ": 20} 
 ] 
 }""" 
 
 ); 

Extract values as JSON

Given a JSON type in BigQuery, you can access the fields in a JSON expression by using the field access operator . The following example returns the name field of the cart column.

 SELECT 
  
 cart 
 . 
 name 
 FROM 
  
 mydataset 
 . 
 table1 
 ; 
+---------+
|  name   |
+---------+
| "Alice" |
| "Bob"   |
+---------+

To access an array element, use the JSON subscript operator . The following example returns the first element of the items array:

 SELECT 
  
 cart 
 . 
 items 
 [ 
 0 
 ] 
  
 AS 
  
 first_item 
 FROM 
  
 mydataset 
 . 
 table1 
+-------------------------------+
|          first_item           |
+-------------------------------+
| {"price":10,"product":"book"} |
| {"price":20,"product":"pen"}  |
+-------------------------------+

You can also use the JSON subscript operator to reference the members of a JSON object by name:

 SELECT 
  
 cart 
 [ 
 'name' 
 ] 
 FROM 
  
 mydataset 
 . 
 table1 
 ; 
+---------+
|  name   |
+---------+
| "Alice" |
| "Bob"   |
+---------+

For subscript operations, the expression inside the brackets can be any arbitrary string or integer expression, including non-constant expressions:

 DECLARE 
  
 int_val 
  
 INT64 
  
 DEFAULT 
  
 0 
 ; 
 SELECT 
  
 cart 
 [ 
 CONCAT 
 ( 
 'it' 
 , 
 'ems' 
 )][ 
 int_val 
  
 + 
  
 1 
 ]. 
 product 
  
 AS 
  
 item 
 FROM 
  
 mydataset 
 . 
 table1 
 ; 
+--------+
|  item  |
+--------+
| "food" |
| NULL   |
+--------+

Field access and subscript operators both return JSON types, so you can chain expressions that use them or pass the result to other functions that take JSON types.

These operators are syntactic sugar for the JSON_QUERY function. For example, the expression cart.name is equivalent to JSON_QUERY(cart, "$.name") .

If a member with the specified name is not found in the JSON object, or if the JSON array doesn't have an element with the specified position, then these operators return SQL NULL .

 SELECT 
  
 cart 
 . 
 address 
  
 AS 
  
 address 
 , 
  
 cart 
 . 
 items 
 [ 
 1 
 ]. 
 price 
  
 AS 
  
 item1_price 
 FROM 
  
 mydataset 
 . 
 table1 
 ; 
+---------+-------------+
| address | item1_price |
+---------+-------------+
| NULL    | NULL        |
| NULL    | 5           |
+---------+-------------+

The equality and comparison operators are not defined on the JSON data type. Therefore, you can't use JSON values directly in clauses like GROUP BY or ORDER BY . Instead, use the JSON_VALUE function to extract field values as SQL strings, as described in the next section.

Extract values as strings

The JSON_VALUE function extracts a scalar value and returns it as a SQL string. It returns SQL NULL if cart.name doesn't point to a scalar value in the JSON.

 SELECT 
  
 JSON_VALUE 
 ( 
 cart 
 . 
 name 
 ) 
  
 AS 
  
 name 
 FROM 
  
 mydataset 
 . 
 table1 
 ; 
+-------+
| name  |
+-------+
| Alice |
+-------+

You can use the JSON_VALUE function in contexts that require equality or comparison, such as WHERE clauses and GROUP BY clauses. The following example shows a WHERE clause that filters against a JSON value:

 SELECT 
  
 cart 
 . 
 items 
 [ 
 0 
 ] 
  
 AS 
  
 first_item 
 FROM 
  
 mydataset 
 . 
 table1 
 WHERE 
  
 JSON_VALUE 
 ( 
 cart 
 . 
 name 
 ) 
  
 = 
  
 'Alice' 
 ; 
+-------------------------------+
| first_item                    |
+-------------------------------+
| {"price":10,"product":"book"} |
+-------------------------------+

Alternatively, you can use the STRING function which extracts a JSON string and returns that value as a SQL STRING . For example:

 SELECT 
  
 STRING 
 ( 
 JSON 
  
 '"purple"' 
 ) 
  
 AS 
  
 color 
 ; 
+--------+
| color  |
+--------+
| purple |
+--------+

In addition to STRING , you might have to extract JSON values and return them as another SQL data type. The following value extraction functions are available:

To obtain the type of the JSON value, you can use the JSON_TYPE function.

Flexibly convert JSON

You can convert a JSON value to a scalar SQL value flexibly and error-free with LAX Conversion functions.

The following example demonstrates the power of these functions. LAX_IN64 automatically infers and processes input correctly.

 SELECT 
  
 LAX_INT64 
 ( 
 JSON 
  
'" 10 
"' ) 
  
 AS 
  
 id 
 ; 
+----+
| id |
+----+
| 10 |
+----+

In addition to LAX_IN64 , you can convert to other SQL types flexibly to JSON with the following functions:

Extract arrays from JSON

JSON can contain JSON arrays, which are not directly equivalent to an ARRAY<JSON> type in BigQuery. You can use the following functions to extract a BigQuery ARRAY from JSON:

  • JSON_QUERY_ARRAY : extracts an array and returns it as an ARRAY<JSON> of JSON.
  • JSON_VALUE_ARRAY : extracts an array of scalar values and returns it as an ARRAY<STRING> of scalar values.

The following example uses JSON_QUERY_ARRAY to extract JSON arrays.

 SELECT 
  
 JSON_QUERY_ARRAY 
 ( 
 cart 
 . 
 items 
 ) 
  
 AS 
  
 items 
 FROM 
  
 mydataset 
 . 
 table1 
 ; 
+----------------------------------------------------------------+
| items                                                          |
+----------------------------------------------------------------+
| [{"price":10,"product":"book"}","{"price":5,"product":"food"}] |
| [{"price":20,"product":"pen"}]                                 |
+----------------------------------------------------------------+

To split an array into its individual elements, use the UNNEST operator, which returns a table with one row for each element in the array. The following example selects the product member from each member of the items array:

 SELECT 
  
 id 
 , 
  
 JSON_VALUE 
 ( 
 item 
 . 
 product 
 ) 
  
 AS 
  
 product 
 FROM 
  
 mydataset 
 . 
 table1 
 , 
  
 UNNEST 
 ( 
 JSON_QUERY_ARRAY 
 ( 
 cart 
 . 
 items 
 )) 
  
 AS 
  
 item 
 ORDER 
  
 BY 
  
 id 
 ; 
+----+---------+
| id | product |
+----+---------+
|  1 | book    |
|  1 | food    |
|  2 | pen     |
+----+---------+

The next example is similar but uses the ARRAY_AGG function to aggregate the values back into a SQL array.

 SELECT 
  
 id 
 , 
  
 ARRAY_AGG 
 ( 
 JSON_VALUE 
 ( 
 item 
 . 
 product 
 )) 
  
 AS 
  
 products 
 FROM 
  
 mydataset 
 . 
 table1 
 , 
  
 UNNEST 
 ( 
 JSON_QUERY_ARRAY 
 ( 
 cart 
 . 
 items 
 )) 
  
 AS 
  
 item 
 GROUP 
  
 BY 
  
 id 
 ORDER 
  
 BY 
  
 id 
 ; 
+----+-----------------+
| id | products        |
+----+-----------------+
|  1 | ["book","food"] |
|  2 | ["pen"]         |
+----+-----------------+

For more information about arrays, see Working with arrays in GoogleSQL .

JSON nulls

The JSON type has a special null value that is different from the SQL NULL . A JSON null is not treated as a SQL NULL value, as the following example shows.

 SELECT 
  
 JSON 
  
 'null' 
  
 IS 
  
 NULL 
 ; 
+-------+
| f0_   |
+-------+
| false |
+-------+

When you extract a JSON field with a null value, the behavior depends on the function:

  • The JSON_QUERY function returns a JSON null , because it is a valid JSON value.
  • The JSON_VALUE function returns the SQL NULL , because JSON null is not a scalar value.

The following example shows the different behaviors:

 SELECT 
  
 json 
 . 
 a 
  
 AS 
  
 json_query 
 , 
  
 -- Equivalent to JSON_QUERY(json, '$.a') 
  
 JSON_VALUE 
 ( 
 json 
 , 
  
 '$.a' 
 ) 
  
 AS 
  
 json_value 
 FROM 
  
 ( 
 SELECT 
  
 JSON 
  
 '{"a": null}' 
  
 AS 
  
 json 
 ); 
+------------+------------+
| json_query | json_value |
+------------+------------+
| null       | NULL       |
+------------+------------+