Updating partitioned table data using DML

This page provides an overview of data manipulation language (DML) support for partitioned tables.

For more information on DML, see:

Tables used in examples

The following JSON schema definitions represent the tables used in the examples on this page.

mytable : an ingestion-time partitioned table

[
      {"name": "field1", "type": "INTEGER"},
      {"name": "field2", "type": "STRING"}
    ]

mytable2 : a standard (non-partitioned) table

[
      {"name": "id", "type": "INTEGER"},
      {"name": "ts", "type": "TIMESTAMP"}
    ]

mycolumntable : a partitioned table that is partitioned by using the ts TIMESTAMP column

[
      {"name": "field1", "type": "INTEGER"},
      {"name": "field2", "type": "STRING"}
      {"name": "field3", "type": "BOOLEAN"}
      {"name": "ts", "type": "TIMESTAMP"}
    ]

In examples where COLUMN_ID appears, replace it with the name of the column you wish to operate on.

Inserting data

You use a DML INSERT statement to add rows to a partitioned table.

Inserting data into ingestion-time partitioned tables

When you use a DML statement to add rows to an ingestion-time partitioned table, you can specify the partition to which the rows should be added. You reference the partition using the _PARTITIONTIME pseudocolumn .

For example, the following INSERT statement adds a row to the May 1, 2017 partition of mytable — “2017-05-01” .

 INSERT 
  
 INTO 
  
 project_id 
 . 
 dataset 
 . 
 mytable 
  
 ( 
 _PARTITIONTIME 
 , 
  
 field1 
 , 
  
 field2 
 ) 
 SELECT 
  
 TIMESTAMP 
 ( 
 "2017-05-01" 
 ), 
  
 1 
 , 
  
 "one"

Only timestamps that correspond to exact date boundaries can be used. For example, the following DML statement returns an error:

 INSERT 
  
 INTO 
  
 project_id 
 . 
 dataset 
 . 
 mytable 
  
 ( 
 _PARTITIONTIME 
 , 
  
 field1 
 , 
  
 field2 
 ) 
 SELECT 
  
 TIMESTAMP 
 ( 
 "2017-05-01 21:30:00" 
 ), 
  
 1 
 , 
  
 "one"

Inserting data into partitioned tables

Inserting data into a partitioned table using DML is the same as inserting data into a non-partitioned table.

For example, the following INSERT statement adds rows to partitioned table mycolumntable by selecting data from mytable2 (a non-partitioned table).

 INSERT 
  
 INTO 
  
 project_id 
 . 
 dataset 
 . 
 mycolumntable 
  
 ( 
 ts 
 , 
  
 field1 
 ) 
 SELECT 
  
 ts 
 , 
  
 id 
 FROM 
  
 project_id 
 . 
 dataset 
 . 
 mytable2

Deleting data

You use a DML DELETE statement to delete rows from a partitioned table.

Deleting data in ingestion-time partitioned tables

The following DELETE statement deletes all rows from the June 1, 2017 partition ( "2017-06-01" ) of mytable where field1 is equal to 21 . You reference the partition using the _PARTITIONTIME pseudocolumn .

 DELETE 
  
 project_id 
 . 
 dataset 
 . 
 mytable 
 WHERE 
  
 field1 
  
 = 
  
 21 
  
 AND 
  
 _PARTITIONTIME 
  
 = 
  
 "2017-06-01"

Deleting data in partitioned tables

Deleting data in a partitioned table using DML is the same as deleting data from a non-partitioned table.

For example, the following DELETE statement deletes all rows from the June 1, 2017 partition ( "2017-06-01" ) of mycolumntable where field1 is equal to 21 .

 DELETE 
  
 project_id 
 . 
 dataset 
 . 
 mycolumntable 
 WHERE 
  
 field1 
  
 = 
  
 21 
  
 AND 
  
 DATE 
 ( 
 ts 
 ) 
  
 = 
  
 "2017-06-01"

Using DML DELETE to delete partitions

If a qualifying DELETE statement covers all rows in a partition, BigQuery removes the entire partition. This removal is done without scanning bytes or consuming slots. The following example of a DELETE statement covers the entire partition of a filter on the _PARTITIONDATE pseudocolumn:

 DELETE 
  
 mydataset 
 . 
 mytable 
 WHERE 
  
 _PARTITIONDATE 
  
 IN 
  
 ( 
 '2076-10-07' 
 , 
  
 '2076-03-06' 
 );

Common disqualifications

Queries with the following characteristics may not benefit from the optimization:

Partial partition coverage
References to non-partitioning columns
Recently ingested data through the BigQuery Storage Write API or the legacy streaming API
Filters with subqueries or unsupported predicates

Eligibility for optimization can vary with the type of partitioning, the underlying storage metadata, and the filter predicates. As a best practice, perform a dry run to verify that the query results in 0 bytes processed.

Multi-statement transaction

This optimization works within a multi-statement transaction . The following query example replaces a partition with data from another table in a single transaction, without scanning the partition for the DELETE statement.

 DECLARE 
  
 REPLACE_DAY 
  
 DATE 
 ; 
 BEGIN 
  
 TRANSACTION 
 ; 
 -- find the partition which we want to replace 
 SET 
  
 REPLACE_DAY 
  
 = 
  
 ( 
 SELECT 
  
 MAX 
 ( 
 d 
 ) 
  
 FROM 
  
 mydataset 
 . 
 mytable_staging 
 ); 
 -- delete the entire partition from mytable 
 DELETE 
  
 FROM 
  
 mydataset 
 . 
 mytable 
  
 WHERE 
  
 part_col 
  
 = 
  
 REPLACE_DAY 
 ; 
 -- insert the new data into the same partition in mytable 
 INSERT 
  
 INTO 
  
 mydataset 
 . 
 mytable 
 SELECT 
  
 * 
  
 FROM 
  
 mydataset 
 . 
 mytable_staging 
  
 WHERE 
  
 part_col 
  
 = 
  
 REPLACE_DAY 
 ; 
 COMMIT 
  
 TRANSACTION 
 ;

Updating data

You use an UPDATE statement to update rows in a partitioned table.

Updating data in ingestion-time partitioned tables

The following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition ( “2017-05-01” ) of mytable where field1 is equal to 21 are moved to the June 1, 2017 partition ( “2017-06-01” ).

 UPDATE 
  
 project_id 
 . 
 dataset 
 . 
 mytable 
 SET 
  
 _PARTITIONTIME 
  
 = 
  
 "2017-06-01" 
 WHERE 
  
 _PARTITIONTIME 
  
 = 
  
 "2017-05-01" 
  
 AND 
  
 field1 
  
 = 
  
 21

Updating data in partitioned tables

Updating data in a partitioned table using DML is the same as updating data from a non-partitioned table. For example, the following UPDATE statement moves rows from one partition to another. Rows in the May 1, 2017 partition ( “2017-05-01” ) of mytable where field1 is equal to 21 are moved to the June 1, 2017 partition ( “2017-06-01” ).

 UPDATE 
  
 project_id 
 . 
 dataset 
 . 
 mycolumntable 
 SET 
  
 ts 
  
 = 
  
 "2017-06-01" 
 WHERE 
  
 DATE 
 ( 
 ts 
 ) 
  
 = 
  
 "2017-05-01" 
  
 AND 
  
 field1 
  
 = 
  
 21

DML in hourly, monthly, and yearly partitioned tables

You can use DML statements to modify an hourly, monthly, or yearly partitioned table. Provide the hour, month, or year range of the relevant dates/timestamps/datetimes, as in the following example for monthly partitioned tables:

  
bq  
query  
--nouse_legacy_sql  
 'DELETE FROM my_dataset.my_table WHERE 
 TIMESTAMP_TRUNC(ts_column, MONTH) = "2020-01-01 00:00:00";'

Or another example for partitioned tables with DATETIME column:

  
bq  
query  
--nouse_legacy_sql  
 'DELETE FROM my_dataset.my_table WHERE 
 dt_column BETWEEN DATETIME("2020-01-01") 
 AND DATETIME("2020-05-01");'

Using a `MERGE` statement

You use a DML MERGE statement to combine INSERT , UPDATE , and DELETE operations for a partitioned table into one statement and perform them atomically.

Pruning partitions when using a `MERGE` statement

When you run a MERGE statement against a partitioned table, you can limit which partitions are scanned by including the partitioning column in either a subquery filter, a search_condition filter, or a merge_condition filter. Pruning can occur when scanning the source table or the target table, or both.

Each of the examples below queries an ingestion-time partitioned table using the _PARTITIONTIME pseudocolumn as a filter.

Using a subquery to filter source data

In the following MERGE statement, the subquery in the USING clause filters on the _PARTITIONTIME pseudocolumn in the source table.

 MERGE 
  
 dataset 
 . 
 target 
  
 T 
 USING 
  
 ( 
 SELECT 
  
 * 
  
 FROM 
  
 dataset 
 . 
 source 
  
 WHERE 
  
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
 ) 
  
 S 
 ON 
  
 T 
 . 
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
 WHEN 
  
 MATCHED 
  
 THEN 
  
 DELETE

Looking at the query execution plan, the subquery runs first. Only the rows in the '2018-01-01' partition in the source table are scanned. Here is the relevant stage in the query plan:

 READ $10:name, $11:_PARTITIONTIME
FROM temp.source
WHERE equal($11, 1514764800.000000000)

Using a filter in the `search_condition` of a `when_clause`

If a search_condition contains a filter, then the query optimizer attempts to prune partitions. For example, in the following MERGE statement, each WHEN MATCHED and WHEN NOT MATCHED clause contains a filter on the _PARTITIONTIME pseudocolumn.

 MERGE 
  
 dataset 
 . 
 target 
  
 T 
 USING 
  
 dataset 
 . 
 source 
  
 S 
 ON 
  
 T 
 . 
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
 WHEN 
  
 MATCHED 
  
 AND 
  
 T 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
  
 THEN 
  
 UPDATE 
  
 SET 
  
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
 WHEN 
  
 MATCHED 
  
 AND 
  
 T 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-02' 
  
 THEN 
  
 UPDATE 
  
 SET 
  
  COLUMN_ID 
 
  
 = 
  
  S 
 . 
 COLUMN_ID 
 
  
 + 
  
 10 
 WHEN 
  
 NOT 
  
 MATCHED 
  
 BY 
  
 SOURCE 
  
 AND 
  
 T 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-03' 
  
 THEN 
  
 DELETE

During the join stage, only the following partitions are scanned in the target table: '2018-01-01' , '2018-01-02' , and '2018-01-03' — that is, the union of all the search_condition filters.

From the query execution plan:

 READ
$1:COLUMN_ID, $2:_PARTITIONTIME, $3:$file_temp_id, $4:$row_temp_id
FROM temp.target
WHERE or(equal($2, 1514764800.000000000), equal($2, 1514851200.000000000), equal($2, 1514937600.000000000))

However, in the following example, the WHEN NOT MATCHED BY SOURCE clause does not have a filter expression:

 MERGE 
  
 dataset 
 . 
 target 
  
 T 
 USING 
  
 dataset 
 . 
 source 
  
 S 
 ON 
  
 T 
 . 
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
 WHEN 
  
 MATCHED 
  
 AND 
  
 T 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
  
 THEN 
  
 UPDATE 
  
 SET 
  
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
 WHEN 
  
 NOT 
  
 MATCHED 
  
 BY 
  
 SOURCE 
  
 THEN 
  
 UPDATE 
  
 SET 
  
  COLUMN_ID 
 
  
 = 
  
  COLUMN_ID 
 
  
 + 
  
 1

This query must scan the entire target table to compute the WHEN NOT MATCHED BY SOURCE clause. As a result, no partitions are pruned.

Using a constant false predicate in a `merge_condition`

If you use the WHEN NOT MATCHED and WHEN NOT MATCHED BY SOURCE clauses together, then BigQuery usually performs a full outer join, which cannot be pruned. However, if the merge condition uses a constant false predicate, then BigQuery can use the filter condition for partition pruning. For more information about the use of constant false predicates, see the description of the merge_condition clause in the MERGE statement documentation.

The following example scans only the '2018-01-01' partition in both the target and source tables.

 MERGE 
  
 dataset 
 . 
 target 
  
 T 
 USING 
  
 dataset 
 . 
 source 
  
 S 
 ON 
  
 FALSE 
 WHEN 
  
 NOT 
  
 MATCHED 
  
 AND 
  
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
  
 THEN 
  
 INSERT 
 ( 
  COLUMN_ID 
 
 ) 
  
 VALUES 
 ( 
  COLUMN_ID 
 
 ) 
 WHEN 
  
 NOT 
  
 MATCHED 
  
 BY 
  
 SOURCE 
  
 AND 
  
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
  
 THEN 
  
 DELETE

Using a filter in a `merge_condition`

The query optimizer attempts to use a filter in a merge_condition to prune partitions. The query optimizer might or might not be able to push the predicate down to the table scanning stage, depending on the type of join.

In the following example, the merge_condition is used as a predicate to join the source and target tables. The query optimizer can push this predicate down when it scans both tables. As a result, the query only scans the '2018-01-01' partition in both the target and source tables.

 MERGE 
  
 dataset 
 . 
 target 
  
 T 
 USING 
  
 dataset 
 . 
 source 
  
 S 
 ON 
  
 T 
 . 
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
  
 AND 
  
 T 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
  
 AND 
  
 S 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
 WHEN 
  
 MATCHED 
  
 THEN 
  
 UPDATE 
  
 SET 
  
  COLUMN_ID 
 
  
 = 
  
  NEW_VALUE

In the next example, the merge_condition does not contain a predicate for the source table, so no partition pruning can be performed on the source table. The statement does contain a predicate for the target table, but the statement uses a WHEN NOT MATCHED BY SOURCE clause, rather than a WHEN MATCHED clause. That means the query has to scan the entire target table for the rows that don't match.

 MERGE 
  
 dataset 
 . 
 target 
  
 T 
 USING 
  
 dataset 
 . 
 source 
  
 S 
 ON 
  
 T 
 . 
  COLUMN_ID 
 
  
 = 
  
 S 
 . 
  COLUMN_ID 
 
  
 AND 
  
 T 
 . 
 _PARTITIONTIME 
  
 = 
  
 '2018-01-01' 
 WHEN 
  
 NOT 
  
 MATCHED 
  
 BY 
  
 SOURCE 
  
 THEN 
  
 UPDATE 
  
 SET 
  
  COLUMN_ID 
 
  
 = 
  
  NEW_VALUE

Limitations

For information about DML limitations, see Limitations on the DML reference page.

Quotas

For information about DML quota information, see DML statements on the Quotas and limits page.

Pricing

For information about DML pricing, see how to compute the query size for DML statements run on Partitioned tables .

Table security

To control access to tables in BigQuery, see Control access to resources with IAM .

What's next

Learn how to create partitioned tables
Learn how to query partitioned tables
Get an introduction to DML
Learn how to compose DML statements using DML syntax