Multi-statement transactions

BigQuery supports multi-statement transactions inside a single query, or across multiple queries when using sessions. A multi-statement transaction lets you perform mutating operations, such as inserting or deleting rows on one or more tables, and either commit or roll back the changes atomically.

Uses for multi-statement transactions include:

  • Performing DML mutations on multiple tables as a single transaction. The tables can span multiple datasets or projects.
  • Performing mutations on a single table in several stages, based on intermediate computations.

Transactions guarantee ACID properties and support snapshot isolation. During a transaction, all reads return a consistent snapshot of the tables referenced in the transaction. If a statement in a transaction modifies a table, the changes are visible to subsequent statements within the same transaction.

Transaction scope

A transaction must be contained in a single SQL query, except when in Session mode . A query can contain multiple transactions, but they cannot be nested. You can run multi-statement transactions over multiple queries in a session.

To start a transaction, use the BEGIN TRANSACTION statement. The transaction ends when any of the following occur:

  • The query executes a COMMIT TRANSACTION statement. This statement atomically commits all changes made inside the transaction.
  • The query executes a ROLLBACK TRANSACTION statement. This statement abandons all changes made inside the transaction.
  • The query ends before reaching either of these two statements. In that case, BigQuery automatically rolls back the transaction.

If an error occurs during a transaction and the query has an exception handler , then BigQuery transfers control to the exception handler. Inside the exception block, can choose whether to commit or roll back the transaction.

If an error occurs during a transaction and there is no exception handler, then the query fails and BigQuery automatically rolls back the transaction.

The following example shows an exception handler that rolls back a transaction:

 BEGIN 
  
 BEGIN 
  
 TRANSACTION 
 ; 
  
 INSERT 
  
 INTO 
  
 mydataset 
 . 
 NewArrivals 
  
 VALUES 
  
 ( 
 'top load washer' 
 , 
  
 100 
 , 
  
 'warehouse #1' 
 ); 
  
 -- Trigger an error. 
  
 SELECT 
  
 1 
 / 
 0 
 ; 
  
 COMMIT 
  
 TRANSACTION 
 ; 
 EXCEPTION 
  
 WHEN 
  
 ERROR 
  
 THEN 
  
 -- Roll back the transaction inside the exception handler. 
  
 SELECT 
  
 @@ 
 error 
 . 
 message 
 ; 
  
 ROLLBACK 
  
 TRANSACTION 
 ; 
 END 
 ; 

Statements supported in transactions

The following statement types are supported in transactions:

  • Query statements: SELECT
  • DML statements: INSERT , UPDATE , DELETE , MERGE , and TRUNCATE TABLE
  • DDL statements on temporary entities:

    • CREATE TEMP TABLE
    • CREATE TEMP FUNCTION
    • DROP TABLE on a temporary table
    • DROP FUNCTION on a temporary function

DDL statements that create or drop permanent entities, such as datasets, tables, and functions, are not supported inside transactions.

Date/time functions in transactions

Within a transaction, the following date/time functions have special behaviors:

Example of a transaction

This example assumes there are two tables named Inventory and NewArrivals , created as follows:

 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 mydataset 
 . 
 Inventory 
 ( 
  
 product 
  
 string 
 , 
  
 quantity 
  
 int64 
 , 
  
 supply_constrained 
  
 bool 
 ); 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 mydataset 
 . 
 NewArrivals 
 ( 
  
 product 
  
 string 
 , 
  
 quantity 
  
 int64 
 , 
  
 warehouse 
  
 string 
 ); 
 INSERT 
  
 mydataset 
 . 
 Inventory 
  
 ( 
 product 
 , 
  
 quantity 
 ) 
 VALUES 
 ( 
 'top load washer' 
 , 
  
 10 
 ), 
  
 ( 
 'front load washer' 
 , 
  
 20 
 ), 
  
 ( 
 'dryer' 
 , 
  
 30 
 ), 
  
 ( 
 'refrigerator' 
 , 
  
 10 
 ), 
  
 ( 
 'microwave' 
 , 
  
 20 
 ), 
  
 ( 
 'dishwasher' 
 , 
  
 30 
 ); 
 INSERT 
  
 mydataset 
 . 
 NewArrivals 
  
 ( 
 product 
 , 
  
 quantity 
 , 
  
 warehouse 
 ) 
 VALUES 
 ( 
 'top load washer' 
 , 
  
 100 
 , 
  
 'warehouse #1' 
 ), 
  
 ( 
 'dryer' 
 , 
  
 200 
 , 
  
 'warehouse #2' 
 ), 
  
 ( 
 'oven' 
 , 
  
 300 
 , 
  
 'warehouse #1' 
 ); 

The Inventory table contains information about current inventory, and NewArrivals contains information about newly arrived items.

The following transaction updates Inventory with new arrivals and deletes the corresponding records from NewArrivals . Assuming that all statements complete successfully, the changes in both tables are committed atomically as a single transaction.

 BEGIN 
  
 TRANSACTION 
 ; 
 -- Create a temporary table that holds new arrivals from 'warehouse #1'. 
 CREATE 
  
 TEMP 
  
 TABLE 
  
 tmp 
  
 AS 
  
 SELECT 
  
 * 
  
 FROM 
  
 mydataset 
 . 
 NewArrivals 
  
 WHERE 
  
 warehouse 
  
 = 
  
 'warehouse #1' 
 ; 
 -- Delete the matching records from the NewArravals table. 
 DELETE 
  
 mydataset 
 . 
 NewArrivals 
  
 WHERE 
  
 warehouse 
  
 = 
  
 'warehouse #1' 
 ; 
 -- Merge the records from the temporary table into the Inventory table. 
 MERGE 
  
 mydataset 
 . 
 Inventory 
  
 AS 
  
 I 
 USING 
  
 tmp 
  
 AS 
  
 T 
 ON 
  
 I 
 . 
 product 
  
 = 
  
 T 
 . 
 product 
 WHEN 
  
 NOT 
  
 MATCHED 
  
 THEN 
  
 INSERT 
 ( 
 product 
 , 
  
 quantity 
 , 
  
 supply_constrained 
 ) 
  
 VALUES 
 ( 
 product 
 , 
  
 quantity 
 , 
  
 false 
 ) 
 WHEN 
  
 MATCHED 
  
 THEN 
  
 UPDATE 
  
 SET 
  
 quantity 
  
 = 
  
 I 
 . 
 quantity 
  
 + 
  
 T 
 . 
 quantity 
 ; 
 -- Drop the temporary table and commit the transaction. 
 DROP 
  
 TABLE 
  
 tmp 
 ; 
 COMMIT 
  
 TRANSACTION 
 ; 

Transaction concurrency

If a transaction mutates (update or deletes) rows in a table, then other transactions or DML statements that mutate rows in the same table cannot run concurrently. Conflicting transactions are cancelled. Conflicting DML statements that run outside of a transaction are queued to run later, subject to queuing limits .

Operations that read or append new rows can run concurrently with the transaction. For example, any of the following operations can be performed concurrently on a table while a transaction mutates data in the same table:

  • SELECT statements
  • BigQuery Storage Read API read operations
  • Queries from BigQuery BI Engine
  • INSERT statements
  • Load jobs that use WRITE_APPEND disposition to append rows
  • Streaming writes

If a transaction only reads a table or appends new rows to it, any operation can be performed concurrently on that table.

Viewing transaction information

BigQuery assigns a transaction ID to each multi-statement transaction. The transaction ID is attached to each query that executes inside the transaction. To view the transaction IDs for your jobs, query the INFORMATION_SCHEMA.JOBS* views for the transaction_id column.

When a multi-statement transaction runs, BigQuery creates a child job for each statement in the transaction. For a given transaction, every child job that is associated with that transaction has the same transaction_id value.

The following examples show how to find information about your transactions.

Find all committed or rolled back transactions

The following query returns all transactions that were successfully committed.

 SELECT 
  
 transaction_id 
 , 
  
 parent_job_id 
 , 
  
 query 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
 WHERE 
  
 statement_type 
  
 = 
  
 "COMMIT_TRANSACTION" 
  
 AND 
  
 error_result 
  
 IS 
  
 NULL 
 ; 

The following query returns all transactions that were successfully rolled back.

 SELECT 
  
 transaction_id 
 , 
  
 parent_job_id 
 , 
  
 query 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
 WHERE 
  
 statement_type 
  
 = 
  
 "ROLLBACK_TRANSACTION" 
  
 AND 
  
 error_result 
  
 IS 
  
 NULL 
 ; 

Find the start and end time of a transaction

The following query returns the starting and ending times for a specified transaction ID.

 SELECT 
  
 transaction_id 
 , 
  
 start_time 
 , 
  
 end_time 
 , 
  
 statement_type 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_USER 
 WHERE 
  
 transaction_id 
  
 = 
  
 " TRANSACTION_ID 
" 
 AND 
  
 statement_type 
  
 IN 
  
 ( 
 "BEGIN_TRANSACTION" 
 , 
  
 "COMMIT_TRANSACTION" 
 , 
  
 "ROLLBACK_TRANSACTION" 
 ) 
 ORDER 
  
 BY 
  
 start_time 
 ; 

Find the transaction in which a job is running

The following query gets the transaction associated with a specified job ID. It returns NULL if the job is not running within a multi-statement transaction.

 SELECT 
  
 transaction_id 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
 WHERE 
  
 job_id 
  
 = 
  
 ' JOB_ID 
' 
 ; 

Find the current job running within a transaction

The following query returns information about the job that is currently running within a specified transaction, if any.

 SELECT 
  
 job_id 
 , 
  
 query 
 , 
  
 start_time 
 , 
  
 total_slot_ms 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
 WHERE 
  
 transaction_id 
  
 = 
  
 ' TRANSACTION_ID 
' 
  
 AND 
  
 state 
  
 = 
  
 RUNNING 
 ; 

Find the active transactions that affect a table

The following query returns the active transactions that affect a specified table. For each active transaction, if the transaction is running as part of multi-statement queries such as within a stored procedure , then it also returns the parent job ID. If the transaction is running within a session, then it also returns the session info.

 WITH 
  
 running_transactions 
  
 AS 
  
 ( 
  
 SELECT 
  
 DISTINCT 
  
 transaction_id 
  
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
  
 EXCEPT 
  
 DISTINCT 
  
 SELECT 
  
 transaction_id 
  
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
  
 WHERE 
  
 statement_type 
  
 = 
  
 'COMMIT_TRANSACTION' 
  
 OR 
  
 statement_type 
  
 = 
  
 'ROLLBACK_TRANSACTION' 
 ) 
 SELECT 
  
 jobs 
 . 
 transaction_id 
 , 
  
 parent_job_id 
 , 
  
 session_info 
 , 
  
 query 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
  
 AS 
  
 jobs 
 , 
  
 running_transactions 
 WHERE 
  
 destination_table 
  
 = 
  
 ( 
 " PROJECT_NAME 
" 
 , 
  
 " DATASET_NAME 
" 
 , 
  
 " TABLE_NAME 
" 
 ) 
  
 AND 
  
 jobs 
 . 
 transaction_id 
  
 = 
  
 running_transactions 
 . 
 transaction_id 
 ; 

Find the active transactions running in a multi-statement transaction

The following query returns the active transactions for a particular job, specified by the ID of the job that is running the multi-statement transaction.

 SELECT 
  
 DISTINCT 
  
 transaction_id 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
 WHERE 
  
 parent_job_id 
  
 = 
  
 " JOB_ID 
" 
 EXCEPT 
  
 DISTINCT 
 SELECT 
  
 transaction_id 
 FROM 
  
 ` 
 region 
 - 
 us 
 ` 
 . 
 INFORMATION_SCHEMA 
 . 
 JOBS_BY_PROJECT 
 WHERE 
  
 parent_job_id 
  
 = 
  
 " JOB_ID 
" 
  
 AND 
  
 ( 
 statement_type 
  
 = 
  
 'COMMIT_TRANSACTION' 
  
 OR 
  
 statement_type 
  
 = 
  
 'ROLLBACK_TRANSACTION' 
 ); 

Limitations

  • Transactions cannot use DDL statements that affect permanent entities.
  • Within a transaction, materialized views are interpreted as logical views. You can still query a materialized view inside a transaction, but it doesn't result in any performance improvement or cost reduction compared with the equivalent logical view.
  • A multi-statement transaction that fails triggers a rollback operation, undoing all pending changes and precluding retries.

  • A transaction can mutate data in at most 100 tables and can perform at most 100,000 partition modifications.

  • BI Engine does not accelerate queries inside a transaction.