Improve INSERT SELECT performance with parallel SELECT

Using parallel execution for the SELECT portion of an INSERT INTO ... SELECT query can improve AlloyDB for PostgreSQL query performance, especially for partitioned tables and complex queries.

By parallelizing the SELECT subplan, the database uses multiple worker processes to retrieve data. The database then inserts the retrieved data into the target table. This division of labor can lead to database performance improvements.

How parallel query plans work in PostgreSQL

In PostgreSQL, the query optimizer can create a parallel plan for an entire query or part of a query. When a parallel plan is generated, it adds a Gather or Gather Merge node to the top of the plan.

This node coordinates multiple parallel worker processes. Each worker executes a part of the task, for example, scanning a portion of a table. The Gather node then collects the results from all workers and passes them on to the next stage of the query, or back to the client.

For example, the following query plan shows a Gather node that plans to use five worker processes for a parallel sequential scan on table t1 .

  EXPLAIN 
  
 SELECT 
  
 * 
  
 FROM 
  
 t1 
 ; 
  
 QUERY 
  
 PLAN 
  
  
 Gather 
  
 ( 
 cost 
 = 
 0 
 . 
 00 
 .. 
 143359 
 . 
 76 
  
 rows 
 = 
 9999878 
  
 width 
 = 
 60 
 ) 
  
 Workers 
  
 Planned 
 : 
  
 5 
  
 - 
>  
 Parallel 
  
 Seq 
  
 Scan 
  
 on 
  
 t1 
  
 ( 
 cost 
 = 
 0 
 . 
 00 
 .. 
 143359 
 . 
 76 
  
 rows 
 = 
 1999976 
  
 width 
 = 
 60 
 ) 
 ( 
 3 
  
 rows 
 )

Parallel execution for INSERT...SELECT statements

AlloyDB improves INSERT INTO ... SELECT performance by using multiple worker processes to parallelize the SELECT statement. This division of labor can lead to database performance improvements. The query must be parallel-safe and there must be no conditions that prevent parallelization.

Many database operations use INSERT INTO ... SELECT statements to add new rows to a table. Often, the SELECT part of the statement is the most resource-intensive part of the query.

The functionality is controlled by the following configuration parameters:

enable_parallel_select_for_insert_select : enables or disables the feature . This parameter is on by default in PostgreSQL 14 and later.
enable_parallel_select_for_insert_select_into_part_table : enables or disables the feature that lets you insert into partitioned tables. This parameter is off by default in PostgreSQL 16 and later.

For the query planner to consider a parallel plan, the INSERT SELECT statement must be parallel-safe . This means that no part of the operation can contain functions or expressions that cannot be safely executed in parallel. If any part of the query is determined to be parallel unsafe, the optimizer falls back to a non-parallel plan. For more information, see parallel-safety .

Conditions that prevent parallel execution

The query optimizer doesn't use a parallel plan if the INSERT SELECT statement is considered parallel unsafe. The following conditions prevent parallelization:

The target table is a foreign table.
The query uses a modifiable Common Table Expression (CTE) that contains a data modification statement, for example, DELETE .
An index on the target table uses a parallel unsafe expression or function.
A column on the target table has a default value that uses a parallel unsafe function.
A trigger on the target table uses a parallel unsafe expression or function.
The target table has a DOMAIN column with a CHECK constraint that uses a parallel unsafe function.
A CHECK constraint on the target table contains a parallel unsafe expression or function.
The target table is a partitioned table and enable_parallel_select_for_insert_select_into_part_table is disabled.

Before you begin

This document assumes that you have an AlloyDB database. To create a database, see Create a cluster and its primary instance and Create and manage a database .

Required roles

To get the permissions that you need to change database flags at the session level, ask your administrator to grant you the AlloyDB Database User ( roles/alloydb.databaseUser ) Identity and Access Management (IAM) role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations .

Enable parallel execution parameters

To enable parallel execution for partitioned target tables, follow these steps:

In the Google Cloud console, go to the Clusterspage.

Go to Clusters
Select a cluster from the list.
In the navigation menu, click AlloyDB Studio.
Sign into AlloyDB Studio using the name of your database, username, and password.

Set the enable_parallel_select_for_insert_select_into_part_table GUC to on .

  SET 
  
 enable_parallel_select_for_insert_select_into_part_table 
  
 = 
  
 on 
 ;

After you enable the GUC parameter, the query planner automatically considers parallel plans for INSERT INTO ... SELECT statements where the target is a partitioned table, provided that the query is determined to be parallel-safe.

The following example results in an EXPLAIN plan that shows a Gather node, which indicates that the SELECT from source_table is executed in parallel.

  -- Create a partitioned table 
 CREATE 
  
 TABLE 
  
 part_table 
  
 ( 
 a 
  
 INT 
 , 
  
 b 
  
 TEXT 
 ) 
  
 PARTITION 
  
 BY 
  
 RANGE 
  
 ( 
 a 
 ); 
 CREATE 
  
 TABLE 
  
 part_table_1 
  
 PARTITION 
  
 OF 
  
 part_table 
  
 FOR 
  
 VALUES 
  
 FROM 
  
 ( 
 MINVALUE 
 ) 
  
 TO 
  
 ( 
 1000 
 ); 
 CREATE 
  
 TABLE 
  
 part_table_2 
  
 PARTITION 
  
 OF 
  
 part_table 
  
 FOR 
  
 VALUES 
  
 FROM 
  
 ( 
 1000 
 ) 
  
 TO 
  
 ( 
 MAXVALUE 
 ); 
 -- Create a source table 
 CREATE 
  
 TABLE 
  
 source_table 
  
 ( 
 c1 
  
 INT 
 , 
  
 c2 
  
 TEXT 
 ); 
 INSERT 
  
 INTO 
  
 source_table 
  
 SELECT 
  
 i 
 , 
  
 'value_' 
  
 || 
  
 i 
  
 FROM 
  
 generate_series 
 ( 
 1 
 , 
  
 2000 
 ) 
  
 i 
 ; 
 ANALYZE 
  
 source_table 
 ; 
 -- Enable the feature 
 SET 
  
 enable_parallel_select_for_insert_select_into_part_table 
  
 = 
  
 on 
 ; 
 -- Optional for parallelizing the query with a small amount of data 
 SET 
  
 parallel_setup_cost 
 = 
 0 
 ; 
 SET 
  
 parallel_tuple_cost 
 = 
 0 
 ; 
 SET 
  
 min_parallel_table_scan_size 
 = 
 0 
 ; 
 -- Run the insert with a parallelizable select 
 EXPLAIN 
  
 INSERT 
  
 INTO 
  
 part_table 
  
 SELECT 
  
 * 
  
 FROM 
  
 source_table 
 ;

This returns the following output:

 EXPLAIN (COSTS OFF) INSERT INTO part_table SELECT * FROM source_table;
                QUERY PLAN                   
-----------------------------------------------
Insert on part_table
->  Gather
        Workers Planned: 2
        ->  Parallel Seq Scan on source_table
(4 rows)

Disable parallel execution parameters

To disable the parameters, follow these steps:

In the Google Cloud console, go to the Clusterspage.

Go to Clusters
Select a cluster from the list.
In the navigation menu, click AlloyDB Studio.
Sign into AlloyDB Studio using the name of your database, username, and password.

To disable the enable_parallel_select_for_insert_select parameter, run the following SQL command:

  SET 
  
 enable_parallel_select_for_insert_select 
  
 = 
  
 OFF 
 ;

To disable the enable_parallel_select_for_insert_select_into_part_table parameter, run the following SQL command:

  SET 
  
 enable_parallel_select_for_insert_select_into_part_table 
  
 = 
  
 OFF 
 ;

Verify a parallel plan

You can verify that the optimizer is using a parallel plan by using the EXPLAIN command. Look for a Gather node and the Workers Planned or Workers Launched attributes in the query plan.

Standard table example

In the following plan example, six worker processes are launched for a parallel hash join to execute the SELECT statement.

  EXPLAIN 
  
 ( 
 ANALYZE 
 ) 
 INSERT 
  
 INTO 
  
 t1 
  
 SELECT 
  
 t2 
 . 
 * 
  
 FROM 
  
 t1 
 , 
  
 t2 
  
 WHERE 
  
 t1 
 . 
 c1 
  
 != 
  
 t2 
 . 
 c1 
  
 AND 
  
 t1 
 . 
 c2 
  
 = 
  
 t2 
 . 
 c2 
 ; 
  
 QUERY 
  
 PLAN 
  
  
 Insert 
  
 on 
  
 t1 
  
 ( 
 cost 
 = 
 1209138 
 . 
 00 
 .. 
 12801765 
 . 
 49 
  
 rows 
 = 
 0 
  
 width 
 = 
 0 
 ) 
  
 ( 
 actual 
  
 time 
 = 
 16812 
 . 
 677 
 .. 
 19337 
 . 
 150 
  
 rows 
 = 
 0 
  
 loops 
 = 
 1 
 ) 
  
 - 
>  
 Gather 
  
 ( 
 cost 
 = 
 1209138 
 . 
 00 
 .. 
 12801765 
 . 
 49 
  
 rows 
 = 
 99995407 
  
 width 
 = 
 24 
 ) 
  
 ( 
 actual 
  
 time 
 = 
 16812 
 . 
 674 
 .. 
 19337 
 . 
 144 
  
 rows 
 = 
 0 
  
 loops 
 = 
 1 
 ) 
  
 Workers 
  
 Planned 
 : 
  
 6 
  
 Workers 
  
 Launched 
 : 
  
 6 
  
 - 
>  
 Parallel 
  
 Hash 
  
 Join 
  
 (...) 
 (...)

Partitioned table example

The following example shows that, after you enable the feature for partitioned tables, the plan shows a Gather node with four planned workers for the SELECT .

  -- First, enable the feature for partitioned tables 
 SET 
  
 enable_parallel_select_for_insert_select_into_part_table 
  
 = 
  
 ON 
 ; 
 -- Then, explain the query 
 EXPLAIN 
  
 ( 
 COSTS 
  
 OFF 
 ) 
  
 INSERT 
  
 INTO 
  
 part_table 
  
 SELECT 
  
 * 
  
 FROM 
  
 source_table 
 ; 
  
 QUERY 
  
 PLAN 
  
  
  
 Insert 
  
 on 
  
 part_table 
  
 - 
>  
 Gather 
  
 Workers 
  
 Planned 
 : 
  
 2 
  
 - 
>  
 Parallel 
  
 Seq 
  
 Scan 
  
 on 
  
 source_table 
 ( 
 4 
  
 rows 
 )

What's next

Learn more about parallel-safety .

Improve INSERT SELECT performance with parallel SELECT Stay organized with collections Save and categorize content based on your preferences.