Improve INSERT SELECT performance with parallel SELECT

Using parallel execution for the SELECT portion of an INSERT INTO ... SELECT query can improve AlloyDB for PostgreSQL query performance, especially for partitioned tables and complex queries.

By parallelizing the SELECT subplan, the database uses multiple worker processes to retrieve data. The database then inserts the retrieved data into the target table. This division of labor can lead to database performance improvements.

How parallel query plans work in PostgreSQL

In PostgreSQL, the query optimizer can create a parallel plan for an entire query or part of a query. When a parallel plan is generated, it adds a Gather or Gather Merge node to the top of the plan.

This node coordinates multiple parallel worker processes. Each worker executes a part of the task, for example, scanning a portion of a table. The Gather node then collects the results from all workers and passes them on to the next stage of the query, or back to the client.

For example, the following query plan shows a Gather node that plans to use five worker processes for a parallel sequential scan on table t1 .

  EXPLAIN 
  
 SELECT 
  
 * 
  
 FROM 
  
 t1 
 ; 
  
 QUERY 
  
 PLAN 
 
Gather ( cost = 0 . 00 .. 143359 . 76 rows = 9999878 width = 60 ) Workers Planned : 5 - > Parallel Seq Scan on t1 ( cost = 0 . 00 .. 143359 . 76 rows = 1999976 width = 60 ) ( 3 rows )

Parallel execution for INSERT...SELECT statements

AlloyDB improves INSERT INTO ... SELECT performance by using multiple worker processes to parallelize the SELECT statement. This division of labor can lead to database performance improvements. The query must be parallel-safe and there must be no conditions that prevent parallelization.

Many database operations use INSERT INTO ... SELECT statements to add new rows to a table. Often, the SELECT part of the statement is the most resource-intensive part of the query.

The functionality is controlled by the following configuration parameters:

  • enable_parallel_select_for_insert_select : enables or disables the feature . This parameter is on by default in PostgreSQL 14 and later.
  • enable_parallel_select_for_insert_select_into_part_table : enables or disables the feature that lets you insert into partitioned tables. This parameter is off by default in PostgreSQL 16 and later.

For the query planner to consider a parallel plan, the INSERT SELECT statement must be parallel-safe . This means that no part of the operation can contain functions or expressions that cannot be safely executed in parallel. If any part of the query is determined to be parallel unsafe, the optimizer falls back to a non-parallel plan. For more information, see parallel-safety .

Conditions that prevent parallel execution

The query optimizer doesn't use a parallel plan if the INSERT SELECT statement is considered parallel unsafe. The following conditions prevent parallelization:

  • The target table is a foreign table.
  • The query uses a modifiable Common Table Expression (CTE) that contains a data modification statement, for example, DELETE .
  • An index on the target table uses a parallel unsafe expression or function.
  • A column on the target table has a default value that uses a parallel unsafe function.
  • A trigger on the target table uses a parallel unsafe expression or function.
  • The target table has a DOMAIN column with a CHECK constraint that uses a parallel unsafe function.
  • A CHECK constraint on the target table contains a parallel unsafe expression or function.
  • The target table is a partitioned table and enable_parallel_select_for_insert_select_into_part_table is disabled.

Before you begin

This document assumes that you have an AlloyDB database. To create a database, see Create a cluster and its primary instance and Create and manage a database .

Required roles

To get the permissions that you need to change database flags at the session level, ask your administrator to grant you the AlloyDB Database User ( roles/alloydb.databaseUser ) Identity and Access Management (IAM) role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations .

Enable parallel execution parameters

To enable parallel execution for partitioned target tables, follow these steps:

  1. In the Google Cloud console, go to the Clusterspage.

    Go to Clusters

  2. Select a cluster from the list.

  3. In the navigation menu, click AlloyDB Studio.

  4. Sign into AlloyDB Studio using the name of your database, username, and password.

  5. Set the enable_parallel_select_for_insert_select_into_part_table GUC to on .

      SET 
      
     enable_parallel_select_for_insert_select_into_part_table 
      
     = 
      
     on 
     ; 
     
    

    After you enable the GUC parameter, the query planner automatically considers parallel plans for INSERT INTO ... SELECT statements where the target is a partitioned table, provided that the query is determined to be parallel-safe.

    The following example results in an EXPLAIN plan that shows a Gather node, which indicates that the SELECT from source_table is executed in parallel.

      -- Create a partitioned table 
     CREATE 
      
     TABLE 
      
     part_table 
      
     ( 
     a 
      
     INT 
     , 
      
     b 
      
     TEXT 
     ) 
      
     PARTITION 
      
     BY 
      
     RANGE 
      
     ( 
     a 
     ); 
     CREATE 
      
     TABLE 
      
     part_table_1 
      
     PARTITION 
      
     OF 
      
     part_table 
      
     FOR 
      
     VALUES 
      
     FROM 
      
     ( 
     MINVALUE 
     ) 
      
     TO 
      
     ( 
     1000 
     ); 
     CREATE 
      
     TABLE 
      
     part_table_2 
      
     PARTITION 
      
     OF 
      
     part_table 
      
     FOR 
      
     VALUES 
      
     FROM 
      
     ( 
     1000 
     ) 
      
     TO 
      
     ( 
     MAXVALUE 
     ); 
     -- Create a source table 
     CREATE 
      
     TABLE 
      
     source_table 
      
     ( 
     c1 
      
     INT 
     , 
      
     c2 
      
     TEXT 
     ); 
     INSERT 
      
     INTO 
      
     source_table 
      
     SELECT 
      
     i 
     , 
      
     'value_' 
      
     || 
      
     i 
      
     FROM 
      
     generate_series 
     ( 
     1 
     , 
      
     2000 
     ) 
      
     i 
     ; 
     ANALYZE 
      
     source_table 
     ; 
     -- Enable the feature 
     SET 
      
     enable_parallel_select_for_insert_select_into_part_table 
      
     = 
      
     on 
     ; 
     -- Optional for parallelizing the query with a small amount of data 
     SET 
      
     parallel_setup_cost 
     = 
     0 
     ; 
     SET 
      
     parallel_tuple_cost 
     = 
     0 
     ; 
     SET 
      
     min_parallel_table_scan_size 
     = 
     0 
     ; 
     -- Run the insert with a parallelizable select 
     EXPLAIN 
      
     INSERT 
      
     INTO 
      
     part_table 
      
     SELECT 
      
     * 
      
     FROM 
      
     source_table 
     ; 
     
    

    This returns the following output:

     EXPLAIN (COSTS OFF) INSERT INTO part_table SELECT * FROM source_table;
                    QUERY PLAN                   
    -----------------------------------------------
    Insert on part_table
    ->  Gather
            Workers Planned: 2
            ->  Parallel Seq Scan on source_table
    (4 rows) 
    

Disable parallel execution parameters

To disable the parameters, follow these steps:

  1. In the Google Cloud console, go to the Clusterspage.

    Go to Clusters

  2. Select a cluster from the list.

  3. In the navigation menu, click AlloyDB Studio.

  4. Sign into AlloyDB Studio using the name of your database, username, and password.

  5. To disable the enable_parallel_select_for_insert_select parameter, run the following SQL command:

      SET 
      
     enable_parallel_select_for_insert_select 
      
     = 
      
     OFF 
     ; 
     
    
  6. To disable the enable_parallel_select_for_insert_select_into_part_table parameter, run the following SQL command:

      SET 
      
     enable_parallel_select_for_insert_select_into_part_table 
      
     = 
      
     OFF 
     ; 
     
    

Verify a parallel plan

You can verify that the optimizer is using a parallel plan by using the EXPLAIN command. Look for a Gather node and the Workers Planned or Workers Launched attributes in the query plan.

Standard table example

In the following plan example, six worker processes are launched for a parallel hash join to execute the SELECT statement.

  EXPLAIN 
  
 ( 
 ANALYZE 
 ) 
 INSERT 
  
 INTO 
  
 t1 
  
 SELECT 
  
 t2 
 . 
 * 
  
 FROM 
  
 t1 
 , 
  
 t2 
  
 WHERE 
  
 t1 
 . 
 c1 
  
 != 
  
 t2 
 . 
 c1 
  
 AND 
  
 t1 
 . 
 c2 
  
 = 
  
 t2 
 . 
 c2 
 ; 
  
 QUERY 
  
 PLAN 
 
Insert on t1 ( cost = 1209138 . 00 .. 12801765 . 49 rows = 0 width = 0 ) ( actual time = 16812 . 677 .. 19337 . 150 rows = 0 loops = 1 ) - > Gather ( cost = 1209138 . 00 .. 12801765 . 49 rows = 99995407 width = 24 ) ( actual time = 16812 . 674 .. 19337 . 144 rows = 0 loops = 1 ) Workers Planned : 6 Workers Launched : 6 - > Parallel Hash Join (...) (...)

Partitioned table example

The following example shows that, after you enable the feature for partitioned tables, the plan shows a Gather node with four planned workers for the SELECT .

  -- First, enable the feature for partitioned tables 
 SET 
  
 enable_parallel_select_for_insert_select_into_part_table 
  
 = 
  
 ON 
 ; 
 -- Then, explain the query 
 EXPLAIN 
  
 ( 
 COSTS 
  
 OFF 
 ) 
  
 INSERT 
  
 INTO 
  
 part_table 
  
 SELECT 
  
 * 
  
 FROM 
  
 source_table 
 ; 
  
 QUERY 
  
 PLAN 
  
 
Insert on part_table - > Gather Workers Planned : 2 - > Parallel Seq Scan on source_table ( 4 rows )

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: