Distributed operators

Distributed operators execute across multiple servers unlike leaf, unary, binary, or n-ary operators.

The following operators are distributed operators:

Distributed union
Distributed apply
Distributed merge union
Push broadcast hash join

Database schema

The queries and execution plans on this page are based on the following database schema:

  CREATE 
  
 TABLE 
  
 Singers 
  
 ( 
  
 SingerId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 FirstName 
  
 STRING 
 ( 
 1024 
 ), 
  
 LastName 
  
 STRING 
 ( 
 1024 
 ), 
  
 SingerInfo 
  
 BYTES 
 ( 
 MAX 
 ), 
  
 BirthDate 
  
 DATE 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 SingerId 
 ); 
 CREATE 
  
 INDEX 
  
 SingersByFirstLastName 
  
 ON 
  
 Singers 
 ( 
 FirstName 
 , 
  
 LastName 
 ); 
 CREATE 
  
 TABLE 
  
 Albums 
  
 ( 
  
 SingerId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 AlbumId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 AlbumTitle 
  
 STRING 
 ( 
 MAX 
 ), 
  
 MarketingBudget 
  
 INT64 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 SingerId 
 , 
  
 AlbumId 
 ), 
  
 INTERLEAVE 
  
 IN 
  
 PARENT 
  
 Singers 
  
 ON 
  
 DELETE 
  
 CASCADE 
 ; 
 CREATE 
  
 INDEX 
  
 AlbumsByAlbumTitle 
  
 ON 
  
 Albums 
 ( 
 AlbumTitle 
 ); 
 CREATE 
  
 INDEX 
  
 AlbumsByAlbumTitle2 
  
 ON 
  
 Albums 
 ( 
 AlbumTitle 
 ) 
  
 STORING 
  
 ( 
 MarketingBudget 
 ); 
 CREATE 
  
 TABLE 
  
 Songs 
  
 ( 
  
 SingerId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 AlbumId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 TrackId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 SongName 
  
 STRING 
 ( 
 MAX 
 ), 
  
 Duration 
  
 INT64 
 , 
  
 SongGenre 
  
 STRING 
 ( 
 25 
 ) 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 SingerId 
 , 
  
 AlbumId 
 , 
  
 TrackId 
 ), 
  
 INTERLEAVE 
  
 IN 
  
 PARENT 
  
 Albums 
  
 ON 
  
 DELETE 
  
 CASCADE 
 ; 
 CREATE 
  
 INDEX 
  
 SongsBySingerAlbumSongNameDesc 
  
 ON 
  
 Songs 
 ( 
 SingerId 
 , 
  
 AlbumId 
 , 
  
 SongName 
  
 DESC 
 ), 
  
 INTERLEAVE 
  
 IN 
  
 Albums 
 ; 
 CREATE 
  
 INDEX 
  
 SongsBySongName 
  
 ON 
  
 Songs 
 ( 
 SongName 
 ); 
 CREATE 
  
 TABLE 
  
 Concerts 
  
 ( 
  
 VenueId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 SingerId 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 ConcertDate 
  
 DATE 
  
 NOT 
  
 NULL 
 , 
  
 BeginTime 
  
 TIMESTAMP 
 , 
  
 EndTime 
  
 TIMESTAMP 
 , 
  
 TicketPrices 
  
 ARRAY<INT64> 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 VenueId 
 , 
  
 SingerId 
 , 
  
 ConcertDate 
 );

You can use the following Data Manipulation Language (DML) statements to add data to these tables:

  INSERT 
  
 INTO 
  
 Singers 
  
 ( 
 SingerId 
 , 
  
 FirstName 
 , 
  
 LastName 
 , 
  
 BirthDate 
 ) 
 VALUES 
  
 ( 
 1 
 , 
  
 "Marc" 
 , 
  
 "Richards" 
 , 
  
 "1970-09-03" 
 ), 
  
 ( 
 2 
 , 
  
 "Catalina" 
 , 
  
 "Smith" 
 , 
  
 "1990-08-17" 
 ), 
  
 ( 
 3 
 , 
  
 "Alice" 
 , 
  
 "Trentor" 
 , 
  
 "1991-10-02" 
 ), 
  
 ( 
 4 
 , 
  
 "Lea" 
 , 
  
 "Martin" 
 , 
  
 "1991-11-09" 
 ), 
  
 ( 
 5 
 , 
  
 "David" 
 , 
  
 "Lomond" 
 , 
  
 "1977-01-29" 
 ); 
 INSERT 
  
 INTO 
  
 Albums 
  
 ( 
 SingerId 
 , 
  
 AlbumId 
 , 
  
 AlbumTitle 
 ) 
 VALUES 
  
 ( 
 1 
 , 
  
 1 
 , 
  
 "Total Junk" 
 ), 
  
 ( 
 1 
 , 
  
 2 
 , 
  
 "Go, Go, Go" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 "Green" 
 ), 
  
 ( 
 2 
 , 
  
 2 
 , 
  
 "Forever Hold Your Peace" 
 ), 
  
 ( 
 2 
 , 
  
 3 
 , 
  
 "Terrified" 
 ), 
  
 ( 
 3 
 , 
  
 1 
 , 
  
 "Nothing To Do With Me" 
 ), 
  
 ( 
 4 
 , 
  
 1 
 , 
  
 "Play" 
 ); 
 INSERT 
  
 INTO 
  
 Songs 
  
 ( 
 SingerId 
 , 
  
 AlbumId 
 , 
  
 TrackId 
 , 
  
 SongName 
 , 
  
 Duration 
 , 
  
 SongGenre 
 ) 
 VALUES 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 1 
 , 
  
 "Let's Get Back Together" 
 , 
  
 182 
 , 
  
 "COUNTRY" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 2 
 , 
  
 "Starting Again" 
 , 
  
 156 
 , 
  
 "ROCK" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 3 
 , 
  
 "I Knew You Were Magic" 
 , 
  
 294 
 , 
  
 "BLUES" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 4 
 , 
  
 "42" 
 , 
  
 185 
 , 
  
 "CLASSICAL" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 5 
 , 
  
 "Blue" 
 , 
  
 238 
 , 
  
 "BLUES" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 6 
 , 
  
 "Nothing Is The Same" 
 , 
  
 303 
 , 
  
 "BLUES" 
 ), 
  
 ( 
 2 
 , 
  
 1 
 , 
  
 7 
 , 
  
 "The Second Time" 
 , 
  
 255 
 , 
  
 "ROCK" 
 ), 
  
 ( 
 2 
 , 
  
 3 
 , 
  
 1 
 , 
  
 "Fight Story" 
 , 
  
 194 
 , 
  
 "ROCK" 
 ), 
  
 ( 
 3 
 , 
  
 1 
 , 
  
 1 
 , 
  
 "Not About The Guitar" 
 , 
  
 278 
 , 
  
 "BLUES" 
 );

The distributed union operator is the primitive operator from which distributed cross apply and distributed outer apply are derived.

Distributed operators appear in execution plans with a distributed unionvariant on top of one or more local distributed unionvariants. A distributed union variant performs the remote distribution of subplans.

A local distributed union variant is on top of each of the scans performed for the query. The local distributed union variants ensure stable query execution when restarts occur for dynamically changing split boundaries. Although this operator is hidden from the visual plan, it is always present.

Whenever possible, a distributed union variant uses a split predicate for split pruning . Split pruning means the remote servers execute subplans only on splits that satisfy the predicate, improving latency and query performance.

Distributed union

A distributed union operator conceptually divides one or more tables into multiple splits , remotely evaluates a subquery independently on each split, and then unions all results.

The following query demonstrates this operator:

  SELECT 
  
 s 
 . 
 songname 
 , 
  
 s 
 . 
 songgenre 
 FROM 
  
 songs 
  
 AS 
  
 s 
 WHERE 
  
 s 
 . 
 singerid 
  
 = 
  
 2 
  
 AND 
  
 s 
 . 
 songgenre 
  
 = 
  
 'ROCK' 
 ; 
 /*-----------------+-----------+ 
 | SongName        | SongGenre | 
 +-----------------+-----------+ 
 | Starting Again  | ROCK      | 
 | The Second Time | ROCK      | 
 | Fight Story     | ROCK      | 
 +-----------------+-----------*/

The execution plan appears as follows:

Distributed union operator execution plan

The distributed union operator sends subplans to remote servers, which perform a table scan across splits that satisfy the query's predicate WHERE s.SingerId = 2 AND s.SongGenre = 'ROCK' . A serialize result operator computes the SongName and SongGenre values from the rows returned by the table scans. The distributed union operator then returns the combined results from the remote servers as the SQL query results.

Properties and execution statistics

A property of an operator describes a trait that is used when the operator is executed. An execution statistic is a value collected during query execution to help you assess performance of the operator.

The Distributed unionoperator has additional distinct execution statistics.

Properties

Name	Description
Execution method	In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once.

Execution statistics

Name	Description
Local parallel executions	The number of subqueries executed in parallel.
Remote calls	The number of remote subqueries executed.
Latency	Elapsed time of all the executions done in the operator.
Cumulative latency	The total time of the current operator and its descendants.
CPU time	Sum of CPU time spent executing the operator.
Cumulative CPU time	The total CPU time spent executing the operator and its descendants.
Execution time	The total amount of time taken to run the query and process results.
Rows returned	The number of rows output by this operator
Number of executions	The number of times the operator was executed. Some executions can run in parallel.

Generally, executions are in parallel, unlike cross apply executions. Because of this, latency numbers on distributed operators are cumulative, unlike most operators, which report how much latency that operator added. The number of executions under a distributed union is based on the table's split boundaries, which in turn depend on data size and load, and potentially include the use_additional_parallelism statement hint. This approach to statistics applies to all distributed operators.

Distributed apply

A distributed apply (DA) operator extends the apply join operator by executing across multiple servers. The input side groups rows into batches (unlike a regular cross apply operator, which acts on only one input row at a time). The DA map side is a set of plain apply join operators that execute on remote servers. A distributed apply join supports the same apply methods as apply join .

Properties and execution statistics

The Distributed applyoperator has additional distinct execution statistics.

Properties

Name	Description
Execution method	In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once.

Execution statistics

Name	Description
Local parallel executions	The number of subqueries executed in parallel.
Remote calls	The number of remote subqueries executed.
Number of batches	A batch is a dynamic collection of rows that are processed at the same time. This shows the number of batches a distributed cross apply sent from the input to the map side.
Latency	Elapsed time of all the executions done in the operator.
Cumulative latency	The total time of the current operator and its descendants.
CPU time	Sum of CPU time spent executing the operator.
Cumulative CPU time	The total CPU time spent executing the operator and its descendants.
Execution time	The total amount of time taken to run the query and process results.
Rows returned	The number of rows output by this operator
Number of executions	The number of times the operator was executed. Some executions can run in parallel.

Distributed cross apply

The following query demonstrates this operator:

  SELECT 
  
 albumtitle 
 FROM 
  
 songs 
  
 JOIN 
  
 albums 
  
 ON 
  
 albums 
 . 
 albumid 
  
 = 
  
 songs 
 . 
 albumid 
 ; 
 /*-----------------------+ 
 | AlbumTitle            | 
 +-----------------------+ 
 | Green                 | 
 | Nothing To Do With Me | 
 | Play                  | 
 | Total Junk            | 
 | Green                 | 
 +-----------------------*/

The execution plan appears as follows:

Distributed cross apply operator execution plan

The DCA input contains an index scan on the SongsBySingerAlbumSongNameDesc index that batches rows of AlbumId . The map side for the DCA is a standard cross apply, where the input is a batch of rows, and the map side is an index scan on the index AlbumsByAlbumTitle , subject to the predicate of AlbumId in the input row matching the AlbumId key in the AlbumsByAlbumTitle index. The mapping returns the SongName for the SingerId values in the batched input rows.

To summarize the DCA process for this example, the DCA's input is the batched rows from the Albums table, and the DCA's output is the application of these rows to the map of the index scan.

Distributed outer apply

A Distributed outer apply is a DA with left outer join semantics. See outer apply for details on the semantics.

The following query demonstrates this operator:

  SELECT 
  
 lastname 
 , 
  
 concertdate 
 FROM 
  
 singers 
  
 LEFT 
  
 OUTER 
  
 join 
 @{ 
 JOIN_TYPE 
 = 
 APPLY_JOIN 
 } 
  
 concerts 
 ON 
  
 singers 
 . 
 singerid 
 = 
 concerts 
 . 
 singerid 
 ; 
 /*----------+-------------+ 
 | LastName | ConcertDate | 
 +----------+-------------+ 
 | Trentor  | 2014-02-18  | 
 | Smith    | 2011-09-03  | 
 | Smith    | 2010-06-06  | 
 | Lomond   | 2005-04-30  | 
 | Martin   | 2015-11-04  | 
 | Richards |             | 
 +----------+-------------*/

The execution plan appears as follows:

Distributed outer apply operator execution plan

Distributed semi apply

A Distributed semi apply is a DA with semi join semantics. See semi apply for details on the semantics.

Distributed anti-semi apply

A Distributed anti-semi apply is a DA with anti-semi join semantics. See anti-semi apply for details on the semantics.

Distributed merge union

The distributed merge union operator distributes a query across multiple remote servers. It then combines the query results to produce a sorted result, known as a distributed merge sort .

A distributed merge union executes the following steps:

The root server sends a subquery to each remote server that hosts a split of the queried data. The subquery includes instructions that results are sorted in a specific order.
Each remote server executes the subquery on its split, then sends the results back in the requested order.
The root server merges the sorted subquery to produce a completely sorted result.

Distributed merge union is enabled by default for Spanner Version 3 and later.

Properties and execution statistics

The Distributed applyoperator has additional distinct execution statistics.

Properties

Name	Description
Execution method	In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once.

Execution statistics

Name	Description
Local parallel executions	The number of subqueries executed in parallel.
Remote calls	The number of remote subqueries executed.
Number of batches	A batch is a dynamic collection of rows that are processed at the same time. This shows the number of batches a distributed cross apply sent from the input to the map side.
Latency	Elapsed time of all the executions done in the operator.
Cumulative latency	The total time of the current operator and its descendants.
CPU time	Sum of CPU time spent executing the operator.
Cumulative CPU time	The total CPU time spent executing the operator and its descendants.
Execution time	The total amount of time taken to run the query and process results.
Rows returned	The number of rows output by this operator
Number of executions	The number of times the operator was executed. Some executions can run in parallel.

Push broadcast hash join

A push broadcast hash join operator is a distributed hash-join-based implementation of SQL joins. The push broadcast hash join operator reads rows from the input side in order to construct a batch of data. The operator broadcasts that batch to all servers containing map side data. On the destination servers where the batch of data is received, the operator builds a hash join using the batch as the build side data and scans the local data as the probe side of the hash join.

Push broadcast hash join has the following advantages:

If the build table is small, it can be sent to all map side splits.
The map side table can be scanned, with or without residual filters. This occurs when the join keys are not the same as the map table's primary keys.

Push broadcast hash join isn't selected automatically by the optimizer. To use this operator, set the join method to PUSH_BROADCAST_HASH_JOIN on the query hint, as shown in the following example:

  SELECT 
  
 a 
 . 
 albumtitle 
 , 
  
 s 
 . 
 songname 
 FROM 
  
 albums 
  
 AS 
  
 a 
  
 join 
 @{ 
 join_method 
 = 
 push_broadcast_hash_join 
 } 
  
 songs 
  
 AS 
  
 s 
 ON 
  
 a 
 . 
 singerid 
  
 = 
  
 s 
 . 
 singerid 
 AND 
  
 a 
 . 
 albumid 
  
 = 
  
 s 
 . 
 albumid 
 ; 
 /*-----------------------+--------------------------+ 
 | AlbumTitle            | SongName                 | 
 +-----------------------+--------------------------+ 
 | Green                 | The Second Time          | 
 | Green                 | Starting Again           | 
 | Green                 | Nothing Is The Same      | 
 | Green                 | Let's Get Back Together  | 
 | Green                 | I Knew You Were Magic    | 
 | Green                 | Blue                     | 
 | Green                 | 42                       | 
 | Terrified             | Fight Story              | 
 | Nothing To Do With Me | Not About The Guitar     | 
 +-----------------------+--------------------------*/

The execution plan appears as follows:

Push broadcast hash join operator execution plan

The input to the Push broadcast hash join is the AlbumsByAlbumTitle index. The operator serializes that input into a batch of data. The operator sends that batch to all the local splits of the index SongsBySingerAlbumSongNameDesc , where the operator deserializes the batch and builds it into a hash table. The hash table then uses the local index data as a probe returning resulting matches.

Resulting matches might also be filtered by a residual condition before they're returned. (An example of where residual conditions appear is in non-equality joins).

Properties and execution statistics

The Distributed applyoperator has additional distinct execution statistics.

Properties

Name	Description
Execution method	In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once.

Execution statistics

Name	Description
Local parallel executions	The number of subqueries executed in parallel.
Remote calls	The number of remote subqueries executed.
Number of batches	A batch is a dynamic collection of rows that are processed at the same time. This shows the number of batches a distributed cross apply sent from the input to the map side.
Latency	Elapsed time of all the executions done in the operator.
Cumulative latency	The total time of the current operator and its descendants.
CPU time	Sum of CPU time spent executing the operator.
Cumulative CPU time	The total CPU time spent executing the operator and its descendants.
Execution time	The total amount of time taken to run the query and process results.
Rows returned	The number of rows output by this operator
Number of executions	The number of times the operator was executed. Some executions can run in parallel.

Distributed operators Stay organized with collections Save and categorize content based on your preferences.

Database schema

Distributed union

Properties and execution statistics

Distributed apply

Properties and execution statistics

Distributed cross apply

Distributed outer apply

Distributed semi apply

Distributed anti-semi apply

Distributed merge union

Properties and execution statistics

Push broadcast hash join

Properties and execution statistics

Distributed operators