A binary operator has two relational children. The following operators are binary operators:
Database schema
The queries and execution plans on this page are based on the following database schema:
CREATE
TABLE
Singers
(
SingerId
INT64
NOT
NULL
,
FirstName
STRING
(
1024
),
LastName
STRING
(
1024
),
SingerInfo
BYTES
(
MAX
),
BirthDate
DATE
)
PRIMARY
KEY
(
SingerId
);
CREATE
INDEX
SingersByFirstLastName
ON
Singers
(
FirstName
,
LastName
);
CREATE
TABLE
Albums
(
SingerId
INT64
NOT
NULL
,
AlbumId
INT64
NOT
NULL
,
AlbumTitle
STRING
(
MAX
),
MarketingBudget
INT64
)
PRIMARY
KEY
(
SingerId
,
AlbumId
),
INTERLEAVE
IN
PARENT
Singers
ON
DELETE
CASCADE
;
CREATE
INDEX
AlbumsByAlbumTitle
ON
Albums
(
AlbumTitle
);
CREATE
INDEX
AlbumsByAlbumTitle2
ON
Albums
(
AlbumTitle
)
STORING
(
MarketingBudget
);
CREATE
TABLE
Songs
(
SingerId
INT64
NOT
NULL
,
AlbumId
INT64
NOT
NULL
,
TrackId
INT64
NOT
NULL
,
SongName
STRING
(
MAX
),
Duration
INT64
,
SongGenre
STRING
(
25
)
)
PRIMARY
KEY
(
SingerId
,
AlbumId
,
TrackId
),
INTERLEAVE
IN
PARENT
Albums
ON
DELETE
CASCADE
;
CREATE
INDEX
SongsBySingerAlbumSongNameDesc
ON
Songs
(
SingerId
,
AlbumId
,
SongName
DESC
),
INTERLEAVE
IN
Albums
;
CREATE
INDEX
SongsBySongName
ON
Songs
(
SongName
);
CREATE
TABLE
Concerts
(
VenueId
INT64
NOT
NULL
,
SingerId
INT64
NOT
NULL
,
ConcertDate
DATE
NOT
NULL
,
BeginTime
TIMESTAMP
,
EndTime
TIMESTAMP
,
TicketPrices
ARRAY<INT64>
)
PRIMARY
KEY
(
VenueId
,
SingerId
,
ConcertDate
);
You can use the following Data Manipulation Language (DML) statements to add data to these tables:
INSERT
INTO
Singers
(
SingerId
,
FirstName
,
LastName
,
BirthDate
)
VALUES
(
1
,
"Marc"
,
"Richards"
,
"1970-09-03"
),
(
2
,
"Catalina"
,
"Smith"
,
"1990-08-17"
),
(
3
,
"Alice"
,
"Trentor"
,
"1991-10-02"
),
(
4
,
"Lea"
,
"Martin"
,
"1991-11-09"
),
(
5
,
"David"
,
"Lomond"
,
"1977-01-29"
);
INSERT
INTO
Albums
(
SingerId
,
AlbumId
,
AlbumTitle
)
VALUES
(
1
,
1
,
"Total Junk"
),
(
1
,
2
,
"Go, Go, Go"
),
(
2
,
1
,
"Green"
),
(
2
,
2
,
"Forever Hold Your Peace"
),
(
2
,
3
,
"Terrified"
),
(
3
,
1
,
"Nothing To Do With Me"
),
(
4
,
1
,
"Play"
);
INSERT
INTO
Songs
(
SingerId
,
AlbumId
,
TrackId
,
SongName
,
Duration
,
SongGenre
)
VALUES
(
2
,
1
,
1
,
"Let's Get Back Together"
,
182
,
"COUNTRY"
),
(
2
,
1
,
2
,
"Starting Again"
,
156
,
"ROCK"
),
(
2
,
1
,
3
,
"I Knew You Were Magic"
,
294
,
"BLUES"
),
(
2
,
1
,
4
,
"42"
,
185
,
"CLASSICAL"
),
(
2
,
1
,
5
,
"Blue"
,
238
,
"BLUES"
),
(
2
,
1
,
6
,
"Nothing Is The Same"
,
303
,
"BLUES"
),
(
2
,
1
,
7
,
"The Second Time"
,
255
,
"ROCK"
),
(
2
,
3
,
1
,
"Fight Story"
,
194
,
"ROCK"
),
(
3
,
1
,
1
,
"Not About The Guitar"
,
278
,
"BLUES"
);
Apply join
An apply join is the primary join operator used by Spanner. Apply join operators execute row-oriented processing, unlike operators that execute set-based processing such as hash join . The apply operator has two inputs, input (left child) and map (right child). The apply operator applies each row on the input side to the map side using an apply method: cross , outer , semi , or anti-semi . Additionally, a variant of an apply join also appears on the map side of a Distributed apply .
The Apply join operator is most efficient when:
- The cardinality of the input is low.
- The join key is a prefix of the map-side primary key.
- The query joins two interleaved tables.
Properties and execution statistics
A property of an operator describes a trait that is used when the operator is executed. An execution statistic is a value collected during query execution to help you assess performance of the operator.
Properties
| Name | Description |
|---|---|
| Execution method | In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once. |
Execution statistics
| Name | Description |
|---|---|
| Latency | Elapsed time of all the executions done in the operator. |
| Cumulative latency | The total time of the current operator and its descendants. |
| CPU time | Sum of CPU time spent executing the operator. |
| Cumulative CPU time | The total CPU time spent executing the operator and its descendants. |
| Execution time | The total amount of time taken to run the query and process results. |
| Rows returned | The number of rows output by this operator |
| Number of executions | The number of times the operator was executed. Some executions can run in parallel. |
Cross apply
A Cross apply performs an inner join where only matching rows are returned.
The following query demonstrates this operator:
The query requests the first name of each singer, along with the name of only one of the singer's songs.
SELECT
si
.
firstname
,
(
SELECT
so
.
songname
FROM
songs
AS
so
WHERE
so
.
singerid
=
si
.
singerid
LIMIT
1
)
FROM
singers
AS
si
;
/*-----------+--------------------------+
| FirstName | Unspecified |
+-----------+--------------------------+
| Alice | Not About The Guitar |
| Catalina | Let's Get Back Together |
| David | NULL |
| Lea | NULL |
| Marc | NULL |
+-----------+--------------------------*/
The query populates the first column from the Singers
table, and the second column
from the Songs
table. In cases where a SingerId
existed in the Singers
table but there was no matching SingerId
in the Songs
table, the
second column contains NULL
.
The execution plan begins as follows:

The top-level node is a distributed union operator. The distributed union operator distributes sub plans to remote servers. The subplan contains a serialize result operator that computes the singer's first name and the name of one of the singer's songs and serializes each row of the output.
The serialize result operator receives its input from a cross apply operator.
The input side for the cross apply operator is a table scan
on the Singers
table.
The execution plan continues as follows:

The map side for the cross apply operation contains the following (from top to bottom):
- An aggregate
operator that
returns
Songs.SongName. - A limit operator that limits the number of songs returned to one per singer.
- An index scan
on the
SongsBySingerAlbumSongNameDescindex.
The cross apply operator maps each row from the input side to a row in the map
side that has the same SingerId
. The cross apply operator output is the FirstName
value from the input row, and the SongName
value from the map row.
(The SongName
value is NULL
if there is no map row that matches on SingerId
.) The distributed union operator at the top of the execution plan
then combines all of the output rows from the remote servers and returns them as
the query results.
Outer apply
An outer apply provides left outer join semantics. It ensures that each execution on the map side returns at least one row by adding NULL-padding if needed.
Semi apply
The semi apply operator returns input columns only when a match occurs on the map side.
The following query uses a semi join to find which singers do have an Album:
SELECT
FirstName
,
LastName
FROM
Singers
WHERE
SingerId
IN
(
SELECT
SingerId
FROM
Albums
);
/*-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Marc | Richards |
| Catalina | Smith |
| Alice | Trentor |
| Lea | Martin |
+-----------+----------*/
The plan segment appears as follows:

Anti-semi apply
An Anti-semi apply operator is similar to a semi apply operator, except that it returns the input table columns only when a match doesn't occur on the map side.
The following query uses an anti-semi join to find which singers don't have an Album:
SELECT
FirstName
,
LastName
FROM
Singers
WHERE
SingerId
NOT
IN
(
SELECT
SingerId
FROM
Albums
);
/*-----------+----------+
| FirstName | LastName |
+-----------+----------+
| David | Lomond |
+-----------+----------*/
The plan segment appears as follows:

Hash join
A hash join operator is a hash-based implementation of SQL joins. Hash joins execute set-based processing. The hash join operator reads rows from input marked as build (left child) and inserts them into a hash table based on a join condition. The hash join operator then reads rows from input marked as probe (right child). For each row it reads from the probe input, the hash join operator looks for matching rows in the hash table. The hash join operator returns the matching rows as its result.
Hash join has the following advantages:
- It doesn't require the inputs to be sorted
- It computes a bloom filter when building the hash table. The operator uses the filter to exclude rows from the probe side that have no matches. Note that this is a residual filter, not a seek filter.
The following query demonstrates this operator:
SELECT
a
.
albumtitle
,
s
.
songname
FROM
albums
AS
a
join
@{
join_method
=
hash_join
}
songs
AS
s
ON
a
.
singerid
=
s
.
singerid
AND
a
.
albumid
=
s
.
albumid
;
/*-----------------------+--------------------------+
| AlbumTitle | SongName |
+-----------------------+--------------------------+
| Nothing To Do With Me | Not About The Guitar |
| Green | The Second Time |
| Green | Starting Again |
| Green | Nothing Is The Same |
| Green | Let's Get Back Together |
| Green | I Knew You Were Magic |
| Green | Blue |
| Green | 42 |
| Terrified | Fight Story |
+-----------------------+--------------------------*/
The execution plan segment appears as follows:

In the execution plan, build
is a distributed union
that
distributes scans
on the table Albums
. Probe
is a distributed union
operator that distributes scans on the index SongsBySingerAlbumSongNameDesc
.
The hash join operator reads all rows from the build side. Each build row is
placed in a hash table based on the columns in the condition a.SingerId =
s.SingerId AND a.AlbumId = s.AlbumId
. Next, the hash join operator reads all
rows from the probe side. For each probe row, the hash join operator looks for
matches in the hash table. The resulting matches are returned by the hash join
operator.
Resulting matches in the hash table might also be filtered by a residual condition before they're returned. (An example of where residual conditions appear is in non-equality joins). Hash join execution plans can be complex due to memory management and join variants. The main hash join algorithm is adapted to handle inner, semi, anti, and outer join variants.
Properties and execution statistics
A property of an operator describes a trait that is used when the operator is executed. An execution statistic is a value collected during query execution to help you assess performance of the operator.
Properties
| Name | Description |
|---|---|
| Execution method | In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once. |
Execution statistics
| Name | Description |
|---|---|
| Latency | Elapsed time of all the executions done in the operator. |
| Cumulative latency | The total time of the current operator and its descendants. |
| CPU time | Sum of CPU time spent executing the operator. |
| Cumulative CPU time | The total CPU time spent executing the operator and its descendants. |
| Execution time | The total amount of time taken to run the query and process results. |
| Rows returned | The number of rows output by this operator |
| Number of executions | The number of times the operator was executed. Some executions can run in parallel. |
Merge join
A merge join
operator is a merge-based implementation of SQL join. Both sides
of the join produce rows ordered by the columns used in the join condition. The
merge join consumes both input streams concurrently and outputs rows when the
join condition is satisfied. If inputs are not sorted, the optimizer adds
explicit Sort
operators to the plan.
Merge join has the following advantages:
- If the data is already sorted, it doesn't need any memory.
- Even if the data is not sorted, for a distributed join, it can perform the sort on each individual split, rather than creating a large hash table on the root.
Merge join
isn't selected automatically by the optimizer. To use this
operator, set the join method to MERGE_JOIN
on the query hint, as shown
in the following example:
SELECT
a
.
albumtitle
,
s
.
songname
FROM
albums
AS
a
join
@{
join_method
=
merge_join
}
songs
AS
s
ON
a
.
singerid
=
s
.
singerid
AND
a
.
albumid
=
s
.
albumid
;
/*-----------------------+--------------------------+
| AlbumTitle | SongName |
+-----------------------+--------------------------+
| Green | The Second Time |
| Green | Starting Again |
| Green | Nothing Is The Same |
| Green | Let's Get Back Together |
| Green | I Knew You Were Magic |
| Green | Blue |
| Green | 42 |
| Terrified | Fight Story |
| Nothing To Do With Me | Not About The Guitar |
+-----------------------+--------------------------*/
The execution plan appears as follows:

In this execution plan, the merge join is distributed so that the join executes
where the data resides. This also lets the merge join in this example operate
without additional sort operators, because both table scans are already sorted
by SingerId
, AlbumId
, which is the join condition. In this plan, the left
scan of the Albums
table advances whenever its SingerId
, AlbumId
is less
than the right scan's SingerId_1
, AlbumId_1
values. Similarly, the right
scan advances whenever its values are less than the left scan's values. This
merge advance continues searching for equivalences to return matching rows.
Consider another merge join example using the following query:
SELECT
a
.
albumtitle
,
s
.
songname
FROM
albums
AS
a
join
@{
join_method
=
merge_join
}
songs
AS
s
ON
a
.
albumid
=
s
.
albumid
;
/*-----------------------+--------------------------+
| AlbumTitle | SongName |
+-----------------------+--------------------------+
| Total Junk | The Second Time |
| Total Junk | Starting Again |
| Total Junk | Nothing Is The Same |
| Total Junk | Let's Get Back Together |
| Total Junk | I Knew You Were Magic |
| Total Junk | Blue |
| Total Junk | 42 |
| Total Junk | Not About The Guitar |
| Green | The Second Time |
| Green | Starting Again |
| Green | Nothing Is The Same |
| Green | Let's Get Back Together |
| Green | I Knew You Were Magic |
| Green | Blue |
| Green | 42 |
| Green | Not About The Guitar |
| Nothing To Do With Me | The Second Time |
| Nothing To Do With Me | Starting Again |
| Nothing To Do With Me | Nothing Is The Same |
| Nothing To Do With Me | Let's Get Back Together |
| Nothing To Do With Me | I Knew You Were Magic |
| Nothing To Do With Me | Blue |
| Nothing To Do With Me | 42 |
| Nothing To Do With Me | Not About The Guitar |
| Play | The Second Time |
| Play | Starting Again |
| Play | Nothing Is The Same |
| Play | Let's Get Back Together |
| Play | I Knew You Were Magic |
| Play | Blue |
| Play | 42 |
| Play | Not About The Guitar |
| Terrified | Fight Story |
+-----------------------+--------------------------*/
The execution plan appears as follows:

In the preceding execution plan, the query optimizer introduced additional sort
operators to execute the merge join. The JOIN
condition in this example query
is only on AlbumId
, which isn't how the data is stored, so a sort must be
added. The query engine supports a Distributed Merge algorithm, which lets the
sort occur locally instead of globally, distributing and parallelizing the CPU
cost.
The resulting matches might also be filtered by a residual condition. For example, residual conditions appear in non-equality joins. Merge join execution plans can be complex due to additional sort requirements. The main merge join algorithm handles inner, semi, anti, and outer join variants.
Properties and execution statistics
A property of an operator describes a trait that is used when the operator is executed. An execution statistic is a value collected during query execution to help you assess performance of the operator.
Properties
| Name | Description |
|---|---|
| Execution method | In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once. |
Execution statistics
| Name | Description |
|---|---|
| Latency | Elapsed time of all the executions done in the operator. |
| Cumulative latency | The total time of the current operator and its descendants. |
| CPU time | Sum of CPU time spent executing the operator. |
| Cumulative CPU time | The total CPU time spent executing the operator and its descendants. |
| Execution time | The total amount of time taken to run the query and process results. |
| Rows returned | The number of rows output by this operator |
| Number of executions | The number of times the operator was executed. Some executions can run in parallel. |
Recursive union
A recursive union
operator performs a union of two inputs, one that represents
a base
case, and the other that represents a recursive
case. It's used in
graph queries with quantified path traversals. The base input is processed first
and exactly once. The recursive input is processed until the recursion
terminates. The recursion terminates when the upper bound, if specified, is
reached, or when the recursion doesn't produce any new results. In the following
example, the Collaborations
table is added to the schema, and a property graph
called MusicGraph
is created.
CREATE
TABLE
Collaborations
(
SingerId
INT64
NOT
NULL
,
FeaturingSingerId
INT64
NOT
NULL
,
AlbumTitle
STRING
(
MAX
)
NOT
NULL
,
)
PRIMARY
KEY
(
SingerId
,
FeaturingSingerId
,
AlbumTitle
);
CREATE
OR
REPLACE
PROPERTY
GRAPH
MusicGraph
NODE
TABLES
(
Singers
KEY
(
SingerId
)
LABEL
Singers
PROPERTIES
(
BirthDate
,
FirstName
,
LastName
,
SingerId
,
SingerInfo
)
)
EDGE
TABLES
(
Collaborations
AS
CollabWith
KEY
(
SingerId
,
FeaturingSingerId
,
AlbumTitle
)
SOURCE
KEY
(
SingerId
)
REFERENCES
Singers
(
SingerId
)
DESTINATION
KEY
(
FeaturingSingerId
)
REFERENCES
Singers
(
SingerId
)
LABEL
CollabWith
PROPERTIES
(
AlbumTitle
,
FeaturingSingerId
,
SingerId
),
);
The following graph query finds singers who have collaborated with a given singer or collaborated with those collaborators.
GRAPH
MusicGraph
MATCH
(
singer
:
Singers
{
singerId
:
42
}
)
-[
c
:
CollabWith
]->{
1
,
2
}
(
featured
:
Singers
)
RETURN
singer
.
SingerId
AS
singer
,
featured
.
SingerId
AS
featured

The recursive union
operator filters the Singers
table to find the singer
with the given SingerId
. This is the base input to the recursive union
. The
recursive input to the recursive union
comprises a distributed cross
apply
or
other join operator for other queries that repeatedly joins the Collaborations
table with the results of the previous
iteration of the join. The rows from the base input form the zeroth iteration.
At each iteration, the output of the iteration is stored by the recursive spool
scan
. Rows from the recursive spool scan
are joined with the Collaborations
table on spoolscan.featuredSingerId = Collaborations.SingerId
. Recursion
terminates when two iterations are complete, since that's the specified upper
bound in the query.
Properties and execution statistics
A property of an operator describes a trait that is used when the operator is executed. An execution statistic is a value collected during query execution to help you assess performance of the operator.
Properties
| Name | Description |
|---|---|
| Execution method | In Row execution, the operator processes one row at a time. In Batch execution, the operator processes a batch of rows at once. |
Execution statistics
| Name | Description |
|---|---|
| Latency | Elapsed time of all the executions done in the operator. |
| Cumulative latency | The total time of the current operator and its descendants. |
| CPU time | Sum of CPU time spent executing the operator. |
| Cumulative CPU time | The total CPU time spent executing the operator and its descendants. |
| Execution time | The total amount of time taken to run the query and process results. |
| Rows returned | The number of rows output by this operator |
| Number of executions | The number of times the operator was executed. Some executions can run in parallel. |

