This document explains how to run algorithms on Spanner Graph.
Spanner Graph algorithm query structure
A Spanner Graph algorithm query has the following structure:
EXPORT
DATA
OPTIONS
(
< export_option_list
> )
AS
GRAPH
graph_name
< match_clause
>
< call_statement
>
algorithm_name
(
< common_input
> ,
< algorithm_specific_input
> )
YIELD
< algorithm_specific_output
> RETURN
< results
>
-
<export_option_list>: Options that define how to persist algorithm query results. See Cloud Storage Options and Spanner Options . -
graph_name: The name of the graph. -
<match_clause>: Optional MATCH statements to define algorithm input elements. -
<call_statement>: UseCALLwhen you omit<match_clause>and want to operate on the whole graph. UseCALL PER()when<match_clause>is present and you want to operate on the working table . For more information, see GQL CALL . -
algorithm_name: The name of the algorithm to run. For available algorithms, see Spanner Graph algorithms . -
<common_input>: Named input parameters common to all algorithm queries. For more information, see common algorithm input parameters . -
<algorithm_specific_input>: Named input parameters for the algorithm. For more information, see input parameters defined in Spanner Graph algorithms . -
<algorithm_specific_output>: The output of the algorithm call. For more information, see the outputs defined in Spanner Graph algorithms andYIELDin CALL statement. -
<results>: Defines what to return in query results.
The query is composed of an EXPORT DATA
statement, which defines how to persist results, and a GRAPH CLAUSE
that produces the algorithm query result.
In its simplest form, the graph clause identifies the graph, CALL
s an algorithm that yields predefined output, and then specifies what to RETURN
from the algorithm's output.
Optionally, the graph clause can use supported MATCH
statements to select elements of interest. In this case, use a PER ()
clause to group all rows returned by MATCH
as input to the algorithm. The algorithm operates on a logical subgraph composed of the unique set of nodes and edges selected.
The query doesn't return any data. Results are persisted according to export_option_list.
For more information about Spanner Graph algorithm queries, see the following sections in this document:
Common algorithm input parameters
Specify these named input parameters in the following format: NAME => VALUE, ...
.
| Name | Value Type | Required | Default Value | Description |
|---|---|---|---|---|
node_labels
|
ARRAY
|
No | (none) | Only supported when CALL
is used. A list of node labels to include in
the algorithm input. If specified, only nodes with at least one matching
label are included. |
edge_labels
|
ARRAY
|
No | (none) | Only supported when CALL
is used. A list of edge labels to include in
the algorithm input. If specified, only edges with at least one matching
label are included. |
edge_weight_property
|
STRING
|
No | (none) | The name of the edge property that contains the weights. If undefined, the system assigns a default weight of 1 to all edges. The property value type must be numeric. |
machine_category
|
STRING
|
No | default | The machine category to use for the algorithm execution. Supported values are: default
, large
|
zone
|
STRING
|
No | (none) | The zone where the algorithm execution takes place. Must be one of the zones in the region the query is received in. |
max_idle_time
|
STRING
|
No | 30m | Specifies how long the compute instance should remain active for
reuse after the algorithm completes. Format is a sequence of decimal
numbers, each with a unit suffix, such as 4m
, 1.5h
or 1h45m
. Valid time units are ns
, us
(or µs
), ms
, s
, m
, h
. |
Handle algorithm output
You must persist algorithm query results before you can inspect them. Use the export_data_option
to describe how to persist the results. You can persist the
results to Cloud Storage or back to the same Spanner instance the
query originated from.
Persist results to Cloud Storage
To use this option, make sure Storage Object Admin ( roles/storage.objectAdmin
)
role is granted to Google-managed Spanner service account
service- PROJECT_NUMBER
@gcp-sa-spanner.iam.gserviceaccount.com
.
The following EXPORT DATA
options are supported when persisting results to
Cloud Storage. Specify the options in the following format: NAME=VALUE, ...
.
uri
STRING
gs://bucket/path/file
format. If you export a large amount of data, use a wildcard in uri
to export data into multiple files. For example, gs://bucket/path/file_*.csv
.format
STRING
CSV
, PARQUET
, AVRO
.header
BOOL
true
, the system prints column headers for the first row of each data file. The default is false
. Applies only to CSV.overwrite
BOOL
true
, the system overwrites any existing files with the same URI. Otherwise, if files with the same URI exist, the statement returns an error. The default is false
.field_delimiter
STRING
,
(comma). Applies only to CSV.compression
STRING
- For
CSV, supported value isGZIP. - For
PARQUET, supported values are:SNAPPY,GZIP,ZSTD. - For
AVRO, supported values are:DEFLATE,SNAPPY.
The column names in RETURN
clause define the column names in Cloud Storage
output files.
Export data into one or more files
Spanner Graph queries support a single wildcard operator ( *
) in the uri
. The wildcard can appear in the filename component, but not in bucket name, folder name, or file extension. Using the wildcard operator instructs Spanner Graph to create multiple sharded files based on the pattern you supply if the result set is large. The system replaces the wildcard operator with a number, starting at zero, left-padded to 12 digits. For example, a URI gs://my-bucket/file-*.csv
creates files like gs://my-bucket/file-000000000000.csv
, gs://my-bucket/file-000000000001.csv
, and similar files.
If you use a uri
without wildcard, the result is a single file, like gs://my-bucket/file.csv
.
Data types
When you export data, Spanner graph data types convert as follows, depending on the format:
CSV
All data types convert to their string representation:
-
BOOLvalues convert totrueorfalse. -
BYTESvalues base64-encode. -
TIMESTAMPvalues format asYYYY-MM-DD HH:MM:SS.ffffff UTC. -
NULLvalues appear as empty strings.
You cannot export nested and repeated data in CSV format.
Avro
| Spanner data type | Avro data type |
|---|---|
BOOL
|
BOOLEAN
|
INT64
|
LONG
|
FLOAT
|
FLOAT
|
DOUBLE
|
DOUBLE
|
NUMERIC
|
BYTES
with logical type DECIMAL(38,9)
|
STRING
|
STRING
|
BYTES
|
BYTES
|
TIMESTAMP
|
LONG
(microseconds since epoch) |
NULL
|
null
|
Parquet
| Spanner data type | Parquet data type |
|---|---|
BOOL
|
BOOLEAN
|
INT64
|
INT64
|
FLOAT
|
FLOAT
|
DOUBLE
|
DOUBLE
|
NUMERIC
|
DECIMAL(38,9)
|
STRING
|
STRING
|
BYTES
|
BYTE_ARRAY
|
TIMESTAMP
|
TIMESTAMP_MICROS
|
NULL
|
null
|
Persist results to Spanner
The following EXPORT DATA
options are supported when persisting results back
to your source Spanner instance. Specify the options in the following
format: NAME=VALUE, ...
.
format
STRING
CLOUD_SPANNER
.table
STRING
write_mode
STRING
-
update_ignore_all: Updates existing rows in the destination table. -
upsert_ignore_all: Inserts new rows or updates existing rows in the destination table.
In both modes, Spanner skips any record that would introduce a constraint violation (for example, missing keys on an update, unique index violation, foreign key constraint violation). However, the write fails for non-constraint violation errors (for example, column type mismatch, missing values for NOT NULL columns).
Requirements
When you persist algorithm results back to Spanner, your algorithm query must satisfy the following:
- The destination table must exist.
- Columns must exist with a matching type: All column names specified in the
RETURNclause must already exist in the destination table with a matching data type. Use aliases to match the destination table column names if needed. Example:RETURN node.id AS person_id. - Include all primary key columns: The
RETURNclause must include all primary key columns of the destination table.
Write semantics
Persisting results back to Spanner is a non-transactional operation. It provides row-level atomicity. This means the system either successfully writes all columns from the same row or writes none of them. It follows at-least-once semantics. This means a row can be written to multiple times. Reading from the destination table while the execution is underway might yield incomplete results.
If the overall execution fails, the system does not roll back changes that have
already been committed. The write process fails on the first non-retryable
error. When a write failure occurs, the ERROR_MESSAGE
in GRAPH_OPERATION_EXECUTION_STATUS
indicates the primary key of the row that failed along with the specific
reason for the failure.
The system writes algorithm results back to Spanner Graph using MEDIUM
priority
.
Run example algorithm queries
This section shows example Spanner Graph algorithm queries you can run on a test instance. For a full list of algorithms Spanner Graph supports, see Spanner Graph algorithms .
Before you begin
To run the example Spanner Graph algorithm queries, you must first complete the following:
- Follow Set up and query Spanner Graph to create a Spanner Graph.
- Ensure you have the required permissions .
- Optional: Augment the Spanner Graph schema if you are persisting output to Spanner.
Add a new column named page_rank
to the Account
table. Spanner
writes algorithm results to this new column. Then, refresh the graph definition
so you can access page_rank
as a node property.
-- Add `page_rank` as a column. Data type of this column matches the data type defined in `PageRank` output signature.
ALTER
TABLE
Account
ADD
COLUMN
page_rank
FLOAT64
;
-- Rerun the graph definition DDL to pickup `page_rank` as a new property.
CREATE
OR
REPLACE
PROPERTY
GRAPH
FinGraph
NODE
TABLES
(
`Account`
,
`Person`
)
EDGE
TABLES
(
`PersonOwnAccount`
SOURCE
KEY
(
id
)
REFERENCES
`Person`
(
id
)
DESTINATION
KEY
(
account_id
)
REFERENCES
`Account`
(
id
)
LABEL
`Owns`
,
`AccountTransferAccount`
SOURCE
KEY
(
id
)
REFERENCES
`Account`
(
id
)
DESTINATION
KEY
(
to_id
)
REFERENCES
`Account`
(
id
)
LABEL
`Transfers`
);
Run algorithm on full graph with label filter and persist results to Cloud Storage
This example runs PageRank
to rank Account
s based on the Transactions
they
participate in and persists results to a Cloud Storage in CSV format as
" my-bucket-name/my-output.csv
"
EXPORT
DATA
OPTIONS
(
uri
=
"gs://my-bucket-name/my-output.csv"
,
format
=
"csv"
)
AS
GRAPH
FinGraph
CALL
PageRank
(
node_labels
=
>
[
'Account'
]
,
edge_labels
=
>
[
'Transfers'
]
)
YIELD
node
,
score
RETURN
node
.
id
,
score
AS
page_rank
In Cloud Storage, you should see a CSV file with two columns ( id
and page_rank
) when this query completes successfully.
Run algorithm on subgraph defined by MATCH
and persist results to graph
This example uses the MATCH
pattern to dynamically match a logical subgraph
containing all Account
nodes and only Transfer
edges with an amount less than 500.
This logical subgraph is the input to the PageRank
algorithm.
Spanner persists algorithm results back to the Account
table.
EXPORT
DATA
OPTIONS
(
format
=
"CLOUD_SPANNER"
,
table
=
"Account"
,
write_mode
=
'update_ignore_all'
)
AS
GRAPH
FinGraph
MATCH
(
n
:
Account
)
RETURN
n
FULL
UNION
ALL
MATCH
-[
e
:
Transfers
WHERE
e
.
amount
<
500
]-
> RETURN
e
NEXT
CALL
PER
()
PageRank
()
YIELD
node
,
score
RETURN
node
.
id
,
score
AS
page_rank
After the query completes successfully, run the following query:
GRAPH
FinGraph
MATCH
(
n
:
Account
)
RETURN
n
.
id
,
ROUND
(
n
.
page_rank
,
2
)
AS
page_rank
ORDER
BY
page_rank
DESC
,
id
ASC
You should see results similar to the following:
| id | page_rank |
|---|---|
| 20 | 0.49 |
| 16 | 0.46 |
| 7 | 0.05 |
Check algorithm execution status
When a graph algorithm query completes successfully, it returns zero rows and Success
status. Depending on the input graph size and specific algorithm
configurations, the algorithm execution might take a while to complete. You can
check the progress and execution status of a graph algorithm query in SPANNER_SYS.GRAPH_OPERATION_EXECUTION_STATUS
table. This table retains
information for 30 days.
GRAPH_OPERATION_EXECUTION_STATUS
schema
| Column name | Type | Description |
|---|---|---|
QUERY_ID
|
STRING
|
The ID for the graph algorithm query. |
QUERY_TEXT
|
STRING
|
The query statement text. |
START_TIMESTAMP
|
TIMESTAMP
|
The time at which the query started execution. |
LAST_UPDATE_TIMESTAMP
|
TIMESTAMP
|
The time at which the status was last updated. |
PROGRESS
|
FLOAT
|
Estimated percentage of completion. The value is between 0
and 1
, where 0
means started and 1
means completed. |
STATUS
|
STRING
|
Current state of execution. Possible values are PENDING
, IN_PROGRESS
, OK
, CANCELLED
, DEADLINE_EXCEEDED
, UNKNOWN
. |
ERROR_MESSAGE
|
STRING
|
Error message if the query execution failed. |
The following sample query lists graph queries that have not yet completed successfully:
SELECT
query_id
,
query_text
,
start_timestamp
,
last_update_timestamp
,
progress
,
status
,
error_message
FROM
SPANNER_SYS
.
GRAPH_OPERATION_EXECUTION_STATUS
WHERE
status
!=
"OK"
ORDER
BY
start_timestamp
DESC
;
Cancel algorithm execution
To cancel an in-flight graph algorithm query, locate the query_id
from SPANNER_SYS.GRAPH_OPERATION_EXECUTION_STATUS
table, then call cancel_query
for that query_id
.

