Graph query best practices
This document describes best practices for optimizing your BigQuery Graph queries.
Start path traversal from low-cardinality nodes
To keep intermediate result sets small and speed up query execution, write your
graph queries so that path traversal starts from lower cardinality nodes,
regardless of the direction of path traversal. The following MATCH
statements use property filters to reduce the number of possible
starting nodes instead of computing all matches and then filtering:
MATCH
(
p
:
Person
{
id
:
10
}
)
-[
own
:
Owns
]-
> (
a
:
Account
)
MATCH
(
a
:
Account
WHERE
balance
>
10
)
< -[
own
:
Owns
]-
(
p
:
Person
)
This is especially important for quantified path queries:
MATCH
(
p
:
Person
{
id
:
10
}
)
-[
own
:
Owns
]->{
1
,
3
}
(
a
:
Account
)
Use ANY
or ANY SHORTEST
syntax for connectivity checks
Quantified path queries can return duplicate paths between source nodes and
destination nodes. If your goal is to check for connectivity and you don't
require all possible paths, use ANY
or ANY SHORTEST
to reduce redundant
computations and improve path lookup efficiency. For example, the following MATCH
statement uses ANY SHORTEST
to keep only one path between each pair
of nodes:
MATCH
ANY
SHORTEST
(
a1
:
Account
)
-[
t
:
Transfers
]->{
1
,
3
}
(
a2
:
Account
)
Use directional path traversal
BigQuery Graph schemas are directional, which means that each
edge has a source node and a destination node. Although graph query
syntax allows path traversal in any direction (for example, -[edge]-
), we
recommend using directional path traversal (for example, -[edge]->
or <-[edge]-
) for better performance. Any direction path traversal might cause
performance loss.
The following MATCH
statement uses any direction path traversal:
-- Avoid.
MATCH
(
a1
:
Account
{
id
:
7
}
)
-[
t
:
Transfers
]-
(
a2
:
Account
)
Instead, combine two directional traversals with UNION ALL
:
MATCH
(
a1
:
Account
{
id
:
7
}
)
-[
t
:
Transfers
]-
> (
a2
:
Account
)
...
UNION
ALL
...
MATCH
(
a1
:
Account
{
id
:
7
}
)
< -[
t
:
Transfers
]-
(
a2
:
Account
)
Specify labels explicitly
If node or edge labels are omitted in a query, BigQuery Graph enumerates all qualifying node and edge labels. This enumeration might cause more labels to be scanned than necessary. To avoid this, specify labels for all nodes and edges in your query whenever possible.
For example, the following query specifies the Account
and Transfers
labels:
GRAPH
graph_db
.
FinGraph
MATCH
(
a1
:
Account
)
-[
t
:
Transfers
]-
> (
a2
:
Account
)
RETURN
COUNT
(
*
)
AS
num_transfers
;
Avoid omitting labels, because it might scan other unneeded relationships
between nodes. In the following query, a1
can represent an account or a
person, and t
can represent a transfer or an account ownership.
GRAPH
graph_db
.
FinGraph
MATCH
(
a1
)
-[
t
]-
> (
a2
)
RETURN
COUNT
(
*
)
AS
num_transfers
;
Prefer a single MATCH
statement
BigQuery Graph lets you include multiple MATCH
statements in
a single graph query. These statements are connected by multi-declared variables
that represent the same node or edge. However, using multiple MATCH
statements can diminish cardinality benefits across statements. When
possible, use a single MATCH
statement for better performance.
For example, the following queries are equivalent, but the first one performs
better because it uses a single MATCH
statement:
-- Preferred syntax.
GRAPH
graph_db
.
FinGraph
MATCH
(
p
:
Person
{
id
:
1
}
)
-[
o
:
Owns
]-
>
(
a
:
Account
)
-[
t
:
Transfers
]-
> (
a2
:
Account
)
RETURN
o
.
account_id
,
t
.
amount
;
-- Avoid this syntax.
GRAPH
graph_db
.
FinGraph
MATCH
(
p
:
Person
{
id
:
1
}
)
-[
o
:
Owns
]-
> (
a
:
Account
)
MATCH
(
a
:
Account
)
-[
t
:
Transfers
]-
> (
a2
:
Account
)
RETURN
o
.
account_id
,
t
.
amount
;
Limit traversed edges from high-cardinality nodes
When you query graphs, some nodes can have a significantly larger number of incoming or outgoing edges compared to other nodes. These high-cardinality nodes are sometimes called super nodes or hub nodes . Super nodes can cause performance issues because traversals through them might involve processing large amounts of data, which leads to data skew and long execution times.
To optimize a query of a graph with super nodes, use the ROW_NUMBER()
function
within a FILTER
clause or WHERE
clause in MATCH
to limit the
number of edges that the query traverses from or to a node. This technique is
particularly useful when you don't need a complete enumeration of all
connections from or to a super node.
For example, if some accounts in FinGraph
have a large number of
transactions, you can use ROW_NUMBER()
to limit the number of Transfers
edges to consider for each Account
and avoid an inefficient query:
GRAPH
graph_db
.
FinGraph
MATCH
(
a1
:
Account
)
-[
e1
:
Transfers
WHERE
e1
IN
{
GRAPH
graph_db
.
FinGraph
-- Sample 5 edges per source node
MATCH
-[
selected_e
:
Transfers
]-
>
FILTER
ROW_NUMBER
()
OVER
(
PARTITION
BY
SOURCE_NODE_ID
(
selected_e
))
<
5
RETURN
selected_e
}]->{
1
,
3
}
(
a2
:
Account
)
RETURN
COUNT
(
*
)
AS
cnt
;
Sample intermediate nodes or edges in multi-hop queries
You can also improve query efficiency by using ROW_NUMBER()
to sample
intermediate nodes in multi-hop queries. This technique improves efficiency by
limiting the number of paths that the query considers for each intermediate
node. To do this, break a multi-hop query into multiple MATCH
statements
separated by NEXT
, and apply ROW_NUMBER()
at the midpoint where you need to
sample:
GRAPH
graph_db
.
FinGraph
MATCH
(
a1
:
Account
)
-[
e1
:
Transfers
]-
> (
a2
:
Account
)
-- Sample 5 destination nodes per a1 source node
FILTER
ROW_NUMBER
()
OVER
(
PARTITION
BY
ELEMENT_ID
(
a1
))
<
5
RETURN
a1
,
a2
NEXT
MATCH
(
a2
)
-[
e2
:
Transfers
]-
> (
a3
:
Account
)
RETURN
a1
.
id
AS
src_id
,
a2
.
id
AS
mid_id
,
a3
.
id
AS
dst_id
;
What's next
- To learn more about writing graph queries, see the query overview .
- Learn more about schema best practices .

