This tutorial shows you how to perform semantic search on your graph data by
using autonomous embedding generation
and the AI.SEARCH
function
.
Objectives
This tutorial covers the following tasks:- Create tables that hold information about people, financial accounts, account ownership, and account transfers.
- Use autonomous embedding generation to simplify your embedding maintenance workflow.
- Create a graph that defines the relationships between data stored in your tables.
- Use the
AI.SEARCHfunction on your graph nodes to perform semantic search on account descriptions. - Use the
AI.SEARCHfunction on your graph edges to perform semantic search on account transfer notes.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery : You incur costs for the data that you process in BigQuery.
To generate a cost estimate based on your projected usage, use the pricing calculator .
When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up .
Before you begin
Console
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project
: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles .
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission. Learn how to grant roles . -
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project
: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles .
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission. Learn how to grant roles . -
Make sure that you have the following role or roles on the project: BigQuery Data Editor , Project IAM Admin
Check for the roles
-
In the Google Cloud console, go to the IAM page.
Go to IAM - Select the project.
-
In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
- For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
-
In the Google Cloud console, go to the IAM page.
Go to IAM - Select the project.
- Click Grant access .
-
In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
- Click Select a role , then search for the role.
- To grant additional roles, click Add another role and add each additional role.
- Click Save .
-
gcloud
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
-
Install the Google Cloud CLI.
-
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity .
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Create or select a Google Cloud project .
Roles required to select or create a project
- Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project
: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles .
-
Create a Google Cloud project:
gcloud projects create PROJECT_IDReplace
PROJECT_IDwith a name for the Google Cloud project you are creating. -
Select the Google Cloud project that you created:
gcloud config set project PROJECT_IDReplace
PROJECT_IDwith your Google Cloud project name.
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the BigQuery API:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission. Learn how to grant roles .gcloud services enable bigquery.googleapis.com
-
Install the Google Cloud CLI.
-
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity .
-
To initialize the gcloud CLI, run the following command:
gcloud init
-
Create or select a Google Cloud project .
Roles required to select or create a project
- Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project
: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles .
-
Create a Google Cloud project:
gcloud projects create PROJECT_IDReplace
PROJECT_IDwith a name for the Google Cloud project you are creating. -
Select the Google Cloud project that you created:
gcloud config set project PROJECT_IDReplace
PROJECT_IDwith your Google Cloud project name.
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the BigQuery API:
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission. Learn how to grant roles .gcloud services enable bigquery.googleapis.com
-
Grant roles to your user account. Run the following command once for each of the following IAM roles:
roles/bigquery.dataEditor, roles/resourcemanager.projectIamAdmingcloud projects add-iam-policy-binding PROJECT_ID --member = "user: USER_IDENTIFIER " --role = ROLE
Replace the following:
-
PROJECT_ID: Your project ID. -
USER_IDENTIFIER: The identifier for your user account. For example,myemail@example.com. -
ROLE: The IAM role that you grant to your user account.
-
Create tables
To store the tables and graph that you create in the following examples, create a dataset
.
The following query creates a dataset called graph_search
:
CREATE
SCHEMA
IF
NOT
EXISTS
graph_search
;
The following tables contain information about people and accounts, and the relationships between each of these entities:
-
Person: information about people. -
Account: information about bank accounts. -
PersonOwnAccount: information about who owns which accounts. -
AccountTransferAccount: information about transfers between accounts.
To create these tables, run the following CREATE TABLE
statements
:
CREATE
OR
REPLACE
TABLE
graph_search
.
Person
(
id
INT64
,
name
STRING
,
PRIMARY
KEY
(
id
)
NOT
ENFORCED
);
CREATE
OR
REPLACE
TABLE
graph_search
.
Account
(
id
INT64
,
create_time
TIMESTAMP
,
is_blocked
BOOL
,
description
STRING
,
description_embedding
STRUCT<result
ARRAY<FLOAT64>
,
status
STRING
>
GENERATED
ALWAYS
AS
(
AI
.
EMBED
(
description
,
model
=
>
'embeddinggemma-300m'
)
)
STORED
OPTIONS
(
asynchronous
=
TRUE
),
PRIMARY
KEY
(
id
)
NOT
ENFORCED
);
CREATE
OR
REPLACE
TABLE
graph_search
.
PersonOwnAccount
(
id
INT64
NOT
NULL
,
account_id
INT64
NOT
NULL
,
create_time
TIMESTAMP
,
PRIMARY
KEY
(
id
,
account_id
)
NOT
ENFORCED
,
FOREIGN
KEY
(
id
)
REFERENCES
graph_search
.
Person
(
id
)
NOT
ENFORCED
,
FOREIGN
KEY
(
account_id
)
REFERENCES
graph_search
.
Account
(
id
)
NOT
ENFORCED
);
CREATE
OR
REPLACE
TABLE
graph_search
.
AccountTransferAccount
(
id
INT64
NOT
NULL
,
to_id
INT64
NOT
NULL
,
amount
FLOAT64
,
create_time
TIMESTAMP
NOT
NULL
,
order_number
STRING
,
notes
STRING
,
notes_embedding
STRUCT<result
ARRAY<FLOAT64>
,
status
STRING
>
GENERATED
ALWAYS
AS
(
AI
.
EMBED
(
notes
,
model
=
>
'embeddinggemma-300m'
)
)
STORED
OPTIONS
(
asynchronous
=
TRUE
),
PRIMARY
KEY
(
id
,
to_id
,
create_time
)
NOT
ENFORCED
,
FOREIGN
KEY
(
id
)
REFERENCES
graph_search
.
Account
(
id
)
NOT
ENFORCED
,
FOREIGN
KEY
(
to_id
)
REFERENCES
graph_search
.
Account
(
id
)
NOT
ENFORCED
);
The Account
and AccountTransferAccount
tables use autonomous embedding
generation to maintain embeddings for their description
and notes
columns.
In this tutorial we use the embeddinggemma-300m
model because it runs in
BigQuery and works well for short strings. For longer strings
that exceed 128 tokens, you should choose a different embedding model, such as text-embedding-005
. For more information, read about choosing an embedding model
.
Insert data
The following queries insert some sample data into your tables. The INSERT
statements
omit the embedding columns and BigQuery populates them
automatically.
INSERT
INTO
graph_search
.
Account
(
id
,
create_time
,
is_blocked
,
description
)
VALUES
(
7
,
"2020-01-10 06:22:20.222"
,
false
,
"Fund for a refreshing tropical vacation"
),
(
16
,
"2020-01-27 17:55:09.206"
,
true
,
"Fund for a rainy day!"
),
(
20
,
"2020-02-18 05:44:20.655"
,
false
,
"Saving up for travel"
);
INSERT
INTO
graph_search
.
Person
(
id
,
name
)
VALUES
(
1
,
"Alex"
),
(
2
,
"Dana"
),
(
3
,
"Lee"
);
INSERT
INTO
graph_search
.
AccountTransferAccount
(
id
,
to_id
,
amount
,
create_time
,
order_number
,
notes
)
VALUES
(
7
,
16
,
300
,
"2020-08-29 15:28:58.647"
,
"304330008004315"
,
"wedding present"
),
(
7
,
16
,
100
,
"2020-10-04 16:55:05.342"
,
"304120005529714"
,
"birthday gift"
),
(
16
,
20
,
300
,
"2020-09-25 02:36:14.926"
,
"103650009791820"
,
"for shared cost of dinner"
),
(
20
,
7
,
500
,
"2020-10-04 16:55:05.342"
,
"304120005529714"
,
"fees for tuition"
),
(
20
,
16
,
200
,
"2020-10-17 03:59:40.247"
,
"302290001484851"
,
"loved the lunch"
);
INSERT
INTO
graph_search
.
PersonOwnAccount
(
id
,
account_id
,
create_time
)
VALUES
(
1
,
7
,
"2020-01-10 06:22:20.222"
),
(
2
,
20
,
"2020-01-27 17:55:09.206"
),
(
3
,
16
,
"2020-02-18 05:44:20.655"
);
Create a graph
The following query uses the CREATE PROPERTY GRAPH
statement
to create a graph called FinGraph
in the graph_search
dataset.
The Account
and Person
tables are the node tables. The AccountTransferAccount
and PersonOwnAccount
tables are the edge tables, which represent relationships between the node
tables.
CREATE
OR
REPLACE
PROPERTY
GRAPH
graph_search
.
FinGraph
NODE
TABLES
(
graph_search
.
Account
,
graph_search
.
Person
)
EDGE
TABLES
(
graph_search
.
PersonOwnAccount
SOURCE
KEY
(
id
)
REFERENCES
Person
(
id
)
DESTINATION
KEY
(
account_id
)
REFERENCES
Account
(
id
)
LABEL
Owns
,
graph_search
.
AccountTransferAccount
SOURCE
KEY
(
id
)
REFERENCES
Account
(
id
)
DESTINATION
KEY
(
to_id
)
REFERENCES
Account
(
id
)
LABEL
Transfers
);
Search nodes
The following queries show who owns accounts for leisure travel and
vacation. The first query uses a DECLARE
statement
to create a variable called similar_account
. The variable is initialized in the DEFAULT
clause with
a call to AI.SEARCH
that find accounts whose
descriptions are most semantically similar to accounts for leisure travel and vacation
. The query sets the top_k
argument to 2
in the call to AI.SEARCH
to limit the number of
results. The second query is a graph query that returns the account owner's
name along with the account description.
DECLARE
similar_account
DEFAULT
((
SELECT
ARRAY_AGG
(
base
.
id
)
FROM
AI
.
SEARCH
(
(
SELECT
*
FROM
graph_search
.
Account
WHERE
description_embedding
IS
NOT
NULL
),
'description'
,
'accounts for leisure travel and vacation'
,
top_k
=
>
2
)
));
GRAPH
graph_search
.
FinGraph
MATCH
(
p
:
Person
)
-[:
Owns
]-
> (
a
:
Account
)
WHERE
a
.
id
IN
UNNEST
(
similar_account
)
RETURN
p
.
name
,
a
.
description
;
The result is similar to the following:
+------+-----------------------------------------+
| name | description |
+------+-----------------------------------------+
| Dana | Saving up for travel |
| Alex | Fund for a refreshing tropical vacation |
+------+-----------------------------------------+
Search edges
The following queries show who made account transfers related to food payments.
The first query uses the AI.SEARCH
function to populate a
variable called food_transfers
. This variable holds the order number of
transfers whose associated note is most semantically similar to food
. The query sets the top_k
argument to 2
in the call to AI.SEARCH
to limit the number of
results. The second query is a graph query that returns the account owner's
name along with the transfer note.
DECLARE
food_transfers
DEFAULT
((
SELECT
ARRAY_AGG
(
base
.
order_number
)
FROM
AI
.
SEARCH
(
(
SELECT
*
FROM
graph_search
.
AccountTransferAccount
WHERE
notes_embedding
IS
NOT
NULL
),
'notes'
,
'food'
,
top_k
=
>
2
)
));
GRAPH
graph_search
.
FinGraph
MATCH
(
p
:
Person
)
-[:
Owns
]-
> (
:
Account
)
-[
t
:
Transfers
]-
> (
:
Account
)
WHERE
t
.
order_number
IN
UNNEST
(
food_transfers
)
RETURN
p
.
name
,
t
.
notes
;
The result is similar to the following:
+------+---------------------------+
| name | notes |
+------+---------------------------+
| Dana | loved the lunch |
| Lee | for shared cost of dinner |
+------+---------------------------+
Create a vector index
Vector indexes reduce the latency and computational cost of your searches. The tables in this tutorial are too small to use a vector index. Vector indexes are useful when your tables are large, typically with millions of rows. BigQuery offers two types of index: IVF and TreeAH. For more information about creating an index and choosing a type, see Manage vector indexes .
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
Delete a Google Cloud project:
gcloud projects delete PROJECT_ID
What's next
- Learn more about BigQuery Graph .
- Learn how to create and query a graph .
- Learn more about creating a vector index and performing semantic search and RAG .

