Search a graph

This tutorial shows you how to perform semantic search on your graph data by using autonomous embedding generation and the AI.SEARCH function .

Objectives

This tutorial covers the following tasks:

Create tables that hold information about people, financial accounts, account ownership, and account transfers.
Use autonomous embedding generation to simplify your embedding maintenance workflow.
Create a graph that defines the relationships between data stored in your tables.
Use the AI.SEARCH function on your graph nodes to perform semantic search on account descriptions.
Use the AI.SEARCH function on your graph edges to perform semantic search on account transfer notes.

Costs

In this document, you use the following billable components of Google Cloud:

BigQuery : You incur costs for the data that you process in BigQuery.

To generate a cost estimate based on your projected usage, use the pricing calculator .

New Google Cloud users might be eligible for a free trial .

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up .

Before you begin

Console

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project : To create a project, you need the Project Creator role ( roles/resourcemanager.projectCreator ), which contains the resourcemanager.projects.create permission. Learn how to grant roles .

Go to project selector

Verify that billing is enabled for your Google Cloud project .

Enable the BigQuery API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role ( roles/serviceusage.serviceUsageAdmin ), which contains the serviceusage.services.enable permission. Learn how to grant roles .

Enable the API

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

Roles required to select or create a project

Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project : To create a project, you need the Project Creator role ( roles/resourcemanager.projectCreator ), which contains the resourcemanager.projects.create permission. Learn how to grant roles .

Go to project selector

Verify that billing is enabled for your Google Cloud project .

Enable the BigQuery API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role ( roles/serviceusage.serviceUsageAdmin ), which contains the serviceusage.services.enable permission. Learn how to grant roles .

Enable the API

Make sure that you have the following role or roles on the project: BigQuery Data Editor , Project IAM Admin
Check for the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. In the Principal column, find all rows that identify you or a group that you're included in. To learn which groups you're included in, contact your administrator.
4. For all rows that specify or include you, check the Role column to see whether the list of roles includes the required roles.
Grant the roles
1. In the Google Cloud console, go to the IAM page.
  Go to IAM
2. Select the project.
3. Click Grant access .
4. In the New principals field, enter your user identifier. This is typically the email address for a Google Account.
5. Click Select a role , then search for the role.
6. To grant additional roles, click Add another role and add each additional role.
7. Click Save .

gcloud

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity .

To initialize the gcloud CLI, run the following command:

gcloud  
init

Create or select a Google Cloud project .

Roles required to select or create a project

Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project : To create a project, you need the Project Creator role ( roles/resourcemanager.projectCreator ), which contains the resourcemanager.projects.create permission. Learn how to grant roles .

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID 
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID 
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project .

Enable the BigQuery API:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role ( roles/serviceusage.serviceUsageAdmin ), which contains the serviceusage.services.enable permission. Learn how to grant roles .

gcloud  
services  
 enable 
  
bigquery.googleapis.com

Install the Google Cloud CLI.

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity .

To initialize the gcloud CLI, run the following command:

gcloud  
init

Create or select a Google Cloud project .

Roles required to select or create a project

Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project : To create a project, you need the Project Creator role ( roles/resourcemanager.projectCreator ), which contains the resourcemanager.projects.create permission. Learn how to grant roles .

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID 
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID 
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project .

Enable the BigQuery API:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role ( roles/serviceusage.serviceUsageAdmin ), which contains the serviceusage.services.enable permission. Learn how to grant roles .

gcloud  
services  
 enable 
  
bigquery.googleapis.com

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/bigquery.dataEditor, roles/resourcemanager.projectIamAdmin
```
gcloud  
projects  
add-iam-policy-binding  
 PROJECT_ID 
  
--member = 
 "user: USER_IDENTIFIER 
" 
  
--role = 
 ROLE 
```
Replace the following:
- PROJECT_ID : Your project ID.
- USER_IDENTIFIER : The identifier for your user account. For example, myemail@example.com .
- ROLE : The IAM role that you grant to your user account.

Create tables

To store the tables and graph that you create in the following examples, create a dataset . The following query creates a dataset called graph_search :

  CREATE 
  
 SCHEMA 
  
 IF 
  
 NOT 
  
 EXISTS 
  
 graph_search 
 ;

The following tables contain information about people and accounts, and the relationships between each of these entities:

Person : information about people.
Account : information about bank accounts.
PersonOwnAccount : information about who owns which accounts.
AccountTransferAccount : information about transfers between accounts.

To create these tables, run the following CREATE TABLE statements :

  CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 graph_search 
 . 
 Person 
  
 ( 
  
 id 
  
 INT64 
 , 
  
 name 
  
 STRING 
 , 
  
 PRIMARY 
  
 KEY 
  
 ( 
 id 
 ) 
  
 NOT 
  
 ENFORCED 
 ); 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 graph_search 
 . 
 Account 
  
 ( 
  
 id 
  
 INT64 
 , 
  
 create_time 
  
 TIMESTAMP 
 , 
  
 is_blocked 
  
 BOOL 
 , 
  
 description 
  
 STRING 
 , 
  
 description_embedding 
  
 STRUCT<result 
  
 ARRAY<FLOAT64> 
 , 
  
 status 
  
 STRING 
>  
 GENERATED 
  
 ALWAYS 
  
 AS 
  
 ( 
  
 AI 
 . 
 EMBED 
 ( 
 description 
 , 
  
 model 
  
 = 
>  
 'embeddinggemma-300m' 
 ) 
  
 ) 
  
 STORED 
  
 OPTIONS 
 ( 
  
 asynchronous 
  
 = 
  
 TRUE 
  
 ), 
  
 PRIMARY 
  
 KEY 
  
 ( 
 id 
 ) 
  
 NOT 
  
 ENFORCED 
 ); 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 graph_search 
 . 
 PersonOwnAccount 
  
 ( 
  
 id 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 account_id 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 create_time 
  
 TIMESTAMP 
 , 
  
 PRIMARY 
  
 KEY 
  
 ( 
 id 
 , 
  
 account_id 
 ) 
  
 NOT 
  
 ENFORCED 
 , 
  
 FOREIGN 
  
 KEY 
  
 ( 
 id 
 ) 
  
 REFERENCES 
  
 graph_search 
 . 
 Person 
 ( 
 id 
 ) 
  
 NOT 
  
 ENFORCED 
 , 
  
 FOREIGN 
  
 KEY 
  
 ( 
 account_id 
 ) 
  
 REFERENCES 
  
 graph_search 
 . 
 Account 
 ( 
 id 
 ) 
  
 NOT 
  
 ENFORCED 
 ); 
 CREATE 
  
 OR 
  
 REPLACE 
  
 TABLE 
  
 graph_search 
 . 
 AccountTransferAccount 
  
 ( 
  
 id 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 to_id 
  
 INT64 
  
 NOT 
  
 NULL 
 , 
  
 amount 
  
 FLOAT64 
 , 
  
 create_time 
  
 TIMESTAMP 
  
 NOT 
  
 NULL 
 , 
  
 order_number 
  
 STRING 
 , 
  
 notes 
  
 STRING 
 , 
  
 notes_embedding 
  
 STRUCT<result 
  
 ARRAY<FLOAT64> 
 , 
  
 status 
  
 STRING 
>  
 GENERATED 
  
 ALWAYS 
  
 AS 
  
 ( 
  
 AI 
 . 
 EMBED 
 ( 
 notes 
 , 
  
 model 
  
 = 
>  
 'embeddinggemma-300m' 
 ) 
  
 ) 
  
 STORED 
  
 OPTIONS 
 ( 
  
 asynchronous 
  
 = 
  
 TRUE 
  
 ), 
  
 PRIMARY 
  
 KEY 
  
 ( 
 id 
 , 
  
 to_id 
 , 
  
 create_time 
 ) 
  
 NOT 
  
 ENFORCED 
 , 
  
 FOREIGN 
  
 KEY 
  
 ( 
 id 
 ) 
  
 REFERENCES 
  
 graph_search 
 . 
 Account 
 ( 
 id 
 ) 
  
 NOT 
  
 ENFORCED 
 , 
  
 FOREIGN 
  
 KEY 
  
 ( 
 to_id 
 ) 
  
 REFERENCES 
  
 graph_search 
 . 
 Account 
 ( 
 id 
 ) 
  
 NOT 
  
 ENFORCED 
 );

The Account and AccountTransferAccount tables use autonomous embedding generation to maintain embeddings for their description and notes columns.

In this tutorial we use the embeddinggemma-300m model because it runs in BigQuery and works well for short strings. For longer strings that exceed 128 tokens, you should choose a different embedding model, such as text-embedding-005 . For more information, read about choosing an embedding model .

Insert data

The following queries insert some sample data into your tables. The INSERT statements omit the embedding columns and BigQuery populates them automatically.

  INSERT 
  
 INTO 
  
 graph_search 
 . 
 Account 
  
 ( 
 id 
 , 
  
 create_time 
 , 
  
 is_blocked 
 , 
  
 description 
 ) 
 VALUES 
  
 ( 
 7 
 , 
 "2020-01-10 06:22:20.222" 
 , 
 false 
 , 
 "Fund for a refreshing tropical vacation" 
 ), 
  
 ( 
 16 
 , 
 "2020-01-27 17:55:09.206" 
 , 
 true 
 , 
 "Fund for a rainy day!" 
 ), 
  
 ( 
 20 
 , 
 "2020-02-18 05:44:20.655" 
 , 
 false 
 , 
 "Saving up for travel" 
 ); 
 INSERT 
  
 INTO 
  
 graph_search 
 . 
 Person 
  
 ( 
 id 
 , 
  
 name 
 ) 
 VALUES 
  
 ( 
 1 
 , 
 "Alex" 
 ), 
  
 ( 
 2 
 , 
 "Dana" 
 ), 
  
 ( 
 3 
 , 
 "Lee" 
 ); 
 INSERT 
  
 INTO 
  
 graph_search 
 . 
 AccountTransferAccount 
  
 ( 
 id 
 , 
  
 to_id 
 , 
  
 amount 
 , 
  
 create_time 
 , 
  
 order_number 
 , 
  
 notes 
 ) 
 VALUES 
  
 ( 
 7 
 , 
 16 
 , 
 300 
 , 
 "2020-08-29 15:28:58.647" 
 , 
 "304330008004315" 
 , 
  
 "wedding present" 
 ), 
  
 ( 
 7 
 , 
 16 
 , 
 100 
 , 
 "2020-10-04 16:55:05.342" 
 , 
 "304120005529714" 
 , 
  
 "birthday gift" 
 ), 
  
 ( 
 16 
 , 
 20 
 , 
 300 
 , 
 "2020-09-25 02:36:14.926" 
 , 
 "103650009791820" 
 , 
  
 "for shared cost of dinner" 
 ), 
  
 ( 
 20 
 , 
 7 
 , 
 500 
 , 
 "2020-10-04 16:55:05.342" 
 , 
 "304120005529714" 
 , 
  
 "fees for tuition" 
 ), 
  
 ( 
 20 
 , 
 16 
 , 
 200 
 , 
 "2020-10-17 03:59:40.247" 
 , 
 "302290001484851" 
 , 
  
 "loved the lunch" 
 ); 
 INSERT 
  
 INTO 
  
 graph_search 
 . 
 PersonOwnAccount 
  
 ( 
 id 
 , 
  
 account_id 
 , 
  
 create_time 
 ) 
 VALUES 
  
 ( 
 1 
 , 
 7 
 , 
 "2020-01-10 06:22:20.222" 
 ), 
  
 ( 
 2 
 , 
 20 
 , 
 "2020-01-27 17:55:09.206" 
 ), 
  
 ( 
 3 
 , 
 16 
 , 
 "2020-02-18 05:44:20.655" 
 );

Create a graph

The following query uses the CREATE PROPERTY GRAPH statement to create a graph called FinGraph in the graph_search dataset. The Account and Person tables are the node tables. The AccountTransferAccount and PersonOwnAccount tables are the edge tables, which represent relationships between the node tables.

  CREATE 
  
 OR 
  
 REPLACE 
  
 PROPERTY 
  
 GRAPH 
  
 graph_search 
 . 
 FinGraph 
 NODE 
  
 TABLES 
  
 ( 
 graph_search 
 . 
 Account 
 , 
  
 graph_search 
 . 
 Person 
 ) 
 EDGE 
  
 TABLES 
  
 ( 
  
 graph_search 
 . 
 PersonOwnAccount 
  
 SOURCE 
  
 KEY 
  
 ( 
 id 
 ) 
  
 REFERENCES 
  
 Person 
  
 ( 
 id 
 ) 
  
 DESTINATION 
  
 KEY 
  
 ( 
 account_id 
 ) 
  
 REFERENCES 
  
 Account 
  
 ( 
 id 
 ) 
  
 LABEL 
  
 Owns 
 , 
  
 graph_search 
 . 
 AccountTransferAccount 
  
 SOURCE 
  
 KEY 
  
 ( 
 id 
 ) 
  
 REFERENCES 
  
 Account 
  
 ( 
 id 
 ) 
  
 DESTINATION 
  
 KEY 
  
 ( 
 to_id 
 ) 
  
 REFERENCES 
  
 Account 
  
 ( 
 id 
 ) 
  
 LABEL 
  
 Transfers 
 );

Search nodes

The following queries show who owns accounts for leisure travel and vacation. The first query uses a DECLARE statement to create a variable called similar_account . The variable is initialized in the DEFAULT clause with a call to AI.SEARCH that find accounts whose descriptions are most semantically similar to accounts for leisure travel and vacation . The query sets the top_k argument to 2 in the call to AI.SEARCH to limit the number of results. The second query is a graph query that returns the account owner's name along with the account description.

  DECLARE 
  
 similar_account 
  
 DEFAULT 
  
 (( 
 SELECT 
  
 ARRAY_AGG 
 ( 
 base 
 . 
 id 
 ) 
 FROM 
  
 AI 
 . 
 SEARCH 
 ( 
  
 ( 
 SELECT 
  
 * 
  
 FROM 
  
 graph_search 
 . 
 Account 
  
 WHERE 
  
 description_embedding 
  
 IS 
  
 NOT 
  
 NULL 
 ), 
  
 'description' 
 , 
  
 'accounts for leisure travel and vacation' 
 , 
  
 top_k 
  
 = 
>  
 2 
 ) 
 )); 
 GRAPH 
  
 graph_search 
 . 
 FinGraph 
 MATCH 
  
 ( 
 p 
 : 
 Person 
 ) 
 -[: 
 Owns 
 ]- 
> ( 
 a 
 : 
 Account 
 ) 
 WHERE 
  
 a 
 . 
 id 
  
 IN 
  
 UNNEST 
 ( 
 similar_account 
 ) 
 RETURN 
  
 p 
 . 
 name 
 , 
  
 a 
 . 
 description 
 ;

The result is similar to the following:

 +------+-----------------------------------------+
| name | description                             |
+------+-----------------------------------------+
| Dana | Saving up for travel                    |
| Alex | Fund for a refreshing tropical vacation |
+------+-----------------------------------------+

Search edges

The following queries show who made account transfers related to food payments. The first query uses the AI.SEARCH function to populate a variable called food_transfers . This variable holds the order number of transfers whose associated note is most semantically similar to food . The query sets the top_k argument to 2 in the call to AI.SEARCH to limit the number of results. The second query is a graph query that returns the account owner's name along with the transfer note.

  DECLARE 
  
 food_transfers 
  
 DEFAULT 
  
 (( 
 SELECT 
  
 ARRAY_AGG 
 ( 
 base 
 . 
 order_number 
 ) 
 FROM 
  
 AI 
 . 
 SEARCH 
 ( 
  
 ( 
 SELECT 
  
 * 
  
 FROM 
  
 graph_search 
 . 
 AccountTransferAccount 
  
 WHERE 
  
 notes_embedding 
  
 IS 
  
 NOT 
  
 NULL 
 ), 
  
 'notes' 
 , 
  
 'food' 
 , 
  
 top_k 
  
 = 
>  
 2 
 ) 
 )); 
 GRAPH 
  
 graph_search 
 . 
 FinGraph 
 MATCH 
  
 ( 
 p 
 : 
 Person 
 ) 
 -[: 
 Owns 
 ]- 
> ( 
 : 
 Account 
 ) 
 -[ 
 t 
 : 
 Transfers 
 ]- 
> ( 
 : 
 Account 
 ) 
 WHERE 
  
 t 
 . 
 order_number 
  
 IN 
  
 UNNEST 
 ( 
 food_transfers 
 ) 
 RETURN 
  
 p 
 . 
 name 
 , 
  
 t 
 . 
 notes 
 ;

The result is similar to the following:

 +------+---------------------------+
| name | notes                     |
+------+---------------------------+
| Dana | loved the lunch           |
| Lee  | for shared cost of dinner |
+------+---------------------------+

Create a vector index

Vector indexes reduce the latency and computational cost of your searches. The tables in this tutorial are too small to use a vector index. Vector indexes are useful when your tables are large, typically with millions of rows. BigQuery offers two types of index: IVF and TreeAH. For more information about creating an index and choosing a type, see Manage vector indexes .

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

What's next

Learn more about BigQuery Graph .
Learn how to create and query a graph .
Learn more about creating a vector index and performing semantic search and RAG .

Search a graph Stay organized with collections Save and categorize content based on your preferences.

Objectives

Costs

Before you begin

Console

Check for the roles

Grant the roles

gcloud

Create tables

Insert data

Create a graph

Search nodes

Search edges

Create a vector index

Clean up

Delete the project

What's next

Search a graph