Analyze multimodal data with SQL and BigQuery DataFrames
This tutorial shows you how to analyze multimodal data by using SQL queries and BigQuery DataFrames .
This tutorial uses the product catalog from the public Cymbal pet store dataset.
Objectives
- Use
ObjectRefvalues to store image data alongside structured data in a BigQuery standard table . - Enrich your data with image descriptions, keywords, and animal types, and
subcategories by using the
AI.GENERATEfunction . - Generate embeddings based on image data by using the
AI.EMBEDfunction . - Find similar images by using the
VECTOR_SEARCHfunction. - Summarize user manuals by processing ordered multimodal data using arrays
of
ObjectRefvalues.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery : you incur costs for the data that you process in BigQuery.
- Cloud Storage : you incur costs for the objects stored in Cloud Storage.
- Gemini Enterprise Agent Platform : you incur costs for calls to Agent Platform models.
To generate a cost estimate based on your projected usage, use the pricing calculator .
For more information about, see the following pricing pages:
Before you begin
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project : Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project
: To create a project, you need the Project Creator role
(
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission. Learn how to grant roles .
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the BigQuery, BigQuery Connection, Cloud Storage, and Agent Platform API APIs.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission. Learn how to grant roles .
Required roles
To get the permissions that you need to complete this tutorial, ask your administrator to grant you the following IAM roles:
- Create a connection: BigQuery Connection Admin
(
roles/bigquery.connectionAdmin) - Grant permissions to the connection's service account: Project IAM Admin
(
roles/resourcemanager.projectIamAdmin) - Create a Cloud Storage bucket: Storage Admin
(
roles/storage.admin) - Create datasets, models, UDFs, and tables, and run BigQuery jobs: BigQuery Admin
(
roles/bigquery.admin) - Create URLs that let you read and modify Cloud Storage objects: BigQuery ObjectRef Admin
(
roles/bigquery.objectRefAdmin)
For more information about granting roles, see Manage access to projects, folders, and organizations .
You might also be able to get the required permissions through custom roles or other predefined roles .
Set up
In this section, you create the dataset, connection, tables, and models used in this tutorial.
Create a dataset
Create a BigQuery dataset to contain the objects you create in this tutorial:
-
In the Google Cloud console, go to the BigQuerypage.
-
In the left pane, click Explorer:

If you don't see the left pane, click Expand left paneto open the pane.
-
In the Explorerpane, select your project.
-
Click View actions, and then click Create dataset. The Create datasetpane opens.
-
For Dataset ID, type
cymbal_pets. -
Click Create dataset.
Create a connection
Create a Cloud resource connection and get the connection's service account. BigQuery uses the connection to access objects in Cloud Storage:
-
Go to the BigQuerypage.
-
In the left pane, click Explorer:

-
In the Explorerpane, click Add data.
The Add datadialog opens.
-
In the Filter Bypane, in the Data Source Typesection, select Business Applications.
Alternatively, in the Search for data sourcesfield, you can enter
Vertex AI. -
In the Featured data sourcessection, click Vertex AI.
-
Click the Vertex AI Models: BigQuery Federationsolution card.
-
In the Connection typelist, select Vertex AI remote models, remote functions, BigLake and Spanner (Cloud Resource).
-
In the Connection IDfield, type
cymbal_conn. -
Click Create connection.
-
Click Go to connection.
-
In the Connection infopane, copy the service account ID for use in a following step.
Grant permissions to the connection's service account
Grant the connection's service account the appropriate roles to access other
services. You must grant these roles in the same project you created or
selected in the Before you begin
section. Granting the
roles in a different project results in the error bqcx-1234567890-xxxx@gcp-sa-bigquery-condel.iam.gserviceaccount.com
does not have the permission to access resource
.
Create a bucket
Create a Cloud Storage bucket for storing transformed objects:
-
Go to the Bucketspage.
-
Click Create.
-
On the Create a bucketpage, in the Get startedsection, enter a globally unique name that meets the bucket name requirements .
-
Click Create.
Grant permissions on the Cloud Storage bucket
Give the service account access to use objects in the bucket you created:
-
Go to the Bucketspage.
-
Click the name of the bucket you created.
-
Click Permissions.
-
Click Grant access. The Grant accessdialog opens.
-
In the New principalsfield, enter the service account ID that you copied earlier.
-
In the Select a rolefield, choose Cloud Storage, and then select Storage Object User.
-
Click Save.
Grant permissions on to use Agent Platform models
Give the service account access to use Agent Platform models:
-
Go to the IAM & Adminpage.
-
Click Grant access. The Grant accessdialog opens.
-
In the New principalsfield, enter the service account ID that you copied earlier.
-
In the Select a rolefield, enter Agent Platform User.
-
Click Save.
Create the tables of example data
Create tables to store the Cymbal pets product information.
Create the products
table
Create a standard table that contains the Cymbal pets product information:
-
In the Google Cloud console, go to the BigQuerypage.
-
Run the following to create the
productstable:SQL
LOAD DATA OVERWRITE cymbal_pets . products FROM FILES ( format = 'avro' , uris = [ 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/tables/products/products_*.avro' ] );
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames . For more information, see the BigQuery DataFrames reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment .
Create the product_images
table
Create an object table that contains the Cymbal pets product images:
-
Run the following to create the
product_imagestable:SQL
CREATE OR REPLACE EXTERNAL TABLE cymbal_pets . product_images WITH CONNECTION `us.cymbal_conn` OPTIONS ( object_metadata = 'SIMPLE' , uris = [ 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png' ] , max_staleness = INTERVAL 30 MINUTE , metadata_cache_mode = AUTOMATIC );
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames . For more information, see the BigQuery DataFrames reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment .
Create models
The SQL instructions in this tutorial show how to call AI functions that don't require you to create a model. If you're following the BigQuery DataFrames instructions, select that option to create remote models that represent a Gemini model and a multimodal embedding model.
SQL
You can skip this step.
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames . For more information, see the BigQuery DataFrames reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment .

