Generate text embeddings by using an open model and the ML.GENERATE_EMBEDDING function
This tutorial shows you how to create a remote model that's based on the open-source text embedding model Qwen3-Embedding-0.6B, and then how to use that model with the ML.GENERATE_EMBEDDING function to embed movie reviews from the bigquery-public-data.imdb.reviews public table.
Required permissions
To run this tutorial, you need the following Identity and Access Management (IAM) roles:
- Create and use BigQuery datasets, connections, and models: BigQuery Admin (roles/bigquery.admin).
- Grant permissions to the connection's service account: Project IAM Admin (roles/resourcemanager.projectIamAdmin).
- Deploy and undeploy models in Vertex AI: Vertex AI Administrator (roles/aiplatform.admin).
These predefined roles contain the permissions required to perform the tasks in this document. The exact permissions that are required are listed in the following Required permissions section:
Required permissions
- Create a dataset: bigquery.datasets.create
- Create, delegate, and use a connection: bigquery.connections.*
- Set the default connection: bigquery.config.*
- Set service account permissions: resourcemanager.projects.getIamPolicy and resourcemanager.projects.setIamPolicy
- Deploy and undeploy a Vertex AI model:
  - aiplatform.endpoints.deploy
  - aiplatform.endpoints.undeploy
- Create a model and run inference:
  - bigquery.jobs.create
  - bigquery.models.create
  - bigquery.models.getData
  - bigquery.models.updateData
  - bigquery.models.updateMetadata
You might also be able to get these permissions with custom roles or other predefined roles.
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery ML: You incur costs for the data that you process in BigQuery.
- Vertex AI: You incur costs for calls to the Vertex AI model that's represented by the remote model.
To generate a cost estimate based on your projected usage, use the pricing calculator.
For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.
Open models that you deploy to Vertex AI are charged per machine-hour. Billing starts as soon as the endpoint is fully set up and continues until you undeploy the model. For more information about Vertex AI pricing, see the Vertex AI pricing page.
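The per-machine-hour billing described above can be sketched as simple arithmetic. The following Python example is only a rough illustration: the hourly rate and the machine_count parameter are hypothetical placeholders, and the sketch ignores any rounding or billing granularity that the actual Vertex AI pricing applies.

```python
from datetime import datetime, timedelta

# Hypothetical rate for illustration only; look up the real machine-hour
# price for your machine type on the Vertex AI pricing page.
ASSUMED_RATE_USD_PER_MACHINE_HOUR = 1.00


def estimate_endpoint_cost(deployed_at, undeployed_at, rate_per_hour, machine_count=1):
    """Estimate the charge for an endpoint that is billed per machine-hour."""
    if undeployed_at < deployed_at:
        raise ValueError("undeployed_at must not be earlier than deployed_at")
    # Dividing two timedeltas yields the elapsed time as a float number of hours.
    hours = (undeployed_at - deployed_at) / timedelta(hours=1)
    return hours * rate_per_hour * machine_count


# Example: an endpoint that stays deployed for eight hours.
deployed = datetime(2025, 1, 1, 9, 0)
undeployed = deployed + timedelta(hours=8)
cost = estimate_endpoint_cost(deployed, undeployed, ASSUMED_RATE_USD_PER_MACHINE_HOUR)
print(f"Estimated cost: ${cost:.2f}")
```

Because the charge accrues for as long as the model stays deployed, the time between your last query and the undeploy step counts toward the total, not just the time spent running queries.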
Before you begin
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
  Roles required to enable APIs
  To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, click your project name.
- Click View actions > Create dataset.
- On the Create dataset page, do the following:
  - For Dataset ID, enter bqml_tutorial.
  - For Location type, select Multi-region, and then select US (multiple regions in United States).
  - Leave the remaining default settings as they are, and click Create dataset.
bq
To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.
- Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

      bq --location=US mk -d \
        --description "BigQuery ML tutorial dataset." \
        bqml_tutorial

  Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.
- Confirm that the dataset was created:

      bq ls
API
Call the datasets.insert method with a defined dataset resource.

    {
      "datasetReference": {
        "datasetId": "bqml_tutorial"
      }
    }
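As a rough sketch of how that request could be assembled, the following Python example builds the URL and JSON body for a datasets.insert call without sending it. The project ID my-project and the helper name build_dataset_insert_request are hypothetical; the location and description fields are added here on the assumption that you want the same US location and description that the other tabs use.

```python
import json

# Hypothetical project ID for illustration; replace it with your own.
PROJECT_ID = "my-project"


def build_dataset_insert_request(project_id, dataset_id, location="US", description=None):
    """Return the URL and JSON body for a datasets.insert call (nothing is sent)."""
    url = f"https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/datasets"
    body = {
        "datasetReference": {"projectId": project_id, "datasetId": dataset_id},
        "location": location,
    }
    # Only include the optional description when one is provided.
    if description is not None:
        body["description"] = description
    return url, body


url, body = build_dataset_insert_request(
    PROJECT_ID, "bqml_tutorial", description="BigQuery ML tutorial dataset."
)
print("POST", url)
print(json.dumps(body, indent=2))
```

Sending the request also requires an OAuth 2.0 access token in the Authorization header, which this sketch leaves out.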
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.
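The sample for this tab isn't reproduced here, so the following is only a minimal sketch of one way to create the dataset from Python. It assumes the google-cloud-bigquery client library and its Client.create_dataset method; the helper name create_tutorial_dataset is hypothetical, and the sketch doesn't set a location explicitly.

```python
def create_tutorial_dataset(client, dataset_id="bqml_tutorial"):
    """Create the dataset if it doesn't already exist and return the result.

    `client` is expected to be a google.cloud.bigquery.Client. Any object with
    a compatible create_dataset method also works, which keeps this testable.
    """
    # exists_ok=True suppresses the "already exists" error on repeat runs.
    return client.create_dataset(dataset_id, exists_ok=True)


# Usage with the real client (requires google-cloud-bigquery and ADC):
#   import google.cloud.bigquery
#   create_tutorial_dataset(google.cloud.bigquery.Client())
```

Passing the client in as an argument, rather than constructing it inside the function, keeps the credential setup separate from the dataset logic.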
Create the remote model
Create a remote model that represents a hosted Vertex AI model:
- In the Google Cloud console, go to the BigQuery page.
- In the query editor, run the following statement:

      CREATE OR REPLACE MODEL `bqml_tutorial.qwen3_embedding_model`
        REMOTE WITH CONNECTION DEFAULT
        OPTIONS (HUGGING_FACE_MODEL_ID = 'Qwen/Qwen3-Embedding-0.6B');

  The query takes up to 20 minutes to complete, after which the qwen3_embedding_model model appears in the bqml_tutorial dataset in the Explorer pane. Because the query uses a CREATE MODEL statement to create a model, there are no query results.
Perform text embedding
Perform text embedding on IMDB movie reviews by using the remote model and the ML.GENERATE_EMBEDDING function:
- In the Google Cloud console, go to the BigQuery page.
- In the query editor, enter the following statement to perform text embedding on five movie reviews:

      SELECT *
      FROM ML.GENERATE_EMBEDDING(
        MODEL `bqml_tutorial.qwen3_embedding_model`,
        (
          SELECT review AS content, *
          FROM `bigquery-public-data.imdb.reviews`
          LIMIT 5
        )
      );

  The results include the following columns:
  - ml_generate_embedding_result: an array of doubles that represents the generated embedding.
  - ml_generate_embedding_status: the API response status for the corresponding row. If the operation was successful, this value is empty.
  - content: the input text from which to extract embeddings.
  - All of the columns from the bigquery-public-data.imdb.reviews table.
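Because an empty ml_generate_embedding_status value indicates success, a client that reads the results can separate usable rows from failed ones by checking that column. The following Python sketch assumes the rows have already been fetched and converted to dictionaries; the helper name split_by_status and the sample rows, including the error text, are made up for illustration.

```python
def split_by_status(rows):
    """Split result rows into (succeeded, failed) by embedding status."""
    succeeded, failed = [], []
    for row in rows:
        # An empty (or missing) status means the embedding was generated.
        if row.get("ml_generate_embedding_status"):
            failed.append(row)
        else:
            succeeded.append(row)
    return succeeded, failed


# Made-up sample rows; real rows also carry the columns from the reviews table.
sample_rows = [
    {
        "content": "Great film.",
        "ml_generate_embedding_result": [0.1, -0.2, 0.3],
        "ml_generate_embedding_status": "",
    },
    {
        "content": "Too long.",
        "ml_generate_embedding_result": [],
        "ml_generate_embedding_status": "hypothetical error message",
    },
]
ok, bad = split_by_status(sample_rows)
print(len(ok), "succeeded;", len(bad), "failed")
```

Rows in the failed list can be retried or logged along with their status text before the embeddings are used downstream.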
Undeploy model
If you choose not to delete your project as recommended, you must undeploy the Qwen3 embedding model in Vertex AI to avoid continued billing for it. BigQuery automatically undeploys the model after a specified period of idleness (6.5 hours by default). Alternatively, you can immediately undeploy the model by using the ALTER MODEL statement, as shown in the following example:

    ALTER MODEL `bqml_tutorial.qwen3_embedding_model`
      SET OPTIONS (deploy_model = false);

For more information, see Automatic or immediate open model undeployment.
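To get a feel for how long billing can continue if you rely on automatic undeployment, the following Python sketch adds the default 6.5-hour idle period to the time of the last request. It assumes that idleness is measured from the last request to the model, which this tutorial doesn't confirm, so treat the result as an approximation; the helper name expected_auto_undeploy_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Default idle period stated in this tutorial.
DEFAULT_IDLE_TIMEOUT = timedelta(hours=6.5)


def expected_auto_undeploy_time(last_used_at, idle_timeout=DEFAULT_IDLE_TIMEOUT):
    """Return the approximate time at which an idle model would be undeployed.

    Assumes the idle period starts at the last request to the model.
    """
    return last_used_at + idle_timeout


# Example: the last embedding query ran at 12:00 UTC.
last_query = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expected_auto_undeploy_time(last_query).isoformat())
```

If that window is longer than you want to pay for, running the ALTER MODEL statement right after your last query ends billing sooner.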
Clean up
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.

