Generate text by using a Gemini model and the ML.GENERATE_TEXT function
This tutorial shows you how to create a remote model
that's based on the gemini-2.0-flash
model
,
and then how to use that model with the ML.GENERATE_TEXT
function
to extract keywords from and perform sentiment analysis on movie reviews from
the bigquery-public-data.imdb.reviews
public table.
Required roles
To run this tutorial, you need the following Identity and Access Management (IAM) roles:
- Create and use BigQuery datasets, connections, and models:
BigQuery Admin (
roles/bigquery.admin
). - Grant permissions to the connection's service account: Project IAM Admin
(
roles/resourcemanager.projectIamAdmin
).
These predefined roles contain the permissions required to perform the tasks in this document. To see the exact permissions that are required, expand the Required permissionssection:
Required permissions
- Create a dataset:
bigquery.datasets.create
- Create, delegate, and use a connection:
bigquery.connections.*
- Set the default connection:
bigquery.config.*
- Set service account permissions:
resourcemanager.projects.getIamPolicy
andresourcemanager.projects.setIamPolicy
- Create a model and run inference:
-
bigquery.jobs.create
-
bigquery.models.create
-
bigquery.models.getData
-
bigquery.models.updateData
-
bigquery.models.updateMetadata
-
You might also be able to get these permissions with custom roles or other predefined roles .
Costs
In this document, you use the following billable components of Google Cloud:
- BigQuery ML : You incur costs for the data that you process in BigQuery.
- Vertex AI : You incur costs for calls to the Vertex AI service that's represented by the remote model.
To generate a cost estimate based on your projected usage, use the pricing calculator .
For more information about BigQuery pricing, see BigQuery pricing in the BigQuery documentation.
For more information about Vertex AI pricing, see the Vertex AI pricing page.
Before you begin
-
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
-
Verify that billing is enabled for your Google Cloud project .
-
Enable the BigQuery, BigQuery Connection, and Vertex AI APIs.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
-
In the Google Cloud console, go to the BigQuerypage.
-
In the Explorerpane, click your project name.
-
Click View actions > Create dataset
-
On the Create datasetpage, do the following:
-
For Dataset ID, enter
bqml_tutorial
. -
For Location type, select Multi-region, and then select US (multiple regions in United States).
-
Leave the remaining default settings as they are, and click Create dataset.
-
bq
To create a new dataset, use the bq mk
command
with the --location
flag. For a full list of possible parameters, see the bq mk --dataset
command
reference.
-
Create a dataset named
bqml_tutorial
with the data location set toUS
and a description ofBigQuery ML tutorial dataset
:bq --location=US mk -d \ --description "BigQuery ML tutorial dataset." \ bqml_tutorial
Instead of using the
--dataset
flag, the command uses the-d
shortcut. If you omit-d
and--dataset
, the command defaults to creating a dataset. -
Confirm that the dataset was created:
bq ls
API
Call the datasets.insert
method with a defined dataset resource
.
{ "datasetReference" : { "datasetId" : "bqml_tutorial" } }
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames . For more information, see the BigQuery DataFrames reference documentation .
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment .
Create the remote model
Create a remote model that represents a hosted Vertex AI model:
-
In the Google Cloud console, go to the BigQuerypage.
-
In the query editor, run the following statement:
CREATE OR REPLACE MODEL `bqml_tutorial.gemini_model` REMOTE WITH CONNECTION DEFAULT OPTIONS ( ENDPOINT = 'gemini-2.0-flash' );
The query takes several seconds to complete, after which the model gemini_model
appears in the bqml_tutorial
dataset in the Explorerpane.
Because the query uses a CREATE MODEL
statement to create a model, there
are no query results.
Perform keyword extraction
Perform keyword extraction on IMDB
movie reviews by
using the remote model and the ML.GENERATE_TEXT
function:
-
In the Google Cloud console, go to the BigQuerypage.
-
In the query editor, enter the following statement to perform keyword extraction on five movie reviews:
SELECT ml_generate_text_result [ 'candidates' ][ 0 ][ 'content' ] AS generated_text , * EXCEPT ( ml_generate_text_result ) FROM ML . GENERATE_TEXT ( MODEL `bqml_tutorial.gemini_model` , ( SELECT CONCAT ( 'Extract the key words from the text below: ' , review ) AS prompt , * FROM `bigquery-public-data.imdb.reviews` LIMIT 5 ), STRUCT ( 0.2 AS temperature , 100 AS max_output_tokens ));
The output is similar to the following, with non-generated columns omitted for clarity:
+----------------------------------------+-------------------------+----------------------------+-----+ | generated_text | ml_generate_text_status | prompt | ... | +----------------------------------------+-------------------------+----------------------------+-----+ | {"parts":[{"text":"## Key words:\n\n* | | Extract the key words from | | | **Negative sentiment:** \"terribly | | the text below: I had to | | | bad acting\", \"dumb story\", \"not | | see this on the British | | | even a kid would enjoy this\", | | Airways plane. It was | | | \"something to switch off\"\n* | | terribly bad acting and | | | **Context:** \"British Airways plane\" | | a dumb story. Not even | | | \n* **Genre:** \"movie\" (implied)... | | a kid would enjoy this... | | +----------------------------------------+-------------------------+----------------------------+-----+ | {"parts":[{"text":"## Key words:\n\n* | | Extract the key words from | | | **Movie:** The Real Howard Spitz\n* | | the text below: This is | | | **Genre:** Family movie\n* | | a family movie that was | | | **Broadcast:** ITV station, 1.00 am\n* | | broadcast on my local | | | **Director:** Vadim Jean\n* | | ITV station at 1.00 am a | | | **Main character:** Howard Spitz, | | couple of nights ago. | | | a children's author who hates... | | This might be a strange... | | +----------------------------------------+-------------------------+----------------------------+-----+
The results include the following columns:
-
generated_text
: the generated text. -
ml_generate_text_status
: the API response status for the corresponding row. If the operation was successful, this value is empty. -
prompt
: the prompt that is used for the sentiment analysis. - All of the columns from the
bigquery-public-data.imdb.reviews
table.
-
-
Optional: Instead of manually parsing the JSON returned by the function, as you did in the previous step, use the
flatten_json_output
argument to return the generated text and the safety attributes in separate columns.In the query editor, run the following statement:
SELECT * FROM ML . GENERATE_TEXT ( MODEL `bqml_tutorial.gemini_model` , ( SELECT CONCAT ( 'Extract the key words from the text below: ' , review ) AS prompt , * FROM `bigquery-public-data.imdb.reviews` LIMIT 5 ), STRUCT ( 0.2 AS temperature , 100 AS max_output_tokens , TRUE AS flatten_json_output ));
The output is similar to the following, with non-generated columns omitted for clarity:
+----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+ | ml_generate_text_llm_result | ml_generate_text_rai_result | ml_generate_text_status | prompt | ... | +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+ | ## Keywords: | | | Extract the key words from | | | | | | the text below: I had to | | | * **Negative sentiment:** | | | see this on the British | | | "terribly bad acting", "dumb | | | Airways plane. It was | | | story", "not even a kid would | | | terribly bad acting and | | | enjoy this", "switch off" | | | a dumb story. Not even | | | * **Context:** "British | | | a kid would enjoy this... | | +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+ | ## Key words: | | | Extract the key words from | | | | | | the text below: This is | | | * **Movie:** The Real Howard Spitz | | | a family movie that was | | | * **Genre:** Family movie | | | broadcast on my local | | | * **Broadcast:** ITV, 1.00 | | | ITV station at 1.00 am a | | | am | | | couple of nights ago. | | | - ... | | | This might be a strange... | | +----------------------------------------+----------------------------------------------+-------------------------+----------------------------+-----+
The results include the following columns:
-
ml_generate_text_llm_result
: the generated text. -
ml_generate_text_rai_result
: the safety attributes, along with information about whether the content is blocked due to one of the blocking categories. For more information about the safety attributes, see Configure safety filters . -
ml_generate_text_status
: the API response status for the corresponding row. If the operation was successful, this value is empty. -
prompt
: the prompt that is used for the keyword extraction. - All of the columns from the
bigquery-public-data.imdb.reviews
table.
-
Perform sentiment analysis
Perform sentiment analysis on IMDB
movie reviews by using the remote model and the ML.GENERATE_TEXT
function:
-
In the Google Cloud console, go to the BigQuerypage.
-
In the query editor, run the following statement to perform sentiment analysis on five movie reviews:
SELECT ml_generate_text_result [ 'candidates' ][ 0 ][ 'content' ] AS generated_text , * EXCEPT ( ml_generate_text_result ) FROM ML . GENERATE_TEXT ( MODEL `bqml_tutorial.gemini_model` , ( SELECT CONCAT ( 'perform sentiment analysis on the following text, return one the following categories: positive, negative: ' , review ) AS prompt , * FROM `bigquery-public-data.imdb.reviews` LIMIT 5 ), STRUCT ( 0.2 AS temperature , 100 AS max_output_tokens ));
The output is similar to the following, with non-generated columns omitted for clarity:
+--------------------------------------------+-------------------------+----------------------------+-----+ | generated_text | ml_generate_text_status | prompt | ... | +--------------------------------------------+-------------------------+----------------------------+-----+ | {"parts":[{"text":"## Sentiment Analysis: | | perform sentiment analysis | | | Negative \n\nThis text expresses a | | on the following text, | | | strongly negative sentiment towards the | | return one the following | | | movie. Here's why:\n\n* **Negative | | negative: I had to see | | | like \"terribly,\" \"dumb,\" and | | this on the British | | | \"not even\" to describe the acting... | | Airways plane. It was... | | +--------------------------------------------+-------------------------+----------------------------+-----+ | {"parts":[{"text":"## Sentiment Analysis: | | perform sentiment analysis | | | Negative \n\nThis review expresses a | | on the following text, | | | predominantly negative sentiment towards | | return one the following | | | the movie \"The Real Howard Spitz.\" | | categories: positive, | | | Here's why:\n\n* **Criticism of the film's | | negative: This is a family | | | premise:** The reviewer finds it strange | | movie that was broadcast | | | that a film about a children's author... | | on my local ITV station... | | +--------------------------------------------+-------------------------+----------------------------+-----+
The results include the same columns documented for Perform keyword extraction .
Clean up
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete .
- In the dialog, type the project ID, and then click Shut down to delete the project.