Batch prediction for BigQuery

This page describes how to get batch predictions using BigQuery.

1. Prepare your inputs

BigQuery storage input

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:SERVICE_ACCOUNT_ID@PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/bigquery.user"

Replace the following values:

  • PROJECT_ID: The project that your service account was created in.
  • SERVICE_ACCOUNT_ID: The ID for the service account.
 
  • A request column is required and must contain valid JSON. This JSON data represents your input for the model.
  • The content in the request column must match the structure of a GenerateContentRequest.
  • Your input table can have columns other than request. These columns can use any BigQuery data type except the following: array, struct, range, datetime, and geography. These columns are ignored for content generation but are included in the output table. A sketch that stages such a table programmatically follows the example below.
Example input (JSON)
 {
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Give me a recipe for banana bread."
        }
      ]
    }
  ],
  "system_instruction": {
    "parts": [
      {
        "text": "You are a chef."
      }
    ]
  }
} 
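If you want to stage an input table like this programmatically, the following is a minimal sketch using the google-cloud-bigquery client. It is not part of the official steps, and the project, dataset, and table names are placeholder assumptions; adjust them for your environment.

 # Minimal sketch: create an input table with a JSON `request` column and
 # insert one row. Project, dataset, and table names are placeholders.
 from google.cloud import bigquery

 client = bigquery.Client(project="my-project")  # hypothetical project ID

 # BigQuery scripting runs both statements in a single query job.
 sql = """
 CREATE TABLE IF NOT EXISTS `my-project.my_dataset.input_table` (
   request JSON
 );

 INSERT INTO `my-project.my_dataset.input_table` (request)
 VALUES (PARSE_JSON('''
 {
   "contents": [
     {"role": "user", "parts": [{"text": "Give me a recipe for banana bread."}]}
   ],
   "system_instruction": {"parts": [{"text": "You are a chef."}]}
 }
 '''));
 """

 client.query(sql).result()  # waits for the script to finish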

2. Submit a batch job

You can create a batch job through the Google Cloud console, the Google Gen AI SDK, or the REST API.

The job and your table must be in the same region.

Console

  1. In the Vertex AI section of the Google Cloud console, go to the Batch Inference page.

    Go to Batch Inference

  2. Click Create.

REST

To create a batch prediction job, use the projects.locations.batchPredictionJobs.create method.

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports Gemini models.
  • PROJECT_ID: Your project ID.
  • MODEL_PATH: The publisher model name, for example, publishers/google/models/gemini-2.0-flash-001; or the tuned endpoint name, for example, projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID, where MODEL_ID is the model ID of the tuned model.
  • INPUT_URI: The BigQuery table where your batch prediction input is located, such as bq://myproject.mydataset.input_table. The dataset must be located in the same region as the batch prediction job. Multi-region datasets are not supported.
  • OUTPUT_FORMAT: To output to a BigQuery table, specify bigquery. To output to a Cloud Storage bucket, specify jsonl.
  • DESTINATION: For BigQuery, specify bigqueryDestination. For Cloud Storage, specify gcsDestination.
  • OUTPUT_URI_FIELD_NAME: For BigQuery, specify outputUri. For Cloud Storage, specify outputUriPrefix.
  • OUTPUT_URI: For BigQuery, specify the table location, such as bq://myproject.mydataset.output_result. The region of the output BigQuery dataset must be the same as the Vertex AI batch prediction job. For Cloud Storage, specify the bucket and directory location, such as gs://mybucket/path/to/output.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs

Request JSON body:

{
  "displayName": "my-bigquery-batch-prediction-job",
  "model": "MODEL_PATH",
  "inputConfig": {
    "instancesFormat": "bigquery",
    "bigquerySource": {
      "inputUri": "INPUT_URI"
    }
  },
  "outputConfig": {
    "predictionsFormat": "OUTPUT_FORMAT",
    "DESTINATION": {
      "OUTPUT_URI_FIELD_NAME": "OUTPUT_URI"
    }
  }
}

To send your request, choose one of these options:

curl

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https:// LOCATION -aiplatform.googleapis.com/v1/projects/ PROJECT_ID /locations/ LOCATION /batchPredictionJobs"

PowerShell

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https:// LOCATION -aiplatform.googleapis.com/v1/projects/ PROJECT_ID /locations/ LOCATION /batchPredictionJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following.
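An abbreviated, illustrative sketch of the response (the original sample was not preserved here; the actual response contains additional fields, and the values shown are placeholders):

 {
   "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
   "displayName": "my-bigquery-batch-prediction-job",
   "model": "MODEL_PATH",
   "state": "JOB_STATE_PENDING"
 }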

The response includes a unique identifier for the batch job. You can poll for the status of the batch job using the BATCH_JOB_ID. For more information, see Monitor the job status. Note: Custom service accounts, live progress, CMEK, and VPC-SC reports are not supported.

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
import time

from google import genai
from google.genai.types import CreateBatchJobConfig, JobState, HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# TODO(developer): Update and un-comment below line
# output_uri = f"bq://your-project.your_dataset.your_table"

job = client.batches.create(
    # To use a tuned model, set the model param to your tuned model using the following format:
    # model="projects/{PROJECT_ID}/locations/{LOCATION}/models/{MODEL_ID}"
    model="gemini-2.5-flash",
    src="bq://storage-samples.generative_ai.batch_requests_for_multimodal_input",
    config=CreateBatchJobConfig(dest=output_uri),
)
print(f"Job name: {job.name}")
print(f"Job state: {job.state}")
# Example response:
# Job name: projects/%PROJECT_ID%/locations/us-central1/batchPredictionJobs/9876453210000000000
# Job state: JOB_STATE_PENDING

# See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.types.BatchJob
completed_states = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
    JobState.JOB_STATE_PAUSED,
}

while job.state not in completed_states:
    time.sleep(30)
    job = client.batches.get(name=job.name)
    print(f"Job state: {job.state}")
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_RUNNING
# ...
# Job state: JOB_STATE_SUCCEEDED
 

3. Monitor the job status and progress

After the job is submitted, you can check the status of your batch job by using the API, the SDK, or the Google Cloud console.

Console

  1. Go to the Batch Inference page.

    Go to Batch Inference

  2. Select your batch job to monitor its progress.

REST

To monitor a batch prediction job, use the projects.locations.batchPredictionJobs.get method and view the CompletionStats field in the response.

Before using any of the request data, make the following replacements:

  • LOCATION: A region that supports Gemini models.
  • PROJECT_ID: Your project ID.
  • BATCH_JOB_ID: Your batch job ID.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID

To send your request, choose one of these options:

curl

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https:// LOCATION -aiplatform.googleapis.com/v1/projects/ PROJECT_ID /locations/ LOCATION /batchPredictionJobs/ BATCH_JOB_ID "

PowerShell

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https:// LOCATION -aiplatform.googleapis.com/v1/projects/ PROJECT_ID /locations/ LOCATION /batchPredictionJobs/ BATCH_JOB_ID " | Select-Object -Expand Content

You should receive a JSON response similar to the following.
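An abbreviated, illustrative sketch of the response (the original sample was not preserved here; the field names follow the BatchPredictionJob resource, and the counts are placeholder values):

 {
   "name": "projects/PROJECT_ID/locations/LOCATION/batchPredictionJobs/BATCH_JOB_ID",
   "state": "JOB_STATE_RUNNING",
   "completionStats": {
     "successfulCount": "1234",
     "failedCount": "5",
     "incompleteCount": "100"
   }
 }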

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the batch job
# Eg. batch_job_name = "projects/123456789012/locations/us-central1/batchPredictionJobs/1234567890123456789"
batch_job = client.batches.get(name=batch_job_name)

print(f"Job state: {batch_job.state}")
# Example response:
# Job state: JOB_STATE_PENDING
# Job state: JOB_STATE_RUNNING
# Job state: JOB_STATE_SUCCEEDED
 

The status of a given batch job can be any of the following:

  • JOB_STATE_PENDING: The job is queued for capacity. It can remain in the queue for up to 72 hours before entering the running state.
  • JOB_STATE_RUNNING: The input file was successfully validated and the batch is currently running.
  • JOB_STATE_SUCCEEDED: The batch has completed and the results are ready.
  • JOB_STATE_FAILED: The input file failed validation, or the job could not be completed within 24 hours of entering the running state.
  • JOB_STATE_CANCELLING: The batch is being cancelled.
  • JOB_STATE_CANCELLED: The batch was cancelled.

4. Retrieve batch output

When a batch prediction task completes, the output is stored in the BigQuery table that you specified in your request.

For successful rows, model responses are stored in the response column. Otherwise, error details are stored in the status column for further inspection.
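To read the results programmatically, you can query the output table directly. The following is a minimal sketch using the google-cloud-bigquery client; the project, dataset, and table names are placeholders, and the assumption that failed rows carry a non-empty status value is an illustration, not a documented contract.

 # Minimal sketch: read batch output rows (table name is a placeholder).
 from google.cloud import bigquery

 client = bigquery.Client(project="my-project")  # hypothetical project ID

 # Successful rows carry the model output in `response`; failed rows carry
 # error details in `status`.
 sql = """
 SELECT request, response, status
 FROM `my-project.my_dataset.output_result`
 """

 for row in client.query(sql).result():
     if row.status:  # non-empty status taken to indicate a failed row (assumption)
         print("Error:", row.status)
     else:
         print("Response:", row.response)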

Output example

Successful example

 {
   "candidates": [
     {
       "content": {
         "role": "model",
         "parts": [
           {
             "text": "In a medium bowl, whisk together the flour, baking soda, baking powder."
           }
         ]
       },
       "finishReason": "STOP",
       "safetyRatings": [
         {
           "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
           "probability": "NEGLIGIBLE",
           "probabilityScore": 0.14057204,
           "severity": "HARM_SEVERITY_NEGLIGIBLE",
           "severityScore": 0.14270912
         }
       ]
     }
   ],
   "usageMetadata": {
     "promptTokenCount": 8,
     "candidatesTokenCount": 396,
     "totalTokenCount": 404
   }
 }
 

Failed example

  • Request

    {
      "contents": [
        {
          "parts": { "text": "Explain how AI works in a few words." },
          "role": "tester"
        }
      ]
    }
     
    
  • Response

    Bad Request: {"error": {"code": 400, "message": "Please use a valid role: user, model.", "status": "INVALID_ARGUMENT"}}
     
    