The AI.CLASSIFY function

This document describes the AI.CLASSIFY function, which uses a Vertex AI Gemini model to classify inputs into categories that you provide. BigQuery automatically structures your input to improve the quality of the classification.

The following are common use cases:

Retail: Classify reviews by sentiment or classify products by categories.
Text analysis: Classify support tickets or emails by topic.
Image analysis: Classify an image by its style or contents.

Input

AI.CLASSIFY accepts the following types of input:

Text data from standard tables.
ObjectRefRuntime values that are generated by the OBJ.GET_ACCESS_URL function. You can use ObjectRef values from standard tables as input to the OBJ.GET_ACCESS_URL function.

This function passes your input to a Gemini model and incurs charges in Vertex AI each time it's called. For information about how to view these charges, see Track costs.

Syntax

  AI.CLASSIFY(
    [ input => ] 'INPUT',
    [ categories => ] 'CATEGORIES',
    connection_id => 'CONNECTION'
  )

Arguments

AI.CLASSIFY takes the following arguments:

INPUT: a STRING or STRUCT value that specifies the input to classify. The input must be the first argument that you specify. You can provide the input value in the following ways:

Specify a STRING value. For example, 'apple'.

Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:

Field type	Description	Examples
`STRING`	A string literal, or the name of a `STRING` column.	String literal: `'apple'` String column name: `my_string_column`
`ARRAY<STRING>`	You can only use string literals in the array.	Array of string literals: `['red ', 'apples']`
`ObjectRefRuntime` or `ARRAY<ObjectRefRuntime>`	An `ObjectRefRuntime` value returned by the `OBJ.GET_ACCESS_URL` function. The `OBJ.GET_ACCESS_URL` function takes an `ObjectRef` value as input, which you can provide by either specifying the name of a column that contains `ObjectRef` values, or by constructing an `ObjectRef` value. `ObjectRefRuntime` values must have the `access_url.read_url` and `details.gcs_metadata.content_type` elements of the JSON value populated.	Function call with `ObjectRef` column: `OBJ.GET_ACCESS_URL(my_objectref_column, 'r')` Function call with constructed `ObjectRef` value: `OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r')`

The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following table shows some examples of STRUCT prompt values and how they are interpreted:

Struct field types	Struct value	Semantic equivalent
`STRUCT<STRING, STRING>`	`('red', ' apples')`	'red apples'
`STRUCT<STRING, ARRAY<STRING>>`	`('crisp ', ['red', ' apples'])`	'crisp red apples'
`STRUCT<STRING, ObjectRefRuntime, ObjectRefRuntime>`	`('Classify city by size in country:', OBJ.GET_ACCESS_URL(city_image_objectref_column, 'r'), OBJ.GET_ACCESS_URL(country_image_objectref_column, 'r'))`	'Classify city by size in country:' `city_image` `country_image`
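
As a minimal sketch of the concatenation behavior described above, the following query passes a STRUCT that combines an instruction with a column value. The table `mydataset.fruit_notes`, its `note` column, the category names, and the connection `us.example_connection` are hypothetical placeholders, not objects that this document defines.

```sql
-- Hypothetical table, column, and connection names; replace them with your own.
SELECT
  note,
  AI.CLASSIFY(
    -- The two STRUCT fields are combined in order, similarly to CONCAT,
    -- so the model receives the instruction followed by the note text.
    ('Classify the fruit described in this note: ', note),
    categories => ['apple', 'pear', 'other'],
    connection_id => 'us.example_connection') AS fruit
FROM `mydataset.fruit_notes`;
```

Because the fields are joined without a separator, the instruction string ends with a space.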

CATEGORIES: the categories by which to classify the input. You can specify categories with or without descriptions:
- With descriptions: Use an ARRAY<STRUCT<STRING, STRING>> value where each struct contains the category name, followed by a description of the category. The array can only contain string literals. For example, you could use colors to classify sentiment:
  
  [('green', 'positive'), ('yellow', 'neutral'), ('red', 'negative')]
  
  You can optionally name the fields of the struct for your own readability, but the field names aren't used by the function:
```
[STRUCT('green' AS label, 'positive' AS description),
 STRUCT('yellow' AS label, 'neutral' AS description),
 STRUCT('red' AS label, 'negative' AS description)]
```
- Without descriptions: Use an ARRAY<STRING> value. The array can only contain string literals. This works well when your categories are self-explanatory. For example, you could use the following categories to classify sentiment:
  
  ['positive', 'neutral', 'negative']
To handle input that doesn't closely match a category, consider including an 'Other' category.
CONNECTION: a STRING value specifying the Cloud resource connection to use. The following forms are accepted:
- Connection name: [PROJECT_ID].LOCATION.CONNECTION_ID
  
  For example, myproject.us.myconnection.
- Fully qualified connection ID: projects/PROJECT_ID/locations/LOCATION/connections/CONNECTION_ID
  
  For example, projects/myproject/locations/us/connections/myconnection.
Replace the following:
- PROJECT_ID: the project ID of the project that contains the connection.
- LOCATION: the location used by the connection.
- CONNECTION_ID: the connection ID, for example, myconnection.
  You can get this value by viewing the connection details in the Google Cloud console and copying the value in the last section of the fully qualified connection ID that is shown in Connection ID. For example, projects/myproject/locations/connection_location/connections/myconnection.
Important

You need to grant the Vertex AI User role to the connection's service account. For more information, see Grant or revoke a single IAM role.
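
The following sketch puts the arguments together. It passes a string literal as input, categories with descriptions, and a connection in the fully qualified form. The input text is made up for illustration, and the project and connection names are the placeholder values used in the argument descriptions above.

```sql
-- Placeholder project and connection names; replace them with your own.
SELECT
  AI.CLASSIFY(
    -- The input is the first argument.
    'The delivery arrived two weeks late.',
    -- Each struct holds a category name followed by its description.
    categories => [
      STRUCT('green' AS label, 'positive' AS description),
      STRUCT('yellow' AS label, 'neutral' AS description),
      STRUCT('red' AS label, 'negative' AS description)],
    connection_id => 'projects/myproject/locations/us/connections/myconnection'
  ) AS sentiment_color;
```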

Output

AI.CLASSIFY returns a STRING value containing the provided category that best fits the input.

If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function returns NULL.
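
Because a failed call returns NULL instead of raising an error, you might want to make unclassified rows explicit. The following sketch shows one possible approach that wraps the call in IFNULL. The table `mydataset.reviews`, its `review` column, the connection name, and the 'unclassified' label are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table, column, and connection names; replace them with your own.
SELECT
  review,
  IFNULL(
    AI.CLASSIFY(
      review,
      categories => ['positive', 'neutral', 'negative'],
      connection_id => 'us.example_connection'),
    -- Substituted when the call to Vertex AI fails and the function returns NULL.
    'unclassified') AS sentiment
FROM `mydataset.reviews`;
```

Rows labeled 'unclassified' can then be filtered and retried separately. Keep in mind that each retry calls the model again and incurs charges.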

Examples

The following examples show how to use the AI.CLASSIFY function to classify text and images into predefined categories.

Classify text by topic

The following query categorizes BBC news articles into high-level categories:

  SELECT
    title,
    body,
    AI.CLASSIFY(
      body,
      categories => ['tech', 'sport', 'business', 'politics', 'entertainment', 'other'],
      connection_id => 'us.example_connection') AS category
  FROM `bigquery-public-data.bbc_news.fulltext`
  LIMIT 100;

Classify reviews by sentiment

The following query classifies movie reviews of The English Patient by sentiment according to a custom color scheme. For example, a review that is very positive is classified as 'green'.

  SELECT
    AI.CLASSIFY(
      ('Classify the review by sentiment: ', review),
      categories => [('green', 'The review is positive.'),
                     ('yellow', 'The review is neutral.'),
                     ('red', 'The review is negative.')],
      connection_id => 'us.example_connection') AS ai_review_rating,
    reviewer_rating AS human_provided_rating,
    review,
  FROM `bigquery-public-data.imdb.reviews`
  WHERE title = 'The English Patient'

Classify images by type

The following query creates a dataset and an object table from images of pet products stored in a publicly available Cloud Storage bucket. Then, it classifies each image as a box, ball, bottle, stand, or other type of item.

  -- Create a dataset
  CREATE SCHEMA IF NOT EXISTS cymbal_pets;

  -- Create an object table
  CREATE OR REPLACE EXTERNAL TABLE cymbal_pets.product_images
  WITH CONNECTION us.example_connection
  OPTIONS (
    object_metadata = 'SIMPLE',
    uris = ['gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png']);

  -- Classify images in the object table
  SELECT
    signed_url,
    AI.CLASSIFY(
      images.ref,
      ['box', 'ball', 'bottle', 'stand', 'other'],
      connection_id => 'us.example_connection') AS category
  FROM EXTERNAL_OBJECT_TRANSFORM(TABLE `cymbal_pets.product_images`, ['SIGNED_URL']) AS images
  LIMIT 10;

Locations

You can run AI.CLASSIFY in all of the regions that support Gemini models, and also in the US and EU multi-regions.

Quotas

See Generative AI functions quotas and limits.

What's next

For more information about using Vertex AI models to generate text and embeddings, see Generative AI overview.
For more information about using Cloud AI APIs to perform AI tasks, see AI application overview.
For more information about supported SQL statements and functions for generative AI models, see End-to-end user journeys for generative AI models.
To use this function in a tutorial, see Perform semantic analysis with managed AI functions.