The AI.IF function

This document describes the AI.IF function, which uses a Vertex AI Gemini model to evaluate a condition described in natural language and returns a BOOL .

You can use the AI.IF function to filter and join data based on conditions described in natural language or multimodal input. The following are common use cases:

  • Sentiment analysis: Find customer reviews with negative sentiment.
  • Topic analysis: Identify news articles related to a specific subject.
  • Image analysis: Select images that contain a specific item.
  • Security: Identify suspicious emails.

Input

AI.IF accepts the following types of input:

When you analyze unstructured data, that data must meet the following requirements:

This function passes your input to a Gemini model and incurs charges in Vertex AI each time it's called. For information about how to view these charges, see Track costs .

Syntax

 AI 
 . 
 IF 
 ( 
  
 [ 
  
 prompt 
  
 => 
  
 ] 
  
  PROMPT 
 
  
 [ 
 , 
  
 examples 
  
 => 
  
  EXAMPLES 
 
  
 ] 
  
 [ 
 , 
  
 connection_id 
  
 => 
  
 ' CONNECTION 
' 
  
 ] 
  
 [ 
 , 
  
 endpoint 
  
 => 
  
 ' ENDPOINT 
' 
  
 ] 
  
 [ 
 , 
  
 embeddings 
  
 => 
  
  EMBEDDINGS 
 
  
 ] 
  
 [ 
 , 
  
 optimization_mode 
  
 => 
  
 ' OPTIMIZATION_MODE 
' 
  
 ] 
  
 [ 
 , 
  
 max_error_ratio 
  
 => 
  
  MAX_ERROR_RATIO 
 
  
 ] 
 ) 

Arguments

AI.IF takes the following arguments.

  • PROMPT : a STRING or STRUCT value that specifies the PROMPT value to send to the model. The prompt must be the first argument that you specify. You can provide the value in the following ways:
    • Specify a STRING value. For example, 'This is a prompt.'
    • Specify a STRUCT value that contains one or more fields. You can use the following types of fields within the STRUCT value:
      Field type Description Examples
      STRING
      or
      ARRAY<STRING>
      A string literal, array of string literals, or the name of a STRING column. String literal:
      'This is a prompt.'

      String column name:
      my_string_column
      ObjectRefRuntime
      or
      ARRAY<ObjectRefRuntime>

      An ObjectRefRuntime value returned by the OBJ.GET_ACCESS_URL function . The OBJ.GET_ACCESS_URL function takes an ObjectRef value as input, which you can provide by either specifying the name of a column that contains ObjectRef values, or by constructing an ObjectRef value.

      ObjectRefRuntime values must have the access_url.read_url and details.gcs_metadata.content_type elements of the JSON value populated.

      Your input can contain at most one video object.

      Function call with ObjectRef column:
      OBJ.GET_ACCESS_URL(my_objectref_column, 'r')

      Function call with constructed ObjectRef value:
      OBJ.GET_ACCESS_URL(OBJ.MAKE_REF('gs://image.jpg', 'myconnection'), 'r')
      The function combines STRUCT fields similarly to a CONCAT operation and concatenates the fields in their specified order. The same is true for the elements of any arrays used within the struct. The following table shows some examples of STRUCT prompt values and how they are interpreted:
      Struct field types Struct value Semantic equivalent
      STRUCT<STRING, STRING, STRING>
      ('Describe the city of ', my_city_column, ' in 15 words') 'Describe the city of my_city_column_value in 15 words'
      STRUCT<STRING, ObjectRefRuntime>
      ('Describe the following city', OBJ.GET_ACCESS_URL(image_objectref_column, 'r')) 'Describe the following city image '
  • EXAMPLES : an ARRAY<STRUCT<STRING, BOOL>> value that contains representative examples of input strings and the output truth value that you expect. You can provide examples to help the model understand your intended threshold for a condition with nuanced or subjective logic. We recommend that you provide at most 5 examples.

  • CONNECTION : a STRING value specifying the connection to use to communicate with the model, in the format [ PROJECT_ID ]. LOCATION . CONNECTION_ID . For example, myproject.us.myconnection .

    If you don't specify a connection, then the query uses your end-user credentials .

    For information about configuring permissions, see Set permissions for BigQuery ML generative AI functions that call Vertex AI models .

  • ENDPOINT : a STRING value that specifies the Vertex AI endpoint to use for the model. You can specify any generally available or preview Gemini model. If you specify the model name, BigQuery ML automatically identifies and uses the full endpoint of the model. If you don't specify an ENDPOINT value, BigQuery ML dynamically chooses a model based on your query to have the best cost to quality tradeoff for the task. You can also specify the global endpoint :

     https:// aiplatform.googleapis.com/v1/projects/ PROJECT_ID 
    /locations/ global/publishers/google/models/ GEMINI_ENDPOINT 
     
    
  • EMBEDDINGS : the embeddings to use for optimized mode (Preview). This argument is optional. If you don't specify this argument, then the query uses standard LLM inference for all rows unless the table has autonomous embedding generation enabled.

    This argument accepts the following data types:

    • ARRAY<FLOAT64> : use this for a single column reference.
    • ARRAY<STRUCT<STRING, ARRAY<FLOAT64>>> : use this to map multiple columns to their corresponding embeddings. For example: [STRUCT('title', title_embedding), STRUCT('body', body_embedding)] .
    • ARRAY<STRUCT<ARRAY<STRING>, ARRAY<FLOAT64>>> : use this for advanced mapping scenarios.
  • OPTIMIZATION_MODE : a STRING value that specifies the optimization strategy to use. Supported values are as follows:

    • MINIMIZE_COST (default): uses a local, distilled model to process the majority of rows, reducing latency and cost. This mode requires input embeddings and that the input to the AI function contain approximately 3,000 rows to ensure enough data for model training.
    • MAXIMIZE_QUALITY : always uses the remote LLM for inference.
  • MAX_ERROR_RATIO : a FLOAT64 value between 0.0 and 1.0 that contains the maximum acceptable ratio of row-level inference failures to rows processed on this function. If this value is exceeded, then the query fails and BigQuery returns an error message that describes the most frequent types of errors. For example, if the value is 0.3 then the query fails if more than 30% of rows processed have failed to return results. If max_error_ratio is set for multiple functions, the query fails if the ratio is exceeded on any function. The default value is 1.0 . However, the query still fails if inference fails for every row. This argument isn't supported when optimization_mode is set to MINIMIZE_COST .

Output

AI.IF returns a BOOL based on evaluation of the condition in the input prompt.

If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function returns NULL .

Examples

The following examples show how to use the AI.IF function to filter text and join multimodal data.

Filter text by topic

The following query uses the AI.IF function to filter news stories to those that cover a natural disaster:

  SELECT 
  
 title 
 , 
  
 body 
 FROM 
  
 `bigquery-public-data.bbc_news.fulltext` 
 WHERE 
  
 AI 
 . 
 IF 
 (( 
 'The following news story is about a natural disaster: ' 
 , 
  
 body 
 )); 
 

The result is similar to the following:

 +----------------------------------+---------------------------------------------+
| title                            | body                                        |
+----------------------------------+---------------------------------------------+
| Tsunami 'to hit Sri Lanka banks' | Sri Lanka's banks face hard times following |
|                                  | December's tsunami disaster...              |
| ...                              | ...                                         |
+----------------------------------+---------------------------------------------+ 

Filter text by topic with optimized mode

The following query uses AI.IF to find news that covers a natural disaster using optimized mode (Preview):

  SELECT 
  
 title 
 , 
  
 body 
 FROM 
  
 `bigquery-public-data.bbc_news.fulltext` 
 WHERE 
  
 AI 
 . 
 IF 
 ( 
  
 ( 
 'The following news story is about a natural disaster: ' 
 , 
  
 body 
 ), 
  
 embeddings 
  
 = 
>  
 AI 
 . 
 EMBED 
 ( 
 body 
 , 
  
 endpoint 
  
 = 
>  
 'text-embedding-005' 
 , 
  
 task_type 
  
 = 
>  
 'CLASSIFICATION' 
 ). 
 result 
 , 
  
 optimization_mode 
  
 = 
>  
 'MINIMIZE_COST' 
  
 ); 
 

For this example, embeddings are generated on-the-fly. In practice, we recommend that you materialize embeddings so that they can be reused. For more information, see Optimize AI function costs .

Use a subjective condition

The following query provides examples to the AI.IF function to show what counts as an emotional review. Because "emotion" is a subjective quality, you might provide different expected results in different contexts.

  SELECT 
  
 review 
 , 
  
 AI 
 . 
 IF 
 ( 
  
 ( 
 "The review is emotional:" 
 , 
  
 review 
 ), 
  
 examples 
  
 = 
>  
 [ 
  
 ( 
 "I really love this product" 
 , 
  
 TRUE 
 ), 
  
 ( 
 "The product performed extremely well" 
 , 
  
 FALSE 
 ) 
 ] 
 ) 
  
 AS 
  
 is_emotional 
 FROM 
  
 ( 
  
 SELECT 
  
 "This product did everything it was supposed to" 
  
 AS 
  
 review 
  
 UNION 
  
 ALL 
  
 SELECT 
  
 "This product was absolutely incredible!!" 
  
 AS 
  
 review 
 ); 
 

The result is similar to the following:

 +------------------------------------------------+--------------+
| review                                         | is_emotional |
+------------------------------------------------+--------------+
| This product was absolutely incredible!!       | true         |
| This product did everything it was supposed to | false        |
+------------------------------------------------+--------------+ 

Filter images

The following query creates an external table from images of pet products stored in a publicly available Cloud Storage bucket. Then, it filters the results to images that contain a ball.

  -- Create a dataset 
 CREATE 
  
 SCHEMA 
  
 IF 
  
 NOT 
  
 EXISTS 
  
 cymbal_pets 
 ; 
 -- Create an object table 
 CREATE 
  
 OR 
  
 REPLACE 
  
 EXTERNAL 
  
 TABLE 
  
 cymbal_pets 
 . 
 product_images 
 WITH 
  
 CONNECTION 
  
 us 
 . 
 example_connection 
 OPTIONS 
  
 ( 
  
 object_metadata 
  
 = 
  
 'SIMPLE' 
 , 
  
 uris 
  
 = 
  
 [ 
 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png' 
 ] 
 ); 
 -- Filter images in the object table 
 SELECT 
  
 OBJ 
 . 
 GET_READ_URL 
 ( 
 ref 
 ) 
  
 AS 
  
 signed_url 
 , 
 FROM 
  
 `cymbal_pets.product_images` 
 WHERE 
  
 AI 
 . 
 IF 
 (( 
 'The image contains a ball.' 
 , 
  
 OBJ 
 . 
 GET_ACCESS_URL 
 ( 
 ref 
 , 
  
 'r' 
 ))); 
 

Join tables based on image content

The following queries create a table of product data and a table of product images. The tables are joined based on whether the image is of the product.

  -- Create a dataset 
 CREATE 
  
 SCHEMA 
  
 IF 
  
 NOT 
  
 EXISTS 
  
 cymbal_pets 
 ; 
 -- Load a non-object table 
 LOAD 
  
 DATA 
  
 OVERWRITE 
  
 cymbal_pets 
 . 
 products 
 FROM 
  
 FILES 
 ( 
  
 format 
  
 = 
  
 'avro' 
 , 
  
 uris 
  
 = 
  
 [ 
  
 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/tables/products/products_*.avro' 
 ] 
 ); 
 -- Create an object table 
 CREATE 
  
 OR 
  
 REPLACE 
  
 EXTERNAL 
  
 TABLE 
  
 cymbal_pets 
 . 
 product_images 
  
 WITH 
  
 CONNECTION 
  
 us 
 . 
 example_connection 
  
 OPTIONS 
  
 ( 
  
 object_metadata 
  
 = 
  
 'SIMPLE' 
 , 
  
 uris 
  
 = 
  
 [ 
 'gs://cloud-samples-data/bigquery/tutorials/cymbal-pets/images/*.png' 
 ] 
 ); 
 -- Join the standard table and object table 
 SELECT 
  
 product_name 
 , 
  
 brand 
 , 
  
 signed_url 
 FROM 
  
 cymbal_pets 
 . 
 products 
  
 INNER 
  
 JOIN 
  
 EXTERNAL_OBJECT_TRANSFORM 
 ( 
 TABLE 
  
 `cymbal_pets.product_images` 
 , 
  
 [ 
 'SIGNED_URL' 
 ] 
 ) 
  
 as 
  
 images 
 ON 
  
 AI 
 . 
 IF 
 ( 
  
 ( 
  
 " 
 "" 
 You will be provided an image of a pet product. 
 Determine if the image is of the following pet toy: 
 "" 
 " 
 , 
  
 products 
 . 
 product_name 
 , 
  
 OBJ 
 . 
 GET_ACCESS_URL 
 ( 
 images 
 . 
 ref 
 , 
  
 'r' 
 ) 
  
 ), 
  
 endpoint 
  
 = 
>  
 'gemini-2.5-flash' 
 ) 
 WHERE 
  
 products 
 . 
 category 
  
 = 
  
 "Toys" 
  
 AND 
  
 products 
 . 
 brand 
  
 = 
  
 "Fluffy Buns" 
 ; 
 

Filter audio by speech topic

The following queries create a table of audio data stored in a publicly available Cloud Storage bucket. The query filters the audio samples to those that contain speech discussing a large language model.

  -- Create a dataset 
 CREATE 
  
 SCHEMA 
  
 IF 
  
 NOT 
  
 EXISTS 
  
 audio_repo 
 ; 
 -- Create an object table with audios 
 CREATE 
  
 OR 
  
 REPLACE 
  
 EXTERNAL 
  
 TABLE 
  
 audio_repo 
 . 
 prompt_audio 
 WITH 
  
 CONNECTION 
  
 us 
 . 
 test_connection 
 OPTIONS 
  
 ( 
  
 object_metadata 
  
 = 
  
 'SIMPLE' 
 , 
  
 uris 
  
 = 
  
 [ 
 'gs://cloud-samples-data/generative-ai/audio/*.mp3' 
 ] 
 ); 
 -- Filter audios in the object table 
 SELECT 
  
 OBJ 
 . 
 GET_READ_URL 
 ( 
 ref 
 ) 
  
 AS 
  
 signed_url 
 , 
 FROM 
  
 `audio_repo.prompt_audio` 
 WHERE 
  
 AI 
 . 
 IF 
 (( 
 'Does the audio talk about large language models? ' 
 , 
  
 OBJ 
 . 
 GET_ACCESS_URL 
 ( 
 ref 
 , 
  
 'r' 
 ))); 
 

Handle inference errors

The following query filters news stories but sets max_error_ratio to 0.05 , meaning the query fails if more than 5% of rows return an error during inference:

  SELECT 
  
 title 
 , 
  
 body 
 FROM 
  
 `bigquery-public-data.bbc_news.fulltext` 
 WHERE 
  
 AI 
 . 
 IF 
 (( 
 'The following news story is about a natural disaster: ' 
 , 
  
 body 
 ), 
  
 max_error_ratio 
  
 = 
>  
 0.05 
 ); 
 

If the query exceeds the 0.05 error ratio, it fails and returns an error message similar to the following: Query failed because AI functions exceeded their allowed error ratio

The AI.IF and AI.GENERATE_BOOL functions both use models to generate a boolean value in response to a prompt. The following differences can help you choose which function to use:

  • Prompt Optimization: AI.IF automatically structures your prompts to improve the quality of the output.
  • Input: AI.GENERATE_BOOL lets you specify model parameters to use.
  • Output: AI.IF returns a BOOL value, which makes it easier to work with in queries. AI.GENERATE_BOOL returns a STRUCT that contains a BOOL value, as well as additional information about the model call, which is useful if you need to view details such as the safety rating or API response status.
  • Error handling: If AI.IF produces an error for any input, then the function returns NULL . AI.GENERATE_BOOL records details about the errors in its output.

Locations

You can run AI.IF in all of the regions that support Gemini models, and also in the US and EU multi-regions.

Quotas

See Generative AI functions quotas and limits .

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: