Machine learning functions in GoogleSQL

GoogleSQL for Spanner supports the following machine learning (ML) functions.

Function list

Name Summary
AI.CLASSIFY Classifies a natural language input into one of specified categories.
AI.IF Evaluates a natural language condition.
AI.SCORE Rates a natural language input and assigns it a score.
ML.PREDICT Applies ML computations defined by a model to each row of an input relation.

AI.CLASSIFY

  AI.CLASSIFY(prompt, categories)

Description

Classifies the input prompt value into categories that you provide.

The following are common use cases:

  • Retail: Classify reviews by sentiment or classify products by categories.
  • Text analysis: Classify support tickets or emails by topic.

Definitions

  • prompt: A STRING value that specifies the input to classify. For example, 'apple'.
  • categories : The categories by which to classify the input. You can specify categories with or without descriptions:

    • With descriptions: Use an ARRAY<STRUCT<STRING, STRING>> value where each struct contains the category name, followed by a description of the category. The array can only contain string literals. For example, you can use colors to classify sentiment:

        [('green', 'positive'), ('yellow', 'neutral'), ('red', 'negative')]

      You can optionally name the fields of the struct for your own readability, but the field names aren't used by the function:

        [STRUCT('green' AS label, 'positive' AS description),
         STRUCT('yellow' AS label, 'neutral' AS description),
         STRUCT('red' AS label, 'negative' AS description)]
    • Without descriptions: Use an ARRAY<STRING> value. The array can only contain string literals. This works well when your categories are self-explanatory. For example, you can use the following categories to classify sentiment:

      ['positive', 'neutral', 'negative']

    To handle input that doesn't closely match a category, consider including an 'Other' category.
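For example, a catch-all category can be added to the descriptions form. The following sketch is illustrative; the category names and descriptions are not prescribed by the function:

        [('positive', 'The review is favorable.'),
         ('negative', 'The review is unfavorable.'),
         ('other', 'The review does not express a clear sentiment.')]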

Details

AI.CLASSIFY returns a STRING value containing the provided category that best fits the input.

The prompt value is evaluated by a Vertex AI Gemini model in the same project as the database. Enable the Vertex AI API before calling this function. Vertex AI pricing applies.

If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function produces an error. In safe mode, the function returns NULL instead.
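When safe mode is enabled, failed calls surface as NULL rather than errors, so a query can supply a fallback value with IFNULL. A minimal sketch, assuming safe mode is active (the news table and the 'unclassified' fallback are illustrative):

  SELECT
    IFNULL(AI.CLASSIFY(body, ['tech', 'sport', 'other']), 'unclassified') AS category
  FROM news;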

Return Type

STRING

Examples

The following query categorizes news articles into high-level categories:

  -- Classify text by topic
  SELECT
    title,
    body,
    AI.CLASSIFY(
      body,
      ['tech', 'sport', 'business', 'politics', 'entertainment', 'other']) AS category
  FROM news
  LIMIT 100;

The following query classifies movie reviews of The English Patient by sentiment according to a custom color scheme. For example, a review that is positive is classified as 'green'.

  -- Classify reviews by sentiment
  SELECT
    AI.CLASSIFY(
      CONCAT('Classify the review by sentiment: ', review),
      categories => [
        ('green', 'The review is positive.'),
        ('yellow', 'The review is neutral.'),
        ('red', 'The review is negative.')]) AS ai_review_rating,
    reviewer_rating AS human_provided_rating,
    review
  FROM reviews
  WHERE title = 'The English Patient';

AI.IF

  AI.IF(prompt)

Description

Evaluates a condition described in natural language and returns a BOOL.

You can use the AI.IF function to filter and join data based on conditions described in natural language or multimodal input. The following are common use cases:

  • Sentiment analysis: Find customer reviews with negative sentiment.
  • Topic analysis: Identify news articles related to a specific subject.
  • Image analysis: Select images that contain a specific item.
  • Security: Identify suspicious emails.

Definitions

  • prompt: A STRING value that specifies the prompt to send to the model. For example, 'Is Seattle a US city?'.

Details

The prompt value is evaluated by a Vertex AI Gemini model in the same project as the database. Enable the Vertex AI API before calling this function. Vertex AI pricing applies.

If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function produces an error. In safe mode, the function returns NULL instead.

Return Type

BOOL

Examples

The following query uses the AI.IF function to filter news stories to those that cover a natural disaster:

  -- Filter text by topic
  SELECT
    title,
    body
  FROM news
  WHERE AI.IF(CONCAT('The following news story is about a natural disaster: ', body));

The following query first filters by the equality operator, and then filters using the AI.IF function:

  -- Combine filters
  SELECT
    title,
    body
  FROM news
  WHERE AI.IF(CONCAT('This news is related to Google: ', body))
    AND category = 'tech';
  -- Non-AI filters are evaluated first

The following query joins tables based on whether the news is about a product:

  -- Join tables based on semantic understanding.
  SELECT
    news.title,
    news.body,
    products.product_name
  FROM news
  JOIN products
    ON AI.IF(CONCAT(
      'Determine if the following news story is about product ',
      products.product_name, ': ', news.body))
  WHERE category = 'tech';

AI.SCORE

  AI.SCORE(prompt)

Description

Rates inputs based on a scoring system that you describe and returns a FLOAT64 value.

Use the AI.SCORE function with the ORDER BY clause to rank items. The following are common use cases:

  • Retail: Find the top 5 most negative customer reviews about a product.
  • Hiring: Find the top 10 resumes that appear most qualified for a job post.
  • Customer success: Find the top 20 best customer support interactions.
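As a sketch of the retail use case, AI.SCORE can be combined with ORDER BY to rank reviews by negativity. The product_reviews table, its columns, and the @product_id parameter here are assumptions for illustration:

  -- Top 5 most negative reviews for a product
  SELECT review
  FROM product_reviews
  WHERE product_id = @product_id
  ORDER BY AI.SCORE(CONCAT(
    'On a scale from 1 (positive) to 10 (negative), rate how negative this review is: ',
    review)) DESC
  LIMIT 5;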

Definitions

  • prompt: A STRING value that specifies the input to score. For example, CONCAT('Rate this review: ', review_col, ' Use a scale from 1-10.').

Details

AI.SCORE returns a FLOAT64 indicating the score assigned to the input. There is no fixed default range for the score. For best results, provide a scoring range in the prompt.

The prompt value is evaluated by a Vertex AI Gemini model in the same project as the database. Enable the Vertex AI API before calling this function. Vertex AI pricing applies.

If the call to Vertex AI is unsuccessful for any reason, such as exceeding quota or model unavailability, then the function produces an error. In safe mode, the function returns NULL instead.

Return Type

FLOAT64

Examples

The following query uses the AI.SCORE function to assign ratings based on movie reviews of The English Patient , alongside the ratings that the human reviewers gave. It returns the top 10 highest AI rated reviews.

  -- Rate reviews
  SELECT
    AI.SCORE(CONCAT(
      """On a scale from 1 to 10, rate how much the reviewer liked the movie.
      Review: """, review)) AS ai_rating,
    reviewer_rating AS human_rating,
    review
  FROM reviews
  WHERE title = 'The English Patient'
  ORDER BY ai_rating DESC
  LIMIT 10;

The following query builds on the previous example by using the AI.IF function to filter the results to show reviews that mention at least one of the film's main characters:

  -- Rate and filter reviews
  SELECT
    AI.SCORE(CONCAT(
      'On a scale from 1 to 10, rate how much the reviewer liked the movie. Review:',
      review)) AS ai_rating,
    reviewer_rating AS human_rating,
    review
  FROM reviews
  WHERE title = 'The English Patient'
    AND AI.IF(CONCAT(
      "This review mentions at least one of the film's main characters: ", review))
  ORDER BY ai_rating DESC
  LIMIT 10;

ML.PREDICT

  ML.PREDICT(input_model, input_relation[, model_parameters])

  input_model:
    MODEL model_name

  input_relation:
    { input_table | input_subquery }

  input_table:
    TABLE table_name

  model_parameters:
    STRUCT(parameter_value AS parameter_name[, ...])

Description

ML.PREDICT is a table-valued function that lets you access registered machine learning (ML) models and use them to generate predictions. The function applies the ML computations defined by a model to each row of an input relation and returns the prediction results. You can also use ML.PREDICT to perform vector search: it converts your natural language query text into an embedding.

Supported Argument Types

  • input_model: The model to use for predictions. Replace model_name with the name of the model. To create a model, see CREATE MODEL.
  • input_relation: A table or subquery on which to apply ML computations. The columns of the input relation must include all input columns of the model; otherwise, the input doesn't have enough data to generate predictions and the query doesn't compile. The relation can also include arbitrary pass-through columns, which are included in the output. The order of the columns in the input relation doesn't matter, but each input column must be coercible to the corresponding model column.
  • input_table: The table containing the input data for predictions, for example, a set of features. Replace table_name with the name of the table.
  • input_subquery: The subquery that generates the prediction input data.
  • model_parameters: A STRUCT value that contains parameters supported by model_name. These parameters are passed to model inference.

Return Type

A table with the following columns:

  • Model outputs
  • Pass-through columns from the input relation

Examples

The examples in this section reference a model called DiamondAppraise and an input table called Diamonds with the following columns:

  • DiamondAppraise model:

    Input columns      Output columns
    carat FLOAT64      value FLOAT64
    cut STRING         lower_bound FLOAT64
    color STRING(1)    upper_bound FLOAT64

  • Diamonds table:

    Columns
    Id INT64
    Carat FLOAT64
    Cut STRING
    Color STRING

The following query predicts the value of a diamond based on the diamond's carat, cut, and color.

  SELECT
    id,
    color,
    value
  FROM ML.PREDICT(MODEL DiamondAppraise, TABLE Diamonds);

  +----+-------+-------+
  | id | color | value |
  +----+-------+-------+
  | 1  | I     | 280   |
  | 2  | G     | 447   |
  +----+-------+-------+

You can include model-specific parameters. For example, in the following query, the maxOutputTokens parameter specifies that output, the model inference result, can contain at most 10 tokens. This query succeeds because the TextBison model supports a parameter called maxOutputTokens.

  SELECT
    prompt,
    output
  FROM ML.PREDICT(
    MODEL TextBison,
    (SELECT "Is 13 prime?" AS prompt),
    STRUCT(10 AS maxOutputTokens));

  +----------------+---------------------+
  | prompt         | output              |
  +----------------+---------------------+
  | "Is 13 prime?" | "Yes, 13 is prime." |
  +----------------+---------------------+

The following example generates an embedding for a natural language query. The example then uses that embedding to find the most similar entries in a database that are indexed by vector embeddings.

  -- Generate the embedding from a natural language prompt
  WITH embedding AS (
    SELECT embeddings.values
    FROM ML.PREDICT(
      MODEL DiamondAppraise,
      (SELECT "What is the most valuable diamond?" AS prompt))
  )
  -- Use the embedding to find the most similar entries in the database
  SELECT
    id,
    color,
    value,
    APPROX_COSINE_DISTANCE(
      valueEmbedding,
      embedding.values,
      options => JSON '{"num_leaves_to_search": 10}') AS distance
  FROM products @{force_index=valueEmbeddingIndex}, embedding
  WHERE valueEmbedding IS NOT NULL
  ORDER BY distance
  LIMIT 5;

You can use ML.PREDICT in DQL and DML statements, such as INSERT or UPDATE. For example:

  INSERT INTO AppraisedDiamond (id, color, carat, value)
  SELECT
    1 AS id,
    color,
    carat,
    value
  FROM ML.PREDICT(
    MODEL DiamondAppraise,
    (SELECT @carat AS carat, @cut AS cut, @color AS color));