Class AIAccessor (2.15.0)

AIAccessor(df, base_bqml=None)

API documentation for AIAccessor class.

Methods

classify

classify(
    instruction: str,
    model,
    labels: typing.Sequence[str],
    output_column: str = "result",
    ground_with_google_search: bool = False,
)

Classifies the rows of the DataFrame into the provided labels, based on the user instruction.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

>>> df = bpd.DataFrame({
...     "feedback_text": [
...         "The product is amazing, but the shipping was slow.",
...         "I had an issue with my recent bill.",
...         "The user interface is very intuitive."
...     ],
... })
>>> df.ai.classify("{feedback_text}", model=model, labels=["Shipping", "Billing", "UI"])
                                       feedback_text     result
0  The product is amazing, but the shipping was s...   Shipping
1                I had an issue with my recent bill.    Billing
2              The user interface is very intuitive.         UI
<BLANKLINE>
[3 rows x 2 columns]

Parameters

Name

Description

instruction

str

An instruction on how to classify the data. This value must contain column references by name, each wrapped in a pair of braces. For example, if you have a column "feedback", you can refer to this column with "{feedback}".

model

 bigframes.ml.llm.GeminiTextGenerator

A GeminiTextGenerator provided by Bigframes ML package.

labels

Sequence[str]

A collection of labels (categories). It must contain at least two and at most 20 elements. Labels are case sensitive. Duplicated labels are not allowed.

output_column

str, default "result"

The name of the output column.

ground_with_google_search

bool, default False

Enables Grounding with Google Search for the GeminiTextGenerator model. When set to True, the model incorporates relevant information from Google Search results into its responses, which can improve their accuracy and factuality. Note: Using this feature may impact billing costs; see the pricing page at https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models for details. The default is False.

Exceptions

Type

Description

NotImplementedError

when the AI operator experiment is off.

ValueError

when the instruction refers to a non-existing column, when no columns are referred to, or when the count of labels does not meet the requirement.

Returns

Type

Description

 bigframes.pandas.DataFrame

DataFrame with classification result.
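
The constraints on instruction and labels described above can be checked before any model call. The following is a minimal plain-Python sketch, not the library's implementation: check_classify_args is a hypothetical helper, and the regular expression used to find column references is an assumption about how braces are parsed.

```python
import re
from typing import List, Sequence

# Assumed pattern: a column reference is any non-empty text between braces.
_COLUMN_REF = re.compile(r"\{([^{}]+)\}")


def check_classify_args(
    instruction: str, labels: Sequence[str], columns: Sequence[str]
) -> List[str]:
    """Hypothetical helper: validate arguments the way the docs describe.

    Returns the referenced column names, or raises ValueError.
    """
    refs = _COLUMN_REF.findall(instruction)
    if not refs:
        raise ValueError("instruction must reference at least one column")
    missing = [name for name in refs if name not in columns]
    if missing:
        raise ValueError(f"unknown column(s): {missing}")
    if not 2 <= len(labels) <= 20:
        raise ValueError("labels must contain between 2 and 20 elements")
    # Labels are case sensitive, so "UI" and "ui" count as different labels.
    if len(set(labels)) != len(labels):
        raise ValueError("duplicated labels are not allowed")
    return refs


print(
    check_classify_args(
        "{feedback_text}", ["Shipping", "Billing", "UI"], ["feedback_text"]
    )
)
```

Each failing condition maps to one of the ValueError cases listed under Exceptions; the exact error messages here are made up for illustration.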

filter

filter(instruction: str, model, ground_with_google_search: bool = False)

Filters the DataFrame with the semantics of the user instruction.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

>>> df = bpd.DataFrame({"country": ["USA", "Germany"], "city": ["Seattle", "Berlin"]})
>>> df.ai.filter("{city} is the capital of {country}", model)
   country    city
1  Germany  Berlin
<BLANKLINE>
[1 rows x 2 columns]

Parameters

Name

Description

instruction

str

An instruction on how to filter the data. This value must contain column references by name, which should be wrapped in a pair of braces. For example, if you have a column "food", you can refer to this column in the instructions like: "The {food} is healthy."

model

 bigframes.ml.llm.GeminiTextGenerator

A GeminiTextGenerator provided by Bigframes ML package.

ground_with_google_search

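bool, default False

Enables Grounding with Google Search for the GeminiTextGenerator model. Using this feature may impact billing costs. The default is False.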

Exceptions

Type

Description

NotImplementedError

when the AI operator experiment is off.

ValueError

when the instruction refers to a non-existing column, or when no columns are referred to.

Returns

Type

Description

 bigframes.pandas.DataFrame

DataFrame filtered by the instruction.
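
One way to picture the instruction is as a template that each row fills in, producing a statement that the model then judges. The sketch below only shows that substitution step in plain Python. render_prompts is a hypothetical helper, and the prompt the library actually sends to the model is not documented here.

```python
from typing import Dict, List


def render_prompts(instruction: str, rows: List[Dict[str, str]]) -> List[str]:
    """Hypothetical helper: fill each {column} reference with the row's value."""
    return [instruction.format(**row) for row in rows]


rows = [
    {"country": "USA", "city": "Seattle"},
    {"country": "Germany", "city": "Berlin"},
]

for prompt in render_prompts("{city} is the capital of {country}", rows):
    print(prompt)
```

In the documented example, only the row whose statement holds (Berlin, Germany) is kept in the result.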

forecast

forecast(
    timestamp_column: str,
    data_column: str,
    *,
    model: str = "TimesFM 2.0",
    id_columns: typing.Optional[typing.Iterable[str]] = None,
    horizon: int = 10,
    confidence_level: float = 0.95
)

Forecasts a time series over a future horizon, using Google Research's open-source TimesFM model (https://github.com/google-research/timesfm).

Parameters

Name

Description

timestamp_column

str

A str value that specifies the name of the time points column. The time points column provides the time points used to generate the forecast. It must use one of the following data types: TIMESTAMP, DATE, or DATETIME.

data_column

str

A str value that specifies the name of the data column. The data column contains the data to forecast. It must use one of the following data types: INT64, NUMERIC, or FLOAT64.

model

str, default "TimesFM 2.0"

A str value that specifies the name of the model. TimesFM 2.0 is the only supported value, and is the default value.

id_columns

Iterable[str] or None, default None

An iterable of str values that specify the names of one or more ID columns. Each ID identifies a unique time series to forecast. Specify one or more values for this argument to forecast multiple time series in a single query. The columns that you specify must use one of the following data types: STRING, INT64, ARRAY.

horizon

int, default 10

An int value that specifies the number of time points to forecast. The default value is 10. The valid input range is [1, 10,000].

confidence_level

float, default 0.95

A FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1).

Exceptions

Type

Description

ValueError

when referring to a non-existing column.

Returns

Type

Description

DataFrame

The forecast DataFrame. Its schema matches the output of the BigQuery AI.FORECAST function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-ai-forecast
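
The documented ranges for horizon and confidence_level can be checked locally before a query is issued. The sketch below is a hypothetical pre-flight check in plain Python. check_forecast_args is not part of the library, and whether the library itself raises ValueError for out-of-range values is an assumption. The column names in the final comment are also made up.

```python
from typing import Iterable, Optional


def check_forecast_args(
    horizon: int = 10,
    confidence_level: float = 0.95,
    model: str = "TimesFM 2.0",
    id_columns: Optional[Iterable[str]] = None,
) -> dict:
    """Hypothetical pre-flight check mirroring the documented ranges."""
    if model != "TimesFM 2.0":
        raise ValueError("TimesFM 2.0 is the only supported model")
    if not 1 <= horizon <= 10_000:
        raise ValueError("horizon must be in [1, 10,000]")
    if not 0 <= confidence_level < 1:
        raise ValueError("confidence_level must be in [0, 1)")
    return {
        "model": model,
        "id_columns": list(id_columns) if id_columns is not None else None,
        "horizon": horizon,
        "confidence_level": confidence_level,
    }


kwargs = check_forecast_args(
    horizon=30, confidence_level=0.9, id_columns=["store_id"]
)
# With a BigQuery DataFrame `df` that has these (hypothetical) columns,
# the keyword-only arguments could then be passed as:
# df.ai.forecast(timestamp_column="date", data_column="sales", **kwargs)
print(kwargs)
```

Everything after timestamp_column and data_column is keyword-only in the signature, which is why the checked values are returned as a dict of keyword arguments.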

join

join(other, instruction: str, model, ground_with_google_search: bool = False)

Joins two DataFrames by applying the instruction to each pair of rows from the left and right tables.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

>>> cities = bpd.DataFrame({'city': ['Seattle', 'Ottawa', 'Berlin', 'Shanghai', 'New Delhi']})
>>> continents = bpd.DataFrame({'continent': ['North America', 'Africa', 'Asia']})

>>> cities.ai.join(continents, "{city} is in {continent}", model)
        city      continent
0    Seattle  North America
1     Ottawa  North America
2   Shanghai           Asia
3  New Delhi           Asia
<BLANKLINE>
[4 rows x 2 columns]

Parameters

Name

Description

other

 bigframes.pandas.DataFrame

The other dataframe.

instruction

str

An instruction on how left and right rows can be joined. This value must contain column references by name, each wrapped in a pair of braces. For example: "The {city} belongs to the {country}". For column names that are shared between the two DataFrames, you need to add a "left." or "right." prefix to tell them apart. This is especially important for self joins. For example: "The {left.employee_name} reports to {right.employee_name}". For unique column names, this prefix is optional.

model

 bigframes.ml.llm.GeminiTextGenerator

A GeminiTextGenerator provided by Bigframes ML package.

ground_with_google_search

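bool, default False

Enables Grounding with Google Search for the GeminiTextGenerator model. Using this feature may impact billing costs. The default is False.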

Exceptions

Type

Description

ValueError

when the amount of data that will be sent for LLM processing is larger than max_rows.

Returns

Type

Description

 bigframes.pandas.DataFrame

The joined dataframe.
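
Because the instruction is applied to each pair of rows, the number of evaluations grows with the product of the two row counts. The sketch below illustrates that pairwise structure in plain Python. pairwise_join and the KNOWN lookup table are hypothetical stand-ins for the model's judgment, so the example runs offline and does not reflect how the library executes the join.

```python
from itertools import product
from typing import Callable, Dict, List

Row = Dict[str, str]


def pairwise_join(
    left: List[Row], right: List[Row], predicate: Callable[[Row, Row], bool]
) -> List[Row]:
    """Keep every (left, right) pair that the predicate accepts.

    In the real operator a model judges the instruction; `predicate` is a
    hypothetical stand-in for that judgment.
    """
    return [
        {**lrow, **rrow}
        for lrow, rrow in product(left, right)
        if predicate(lrow, rrow)
    ]


# Toy lookup standing in for the model's answer to "{city} is in {continent}".
KNOWN = {
    "Seattle": "North America",
    "Ottawa": "North America",
    "Berlin": "Europe",
    "Shanghai": "Asia",
    "New Delhi": "Asia",
}

cities = [{"city": c} for c in ["Seattle", "Ottawa", "Berlin", "Shanghai", "New Delhi"]]
continents = [{"continent": c} for c in ["North America", "Africa", "Asia"]]

joined = pairwise_join(
    cities, continents, lambda lrow, rrow: KNOWN[lrow["city"]] == rrow["continent"]
)
print(len(cities) * len(continents), "pairs evaluated,", len(joined), "kept")
```

With five cities and three continents, fifteen pairs are evaluated, which is why large inputs can become costly when each evaluation is a model call.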

map

map(
    instruction: str,
    model,
    output_schema: typing.Optional[typing.Dict[str, str]] = None,
    ground_with_google_search: bool = False,
)

Maps the DataFrame with the semantics of the user instruction. The names of the keys in the output_schema parameter carry semantic meaning and can be used for information extraction.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

>>> df = bpd.DataFrame({"ingredient_1": ["Burger Bun", "Soy Bean"], "ingredient_2": ["Beef Patty", "Bittern"]})
>>> df.ai.map("What is the food made from {ingredient_1} and {ingredient_2}? One word only.", model=model, output_schema={"food": "string"})
  ingredient_1 ingredient_2      food
0   Burger Bun   Beef Patty  Burger
<BLANKLINE>
1     Soy Bean      Bittern    Tofu
<BLANKLINE>
<BLANKLINE>
[2 rows x 3 columns]


>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")

>>> df = bpd.DataFrame({"text": ["Elmo lives at 123 Sesame Street."]})
>>> df.ai.map("{text}", model=model, output_schema={"person": "string", "address": "string"})
                               text person            address
0  Elmo lives at 123 Sesame Street.   Elmo  123 Sesame Street
<BLANKLINE>
[1 rows x 3 columns]

Parameters

Name

Description

instruction

str

An instruction on how to map the data. This value must contain column references by name, which should be wrapped in a pair of braces. For example, if you have a column "food", you can refer to this column in the instructions like: "Get the ingredients of {food}."

model

 bigframes.ml.llm.GeminiTextGenerator

A GeminiTextGenerator provided by Bigframes ML package.

output_schema

Dict[str, str] or None, default None

The schema used to generate structured output as a bigframes DataFrame. The schema is a string key-value pair of <column_name>:

ground_with_google_search

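bool, default False

Enables Grounding with Google Search for the GeminiTextGenerator model. Using this feature may impact billing costs. The default is False.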

Exceptions

Type

Description

NotImplementedError

when the AI operator experiment is off.

ValueError

when the instruction refers to a non-existing column, or when no columns are referred to.

Returns

Type

Description

 bigframes.pandas.DataFrame

DataFrame with attached mapping results.

search

search(
    search_column: str,
    query: str,
    top_k: int,
    model,
    score_column: typing.Optional[str] = None,
)

Performs AI semantic search on the DataFrame.

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None

>>> import bigframes
>>> bigframes.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.TextEmbeddingGenerator(model_name="text-embedding-005")

>>> df = bpd.DataFrame({"creatures": ["salmon", "sea urchin", "frog", "chimpanzee"]})
>>> df.ai.search("creatures", "monkey", top_k=1, model=model, score_column='distance')
    creatures  distance
3  chimpanzee  0.635844
<BLANKLINE>
[1 rows x 2 columns]

Parameters

Name

Description

search_column

str

The name of the column to search.

query

str

The search query.

top_k

int

The number of nearest neighbors to return.

model

TextEmbeddingGenerator

A TextEmbeddingGenerator provided by Bigframes ML package.

score_column

Optional[str], default None

The name of the additional column containing the similarity scores. If None, this column won't be attached to the result.

Exceptions

Type

Description

ValueError

when the search_column is not found in the DataFrame.

TypeError

when the provided model is not TextEmbeddingGenerator.

Returns

Type

Description

DataFrame

the DataFrame with the search result.
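
Semantic search of this kind can be pictured as ranking rows by the distance between their embedding and the query's embedding, then keeping the top_k closest. The sketch below shows that ranking in plain Python with made-up 2-D vectors. top_k_search is a hypothetical helper, and the use of cosine distance is an assumption, since the metric is not stated on this page.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def top_k_search(
    items: Dict[str, List[float]], query_vec: Sequence[float], top_k: int
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the top_k (name, distance) pairs, nearest first."""
    scored = [(name, cosine_distance(vec, query_vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


# Made-up vectors for illustration; real embeddings come from the model.
creatures = {
    "salmon": [1.0, 0.0],
    "sea urchin": [0.9, 0.1],
    "frog": [0.5, 0.5],
    "chimpanzee": [0.0, 1.0],
}
monkey = [0.1, 0.9]

print(top_k_search(creatures, monkey, top_k=1))
```

The second element of each pair plays the role of the optional score_column; leaving it out corresponds to score_column=None.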

sim_join

sim_join(
    other,
    left_on: str,
    right_on: str,
    model,
    top_k: int = 3,
    score_column: typing.Optional[str] = None,
    max_rows: int = 1000,
)

Joins two dataframes based on the similarity of the specified columns.

This method uses BigQuery's VECTOR_SEARCH function to match rows on the left side with the rows that have nearest embedding vectors on the right. In the worst case scenario, the complexity is around O(M * N * log K). Therefore, this is a potentially expensive operation.
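
The complexity figure can be illustrated with a brute-force version of the join. The sketch below reads M and N as the row counts of the left and right sides and K as top_k, which is an assumption because the text does not define them. sim_join_sketch, the squared-distance metric, and the 2-D vectors are all hypothetical; the actual VECTOR_SEARCH execution may differ.

```python
import heapq
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sim_join_sketch(
    left: Dict[str, Vec], right: Dict[str, Vec], top_k: int
) -> List[Tuple[str, str]]:
    """Pair each left row with its top_k nearest right rows."""
    pairs: List[Tuple[str, str]] = []
    for lname, lvec in left.items():  # M iterations
        # nsmallest scans all N right rows while tracking the K best so far.
        nearest = heapq.nsmallest(
            top_k, right.items(), key=lambda item: sq_dist(lvec, item[1])
        )
        pairs.extend((lname, rname) for rname, _ in nearest)
    return pairs


# Made-up vectors for illustration; real embeddings come from the model.
left = {"monkey": (0.0, 1.0), "spider": (1.0, 0.0)}
right = {"scorpion": (0.9, 0.2), "baboon": (0.1, 0.8)}

print(sim_join_sketch(left, right, top_k=1))
```

Every left row is compared against every right row, so the number of distance computations is M * N regardless of top_k.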

Examples:

>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25

>>> import bigframes.ml.llm as llm
>>> model = llm.TextEmbeddingGenerator(model_name="text-embedding-005")

>>> df1 = bpd.DataFrame({'animal': ['monkey', 'spider']})
>>> df2 = bpd.DataFrame({'animal': ['scorpion', 'baboon']})

>>> df1.ai.sim_join(df2, left_on='animal', right_on='animal', model=model, top_k=1)
   animal  animal_1
0  monkey    baboon
1  spider  scorpion
<BLANKLINE>
[2 rows x 2 columns]

Parameters

Name

Description

other

DataFrame

The other data frame to join with.

left_on

str

The name of the column on left side for the join.

right_on

str

The name of the column on the right side for the join.

top_k

int, default 3

The number of nearest neighbors to return.

model

TextEmbeddingGenerator

A TextEmbeddingGenerator provided by Bigframes ML package.

score_column

Optional[str], default None

The name of the additional column containing the similarity scores. If None, this column won't be attached to the result.

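max_rows

int, default 1000

The maximum amount of data to process. A ValueError is raised when this limit is exceeded.

Exceptions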

Type

Description

ValueError

when the amount of data to be processed exceeds the specified max_rows.

Returns

Type

Description

DataFrame

the data frame with the join result.
