AIAccessor(df)
API documentation for AIAccessor class.
Methods
classify
classify(
    instruction: str,
    model,
    labels: typing.Sequence[str],
    output_column: str = "result",
    ground_with_google_search: bool = False,
    attach_logprobs=False,
)
Classifies the rows of dataframes based on user instruction into the provided labels.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")
>>> df = bpd.DataFrame({
... "feedback_text": [
... "The product is amazing, but the shipping was slow.",
... "I had an issue with my recent bill.",
... "The user interface is very intuitive."
... ],
... })
>>> df.ai.classify("{feedback_text}", model=model, labels=["Shipping", "Billing", "UI"])
feedback_text result
0 The product is amazing, but the shipping was s... Shipping
1 I had an issue with my recent bill. Billing
2 The user interface is very intuitive. UI
<BLANKLINE>
[3 rows x 2 columns]
instruction
str
An instruction on how to classify the data. This value must contain column references by name, each wrapped in a pair of braces. For example, if you have a column "feedback", you can refer to this column with "{feedback}".
model
labels
Sequence[str]
A collection of labels (categories). It must contain at least two and at most 20 elements. Labels are case sensitive. Duplicated labels are not allowed.
output_column
str, default "result"
The name of the output column.
ground_with_google_search
bool, default False
Enables Grounding with Google Search for the GeminiTextGenerator model. When set to True, the model incorporates relevant information from Google Search results into its responses, enhancing their accuracy and factualness. Note: Using this feature may impact billing costs. Refer to the pricing page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is False.
attach_logprobs
bool, default False
Controls whether to attach an additional "logprob" column for each result. Logprobs are floating-point values reflecting the LLM's confidence in its responses. Higher values indicate more confidence. Values range from negative infinity to 0.
NotImplementedError
ValueError
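The constraints on labels described above (between 2 and 20 entries, case sensitive, no duplicates) can be checked before calling classify. The following is a minimal sketch in plain Python; validate_labels is a hypothetical helper written for illustration and is not part of bigframes, and the library's own checks and error messages may differ.

```python
from typing import Sequence


def validate_labels(labels: Sequence[str]) -> None:
    """Hypothetical pre-check mirroring the documented constraints on `labels`."""
    if not 2 <= len(labels) <= 20:
        raise ValueError(f"expected between 2 and 20 labels, got {len(labels)}")
    # Labels are case sensitive, so "UI" and "ui" count as two different labels.
    if len(set(labels)) != len(labels):
        raise ValueError("duplicated labels are not allowed")


validate_labels(["Shipping", "Billing", "UI"])  # passes silently
validate_labels(["UI", "ui"])  # passes: the two labels differ by case
```

Running a check like this locally avoids starting a model call that is expected to fail on its arguments.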
filter
filter(
    instruction: str,
    model,
    ground_with_google_search: bool = False,
    attach_logprobs: bool = False,
)
Filters the DataFrame with the semantics of the user instruction.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")
>>> df = bpd.DataFrame({"country": ["USA", "Germany"], "city": ["Seattle", "Berlin"]})
>>> df.ai.filter("{city} is the capital of {country}", model)
country city
1 Germany Berlin
<BLANKLINE>
[1 rows x 2 columns]
instruction
str
An instruction on how to filter the data. This value must contain column references by name, which should be wrapped in a pair of braces. For example, if you have a column "food", you can refer to this column in the instructions like: "The {food} is healthy."
model
ground_with_google_search
bool, default False
Enables Grounding with Google Search for the GeminiTextGenerator model. When set to True, the model incorporates relevant information from Google Search results into its responses, enhancing their accuracy and factualness. Note: Using this feature may impact billing costs. Refer to the pricing page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is False.
attach_logprobs
bool, default False
Controls whether to attach an additional "logprob" column for each result. Logprobs are floating-point values reflecting the LLM's confidence in its responses. Higher values indicate more confidence. Values range from negative infinity to 0.
NotImplementedError
ValueError
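The instruction parameter must reference columns by name inside braces. As a rough illustration of what that means, the sketch below extracts the brace-wrapped names from an instruction with the standard library and checks them against a list of column names. referenced_columns and check_instruction are hypothetical helpers, not bigframes functions, and the way bigframes actually parses instructions is not described in this page.

```python
from string import Formatter
from typing import List


def referenced_columns(instruction: str) -> List[str]:
    """Return the brace-wrapped names in `instruction`, in order of appearance."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for a chunk that contains only literal text.
    return [field for _, field, _, _ in Formatter().parse(instruction) if field]


def check_instruction(instruction: str, columns: List[str]) -> List[str]:
    """Hypothetical pre-check: every reference must name an existing column."""
    refs = referenced_columns(instruction)
    if not refs:
        raise ValueError("the instruction must reference at least one column")
    missing = [name for name in refs if name not in columns]
    if missing:
        raise ValueError(f"unknown columns: {missing}")
    return refs


print(check_instruction("{city} is the capital of {country}", ["country", "city"]))
# prints ['city', 'country']
```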
join
join(
    other,
    instruction: str,
    model,
    ground_with_google_search: bool = False,
    attach_logprobs=False,
)
Joins two dataframes by applying the instruction over each pair of rows from the left and right table.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")
>>> cities = bpd.DataFrame({'city': ['Seattle', 'Ottawa', 'Berlin', 'Shanghai', 'New Delhi']})
>>> continents = bpd.DataFrame({'continent': ['North America', 'Africa', 'Asia']})
>>> cities.ai.join(continents, "{city} is in {continent}", model)
city continent
0 Seattle North America
1 Ottawa North America
2 Shanghai Asia
3 New Delhi Asia
<BLANKLINE>
[4 rows x 2 columns]
other
instruction
str
An instruction on how left and right rows can be joined. This value must contain column references by name, each wrapped in a pair of braces. For example: "The {city} belongs to the {country}". For column names that are shared between the two dataframes, you need to add a "left." or "right." prefix to tell them apart. This is especially important for self joins. For example: "The {left.employee_name} reports to {right.employee_name}". For unique column names, this prefix is optional.
model
ground_with_google_search
bool, default False
Enables Grounding with Google Search for the GeminiTextGenerator model. When set to True, the model incorporates relevant information from Google Search results into its responses, enhancing their accuracy and factualness. Note: Using this feature may impact billing costs. Refer to the pricing page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is False.
attach_logprobs
bool, default False
Controls whether to attach an additional "logprob" column for each result. Logprobs are floating-point values reflecting the LLM's confidence in its responses. Higher values indicate more confidence. Values range from negative infinity to 0.
ValueError
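The "left." and "right." prefix rule for join instructions can be pictured as a small resolution step: a prefixed reference names its side explicitly, an unprefixed reference is acceptable only if exactly one table has that column. The sketch below is an assumption about how such a rule could be applied, written in plain Python; resolve_reference is a hypothetical name and does not come from bigframes.

```python
from typing import List, Tuple


def resolve_reference(ref: str, left_cols: List[str], right_cols: List[str]) -> Tuple[str, str]:
    """Map a column reference to a (side, column_name) pair."""
    # An explicit "left." or "right." prefix decides the side directly.
    for side, cols in (("left", left_cols), ("right", right_cols)):
        prefix = side + "."
        if ref.startswith(prefix):
            name = ref[len(prefix):]
            if name not in cols:
                raise ValueError(f"{name!r} is not a column of the {side} table")
            return side, name
    # Without a prefix, the name must belong to exactly one table.
    in_left, in_right = ref in left_cols, ref in right_cols
    if in_left and in_right:
        raise ValueError(f"{ref!r} is ambiguous; use 'left.{ref}' or 'right.{ref}'")
    if not (in_left or in_right):
        raise ValueError(f"unknown column {ref!r}")
    return ("left", ref) if in_left else ("right", ref)


print(resolve_reference("city", ["city"], ["continent"]))
# prints ('left', 'city')
print(resolve_reference("right.employee_name", ["employee_name"], ["employee_name"]))
# prints ('right', 'employee_name')
```

In a self join every column name is shared, so under this reading every reference needs a prefix.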
map
map(
    instruction: str,
    model,
    output_schema: typing.Optional[typing.Dict[str, str]] = None,
    ground_with_google_search: bool = False,
    attach_logprobs=False,
)
Maps the DataFrame with the semantics of the user instruction.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")
>>> df = bpd.DataFrame({"ingredient_1": ["Burger Bun", "Soy Bean"], "ingredient_2": ["Beef Patty", "Bittern"]})
>>> df.ai.map("What is the food made from {ingredient_1} and {ingredient_2}? One word only.", model=model, output_schema={"food": "string"})
ingredient_1 ingredient_2 food
0 Burger Bun Beef Patty Burger
<BLANKLINE>
1 Soy Bean Bittern Tofu
<BLANKLINE>
<BLANKLINE>
[2 rows x 3 columns]
instruction
str
An instruction on how to map the data. This value must contain column references by name, which should be wrapped in a pair of braces. For example, if you have a column "food", you can refer to this column in the instructions like: "Get the ingredients of {food}."
model
output_schema
Dict[str, str] or None, default None
The schema used to generate structured output as a bigframes DataFrame. The schema is a string key-value pair of <column_name>:
ground_with_google_search
bool, default False
Enables Grounding with Google Search for the GeminiTextGenerator model. When set to True, the model incorporates relevant information from Google Search results into its responses, enhancing their accuracy and factualness. Note: Using this feature may impact billing costs. Refer to the pricing page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is False.
attach_logprobs
bool, default False
Controls whether to attach an additional "logprob" column for each result. Logprobs are floating-point values reflecting the LLM's confidence in its responses. Higher values indicate more confidence. Values range from negative infinity to 0.
NotImplementedError
ValueError
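The attach_logprobs description states that logprob values lie between negative infinity and 0, with higher values meaning more confidence. One way to read such a value is to exponentiate it back into a probability. The sketch below assumes the values are natural-log probabilities, which this page does not state; logprob_to_probability is a hypothetical helper for illustration.

```python
import math


def logprob_to_probability(logprob: float) -> float:
    """Convert a log probability (<= 0) into a probability in [0, 1].

    Assumes a natural-log base; this is an assumption, not something
    the reference above confirms.
    """
    if logprob > 0:
        raise ValueError("a log probability cannot be positive")
    return math.exp(logprob)


# A logprob of 0 corresponds to full confidence; more negative values
# correspond to lower confidence.
for value in (0.0, -0.1, -2.3):
    print(f"logprob={value} -> probability={logprob_to_probability(value):.3f}")
```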
search
search(
    search_column: str,
    query: str,
    top_k: int,
    model,
    score_column: typing.Optional[str] = None,
)
Performs AI semantic search on the DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> import bigframes
>>> bigframes.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.TextEmbeddingGenerator(model_name="text-embedding-005")
>>> df = bpd.DataFrame({"creatures": ["salmon", "sea urchin", "frog", "chimpanzee"]})
>>> df.ai.search("creatures", "monkey", top_k=1, model=model, score_column='distance')
creatures distance
3 chimpanzee 0.635844
<BLANKLINE>
[1 rows x 2 columns]
query
str
The search query.
top_k
int
The number of nearest neighbors to return.
model
TextEmbeddingGenerator
A TextEmbeddingGenerator provided by the BigFrames ML package.
score_column
Optional[str], default None
The name of the additional column containing the similarity scores. If None, this column won't be attached to the result.
ValueError
TypeError
DataFrame
sim_join
sim_join(
    other,
    left_on: str,
    right_on: str,
    model,
    top_k: int = 3,
    score_column: typing.Optional[str] = None,
    max_rows: int = 1000,
)
Joins two dataframes based on the similarity of the specified columns.
This method uses BigQuery's VECTOR_SEARCH function to match rows on the left side with the rows that have nearest embedding vectors on the right. In the worst case scenario, the complexity is around O(M * N * log K). Therefore, this is a potentially expensive operation.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.TextEmbeddingGenerator(model_name="text-embedding-005")
>>> df1 = bpd.DataFrame({'animal': ['monkey', 'spider']})
>>> df2 = bpd.DataFrame({'animal': ['scorpion', 'baboon']})
>>> df1.ai.sim_join(df2, left_on='animal', right_on='animal', model=model, top_k=1)
animal animal_1
0 monkey baboon
1 spider scorpion
<BLANKLINE>
[2 rows x 2 columns]
other
DataFrame
The other data frame to join with.
left_on
str
The name of the column on the left side for the join.
right_on
str
The name of the column on the right side for the join.
top_k
int, default 3
The number of nearest neighbors to return.
model
TextEmbeddingGenerator
A TextEmbeddingGenerator provided by the BigFrames ML package.
score_column
Optional[str], default None
The name of the additional column containing the similarity scores. If None, this column won't be attached to the result.
ValueError
DataFrame
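To see where a cost of roughly O(M * N * log K) can come from, consider a brute-force nearest-neighbor join: each of the M left rows is compared with all N right rows while a heap keeps the K closest matches. The sketch below illustrates that idea in plain Python. Reading M, N and K as the left row count, right row count and top_k is an assumption, the 2-D vectors are invented stand-ins for real embeddings, cosine distance is chosen only for illustration, and none of this reflects how VECTOR_SEARCH is implemented.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def brute_force_sim_join(
    left: Dict[str, Vector], right: Dict[str, Vector], top_k: int = 3
) -> List[Tuple[str, str, float]]:
    """Pair every left key with its top_k nearest right keys."""
    rows = []
    for l_key, l_vec in left.items():  # M left rows
        # N distance computations per left row.
        scored = ((cosine_distance(l_vec, r_vec), r_key) for r_key, r_vec in right.items())
        # nsmallest keeps a heap of at most top_k candidates (the log K factor).
        for dist, r_key in heapq.nsmallest(top_k, scored):
            rows.append((l_key, r_key, dist))
    return rows


# Toy 2-D "embeddings", invented for illustration only.
left = {"monkey": (0.9, 0.1), "spider": (0.1, 0.9)}
right = {"scorpion": (0.2, 0.8), "baboon": (0.8, 0.2)}
print([(l, r) for l, r, _ in brute_force_sim_join(left, right, top_k=1)])
# prints [('monkey', 'baboon'), ('spider', 'scorpion')]
```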
top_k
top_k(
    instruction: str,
    model,
    k: int = 10,
    ground_with_google_search: bool = False,
)
Ranks each tuple and returns the k best according to the instruction.
This method employs a quick select algorithm to efficiently compare the pivot with all other items. By leveraging an LLM (Large Language Model), it then identifies the top 'k' best answers from these comparisons.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> bpd.options.experiments.ai_operators = True
>>> bpd.options.compute.ai_ops_confirmation_threshold = 25
>>> import bigframes.ml.llm as llm
>>> model = llm.GeminiTextGenerator(model_name="gemini-2.0-flash-001")
>>> df = bpd.DataFrame(
... {
... "Animals": ["Dog", "Bird", "Cat", "Horse"],
... "Sounds": ["Woof", "Chirp", "Meow", "Neigh"],
... })
>>> df.ai.top_k("{Animals} are more popular as pets", model=model, k=2)
Animals Sounds
0 Dog Woof
2 Cat Meow
<BLANKLINE>
[2 rows x 2 columns]
instruction
str
An instruction on how to rank the data. This value must contain column references by name enclosed in braces. For example, to reference a column named "Animals", use "{Animals}" in the instruction, like: "{Animals} are more popular as pets".
model
k
int, default 10
The number of rows to return.
ground_with_google_search
bool, default False
Enables Grounding with Google Search for the GeminiTextGenerator model. When set to True, the model incorporates relevant information from Google Search results into its responses, enhancing their accuracy and factualness. Note: Using this feature may impact billing costs. Refer to the pricing page for details: https://cloud.google.com/vertex-ai/generative-ai/pricing#google_models
The default is False.
NotImplementedError
ValueError
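The quickselect approach mentioned in the top_k description can be sketched as follows: pick a pivot, compare every other item with it, and recurse only into the side that still contains undecided members of the top k. In this sketch the better callback stands in for the LLM comparison, and the popularity scores are invented for illustration. It is a generic quickselect under those assumptions, not the bigframes implementation.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def quickselect_top_k(
    items: List[T], k: int, better: Callable[[T, T], bool], rng: random.Random
) -> List[T]:
    """Return the k best items (in no particular order) via pivot comparisons."""
    if k <= 0:
        return []
    if len(items) <= k:
        return list(items)
    pivot_index = rng.randrange(len(items))
    pivot = items[pivot_index]
    rest = items[:pivot_index] + items[pivot_index + 1:]
    winners: List[T] = []
    losers: List[T] = []
    for item in rest:
        # Each comparison against the pivot is where an LLM call would go.
        (winners if better(item, pivot) else losers).append(item)
    if len(winners) >= k:
        # The top k are all among the items that beat the pivot.
        return quickselect_top_k(winners, k, better, rng)
    # Every winner qualifies, then the pivot, then the best of the losers.
    remaining = k - len(winners) - 1
    return winners + [pivot] + quickselect_top_k(losers, remaining, better, rng)


# Invented scores standing in for LLM judgments.
popularity = {"Dog": 4, "Bird": 2, "Cat": 3, "Horse": 1}
top = quickselect_top_k(
    list(popularity), 2, lambda a, b: popularity[a] > popularity[b], random.Random(0)
)
print(sorted(top))
# prints ['Cat', 'Dog']
```

With a consistent comparator the selected set does not depend on the pivot choice; with an LLM as the judge, comparisons may be inconsistent, so results can vary between runs.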