Package bigquery (2.28.0)

API documentation for bigquery package.

Packages Functions

approx_top_count

approx_top_count(series: bigframes.series.Series, number: int) -> bigframes.series.Series

Returns the approximate top elements of expression as an array of STRUCTs. The number parameter specifies the number of elements returned.

Each STRUCT contains two fields. The first field (named value ) contains an input value. The second field (named count ) contains an INT64 specifying the number of times the value was returned.

Returns NULL if there are zero input rows.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> s = bpd.Series(["apple", "apple", "pear", "pear", "pear", "banana"])
>>> bbq.approx_top_count(s, number=2)
[{'value': 'pear', 'count': 3}, {'value': 'apple', 'count': 2}]
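The shape of the result can be sketched locally with the standard library. This is an exact, in-memory stand-in written for illustration only; `top_count_exact` is a hypothetical helper, not part of bigframes, and BigQuery's real aggregation is approximate.

```python
from collections import Counter


def top_count_exact(values, number):
    """Return the `number` most frequent values as value/count dicts.

    Hypothetical helper: exact counting, whereas the BigQuery function
    is approximate on large inputs.
    """
    if not values:
        # Mirrors the documented "Returns NULL if there are zero input rows".
        return None
    counts = Counter(values)
    return [{"value": v, "count": c} for v, c in counts.most_common(number)]


fruits = ["apple", "apple", "pear", "pear", "pear", "banana"]
top_two = top_count_exact(fruits, number=2)
print(top_two)
```

On small inputs like this one, the exact counts coincide with the documented output above.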

Parameters

Name

Description

series

 bigframes.series.Series

The Series with any data type that the GROUP BY clause supports.

number

int

An integer specifying the number of top elements to return.

array_agg

array_agg(obj: groupby.SeriesGroupBy | groupby.DataFrameGroupBy) -> series.Series | dataframe.DataFrame

Group data and create arrays from selected columns, omitting NULLs to avoid BigQuery errors (NULLs not allowed in arrays).

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np

For a SeriesGroupBy object:

 >>> lst = ['a', 'a', 'b', 'b', 'a']
>>> s = bpd.Series([1, 2, 3, 4, np.nan], index=lst)
>>> bbq.array_agg(s.groupby(level=0))
a    [1. 2.]
b    [3. 4.]
dtype: list<item: double>[pyarrow]

For a DataFrameGroupBy object:

 >>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = bpd.DataFrame(l, columns=["a", "b", "c"])
>>> bbq.array_agg(df.groupby(by=["b"]))
         a      c
b
1.0    [2]    [3]
2.0  [1 1]  [3 2]
<BLANKLINE>
[2 rows x 2 columns]
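The NULL-omitting behavior can be illustrated without BigQuery. The sketch below is a plain-Python emulation under the assumption that None and NaN both count as NULL; `is_null` and `array_agg_local` are hypothetical names introduced here.

```python
import math


def is_null(value):
    """Treat None and float NaN as NULL (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def array_agg_local(keys, values):
    """Collect the non-NULL values of each group into a list, keyed by group."""
    groups = {}
    for key, value in zip(keys, values):
        bucket = groups.setdefault(key, [])
        if not is_null(value):
            bucket.append(value)
    return groups


labels = ["a", "a", "b", "b", "a"]
data = [1.0, 2.0, 3.0, 4.0, float("nan")]
grouped = array_agg_local(labels, data)
print(grouped)
```

The NaN in group "a" is dropped, which matches the SeriesGroupBy example.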

Parameter

Name

Description

obj

groupby.SeriesGroupBy groupby.DataFrameGroupBy

The GroupBy object to apply the function to.

array_length

array_length(series: bigframes.series.Series) -> bigframes.series.Series

Compute the length of each array element in the Series.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series([[1, 2, 8, 3], [], [3, 4]])
>>> bbq.array_length(s)
0    4
1    0
2    2
dtype: Int64

You can also apply this function directly to Series.

 >>> s.apply(bbq.array_length, by_row=False)
0    4
1    0
2    2
dtype: Int64

Parameter

Name

Description

series

 bigframes.series.Series

A Series containing arrays.

array_to_string

array_to_string(series: bigframes.series.Series, delimiter: str) -> bigframes.series.Series

Converts array elements within a Series into delimited strings.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np

>>> s = bpd.Series([["H", "i", "!"], ["Hello", "World"], np.nan, [], ["Hi"]])
>>> bbq.array_to_string(s, delimiter=", ")
0         H, i, !
1    Hello, World
2
3
4              Hi
dtype: string

Parameters

Name

Description

series

 bigframes.series.Series

A Series containing arrays.

delimiter

str

The string used to separate array elements.

create_vector_index

create_vector_index(
    table_id: str,
    column_name: str,
    *,
    replace: bool = False,
    index_name: Optional[str] = None,
    distance_type="cosine",
    stored_column_names: Collection[str] = (),
    index_type: str = "ivf",
    ivf_options: Optional[Mapping] = None,
    tree_ah_options: Optional[Mapping] = None,
    session: Optional[bigframes.session.Session] = None
) -> None

Creates a new vector index on a column of a table.

This method calls the CREATE VECTOR INDEX DDL statement (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_vector_index_statement).

json_extract

json_extract(input: bigframes.series.Series, json_path: str) -> bigframes.series.Series

Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value. This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys.

Deprecated: json_extract is deprecated and will be removed in a future version. Use json_query instead.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_extract(s, json_path="$.class")
0    {"students":[{"id":5},{"id":12}]}
dtype: string

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

json_extract_array

json_extract_array(input: bigframes.series.Series, json_path: str = "$") -> bigframes.series.Series

Extracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON values. This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys.

Deprecated: json_extract_array is deprecated and will be removed in a future version. Use json_query_array instead.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
...   '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_extract_array(s, "$.fruits")
0    ['{"name":"apple"}' '{"name":"cherry"}']
1    ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": {"color": "red",   "names": ["apple","cherry"]}}',
...   '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_array(s, "$.fruits.names")
0    ['"apple"' '"cherry"']
1    ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow]

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

json_extract_string_array

json_extract_string_array(
    input: bigframes.series.Series,
    json_path: str = "$",
    value_dtype: typing.Optional[
        typing.Union[
            pandas.core.arrays.boolean.BooleanDtype,
            pandas.core.arrays.floating.Float64Dtype,
            pandas.core.arrays.integer.Int64Dtype,
            pandas.core.arrays.string_.StringDtype,
            pandas.core.dtypes.dtypes.ArrowDtype,
            geopandas.array.GeometryDtype,
            typing.Literal[
                "boolean",
                "Float64",
                "Int64",
                "int64[pyarrow]",
                "string",
                "string[pyarrow]",
                "timestamp[us, tz=UTC][pyarrow]",
                "timestamp[us][pyarrow]",
                "date32[day][pyarrow]",
                "time64[us][pyarrow]",
                "decimal128(38, 9)[pyarrow]",
                "decimal256(76, 38)[pyarrow]",
                "binary[pyarrow]",
                "duration[us][pyarrow]",
            ],
        ]
    ] = None,
) -> bigframes.series.Series

Extracts a JSON array and converts it to a SQL array of STRING values. A value_dtype can be provided to further coerce the data type of the values in the array. This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys.

Deprecated: json_extract_string_array is deprecated and will be removed in a future version. Use json_value_array instead.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_string_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]

>>> bbq.json_extract_string_array(s, value_dtype='Int64')
0    [1 2 3]
1      [4 5]
dtype: list<item: int64>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": {"color": "red",   "names": ["apple","cherry"]}}',
...   '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_string_array(s, "$.fruits.names")
0    ['apple' 'cherry']
1    ['guava' 'grapes']
dtype: list<item: string>[pyarrow]

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

value_dtype

dtype, Optional

The data type to coerce the array values to. It must be a data type supported by BigQuery DataFrames.

json_query

json_query(input: bigframes.series.Series, json_path: str) -> bigframes.series.Series

Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value. This function uses double quotes to escape invalid JSONPath characters in JSON keys. For example: "a.b".

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_query(s, json_path="$.class")
0    {"students":[{"id":5},{"id":12}]}
dtype: string
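To make the path semantics concrete, here is a minimal local sketch using only the json module. `json_query_local` is a hypothetical helper that handles just the plain `$.key1.key2` subset of JSONPath; it is not how bigframes or BigQuery evaluates the query.

```python
import json


def json_query_local(document, json_path):
    """Resolve a dotted path such as "$.class" and serialize the match compactly."""
    if not json_path.startswith("$"):
        raise ValueError("json_path must start with '$'")
    node = json.loads(document)
    # Only plain "$.key1.key2" paths are handled; subscripts and quoted keys are not.
    for key in filter(None, json_path[1:].split(".")):
        node = node[key]
    # Compact separators reproduce the whitespace-free JSON seen in the example.
    return json.dumps(node, separators=(",", ":"))


doc = '{"class": {"students": [{"id": 5}, {"id": 12}]}}'
extracted = json_query_local(doc, "$.class")
print(extracted)
```

The extracted value is still JSON-formatted text, which is the key difference from json_value.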

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

json_query_array

json_query_array(input: bigframes.series.Series, json_path: str = "$") -> bigframes.series.Series

Extracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON values. This function uses double quotes to escape invalid JSONPath characters in JSON keys. For example: "a.b".

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_query_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
...   '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_query_array(s, "$.fruits")
0    ['{"name":"apple"}' '{"name":"cherry"}']
1    ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": {"color": "red",   "names": ["apple","cherry"]}}',
...   '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_query_array(s, "$.fruits.names")
0    ['"apple"' '"cherry"']
1    ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow]

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

json_set

json_set(
    input: bigframes.series.Series,
    json_path_value_pairs: typing.Sequence[typing.Tuple[str, typing.Any]],
) -> bigframes.series.Series

Produces a new JSON value within a Series by inserting or replacing values at specified paths.

Warning: The JSON-related API json_set is in preview. Its behavior may change in future versions.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.read_gbq("SELECT JSON '{\"a\": 1}' AS data")["data"]
>>> bbq.json_set(s, json_path_value_pairs=[("$.a", 100), ("$.b", "hi")])
0    {"a":100,"b":"hi"}
Name: data, dtype: extension<dbjson<JSONArrowType>>[pyarrow]
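The insert-or-replace behavior of the path/value pairs can be pictured with a small local emulation. `json_set_local` is a hypothetical helper that supports only plain dotted paths whose parent objects already exist; it is a sketch, not the library's implementation.

```python
import json


def json_set_local(document, json_path_value_pairs):
    """Apply (path, value) pairs in order, replacing or inserting keys."""
    root = json.loads(document)
    for path, value in json_path_value_pairs:
        keys = [key for key in path[1:].split(".") if key]
        if not keys:
            raise ValueError("this sketch cannot replace the root '$'")
        node = root
        # Walk to the parent object; missing parents raise KeyError in this sketch.
        for key in keys[:-1]:
            node = node[key]
        node[keys[-1]] = value
    return json.dumps(root, separators=(",", ":"))


updated = json_set_local('{"a": 1}', [("$.a", 100), ("$.b", "hi")])
print(updated)
```

Here `$.a` already exists and is replaced, while `$.b` is new and is inserted.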

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path_value_pairs

Sequence[Tuple[str, Any]]

Pairs of JSON path and the new value to insert/replace.

json_value

json_value(input: bigframes.series.Series, json_path: str = "$") -> bigframes.series.Series

Extracts a JSON scalar value and converts it to a SQL STRING value. In addition, this function:

Removes the outermost quotes and unescapes the values.
Returns a SQL NULL if a non-scalar value is selected.
Uses double quotes to escape invalid JSON_PATH characters in JSON keys.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['{"name": "Jakob", "age": "6"}', '{"name": "Jakob", "age": []}'])
>>> bbq.json_value(s, json_path="$.age")
0       6
1    <NA>
dtype: string
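The scalar-only rule is easy to see in a local sketch. `json_value_local` is a hypothetical helper limited to plain dotted paths; returning None for JSON null is an assumption of this sketch rather than something stated above.

```python
import json


def json_value_local(document, json_path="$"):
    """Return the selected scalar as a string, or None for non-scalar matches."""
    node = json.loads(document)
    for key in filter(None, json_path[1:].split(".")):
        node = node[key]
    if isinstance(node, (dict, list)) or node is None:
        # Objects and arrays are non-scalar; JSON null is treated the same way here.
        return None
    if isinstance(node, str):
        # json.loads has already removed the outer quotes and unescaped the text.
        return node
    # Numbers and booleans keep their JSON spelling, e.g. 6 -> "6", true -> "true".
    return json.dumps(node)


rows = ['{"name": "Jakob", "age": "6"}', '{"name": "Jakob", "age": []}']
ages = [json_value_local(row, "$.age") for row in rows]
print(ages)
```

The second row selects an array, so it maps to a missing value, as in the example output.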

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

json_value_array

json_value_array(input: bigframes.series.Series, json_path: str = "$") -> bigframes.series.Series

Extracts a JSON array of scalar values and converts it to a SQL ARRAY<STRING> value. In addition, this function:

Removes the outermost quotes and unescapes the values.
Returns a SQL NULL if the selected value isn't an array, or is an array that contains non-scalar values.
Uses double quotes to escape invalid JSON_PATH characters in JSON keys.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_value_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": ["apples", "oranges", "grapes"]}',
...   '{"fruits": ["guava", "grapes"]}'
... ])
>>> bbq.json_value_array(s, "$.fruits")
0    ['apples' 'oranges' 'grapes']
1               ['guava' 'grapes']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": {"color": "red",   "names": ["apple","cherry"]}}',
...   '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_value_array(s, "$.fruits.names")
0    ['apple' 'cherry']
1    ['guava' 'grapes']
dtype: list<item: string>[pyarrow]

Parameters

Name

Description

input

 bigframes.series.Series

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

str

The JSON path identifying the data that you want to obtain from the input.

parse_json

parse_json(input: bigframes.series.Series) -> bigframes.series.Series

Converts a series with a JSON-formatted STRING value to a JSON value.

Warning: The JSON-related API parse_json is in preview. Its behavior may change in future versions.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> s
0    {"class": {"students": [{"id": 5}, {"id": 12}]}}
dtype: string
>>> bbq.parse_json(s)
0    {"class":{"students":[{"id":5},{"id":12}]}}
dtype: extension<dbjson<JSONArrowType>>[pyarrow]

Parameter

Name

Description

input

 bigframes.series.Series

The Series containing JSON-formatted strings.

sql_scalar

sql_scalar(sql_template: str, columns: typing.Sequence[bigframes.series.Series]) -> bigframes.series.Series

Create a Series from a SQL template.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import pandas as pd
>>> import pyarrow as pa

>>> s = bpd.Series(["1.5", "2.5", "3.5"])
>>> s = s.astype(pd.ArrowDtype(pa.decimal128(38, 9)))
>>> bbq.sql_scalar("ROUND({0}, 0, 'ROUND_HALF_EVEN')", [s])
0    2.000000000
1    2.000000000
2    4.000000000
dtype: decimal128(38, 9)[pyarrow]
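Two things in this example can be checked locally: how a `{0}` placeholder is filled, and what ROUND_HALF_EVEN does to the sample values. The sketch below uses only the standard library; the column reference `col_0` is a made-up name, and the actual substitution performed by sql_scalar may differ.

```python
from decimal import Decimal, ROUND_HALF_EVEN

sql_template = "ROUND({0}, 0, 'ROUND_HALF_EVEN')"

# Hypothetical column reference, used only to show Python-style placeholder filling.
rendered = sql_template.format("`col_0`")
print(rendered)

# Banker's rounding: ties go to the nearest even integer.
values = [Decimal("1.5"), Decimal("2.5"), Decimal("3.5")]
rounded = [v.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) for v in values]
print(rounded)
```

The tie at 2.5 rounds down to 2 while 1.5 and 3.5 round to 2 and 4, which agrees with the output shown in the example.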

Parameters

Name

Description

sql_template

str

A SQL format string with Python-style {0} placeholders for each of the Series objects in columns.

columns

Sequence[bigframes.pandas.Series]

Series objects representing the column inputs to the sql_template. Must contain at least one Series.

st_area

st_area(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
) -> bigframes.series.Series

Returns the area in square meters covered by the polygons in the input GEOGRAPHY.

If an input geography is a point or a line, the result is zero. If it is a collection, the result is the area of the polygons in the collection; if the collection doesn't contain polygons, the result is zero.

Note: BigQuery's Geography functions, like st_area, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

 >>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point

>>> series = bigframes.geopandas.GeoSeries(
...         [
...             Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
...             Polygon([(0.10, 0.4), (0.9, 0.5), (0.10, 0.5)]),
...             Polygon([(0.1, 0.1), (0.2, 0.1), (0.2, 0.2)]),
...             LineString([(0, 0), (1, 1), (0, 1)]),
...             Point(0, 1),
...         ]
... )
>>> series
0              POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1    POLYGON ((0.1 0.4, 0.9 0.5, 0.1 0.5, 0.1 0.4))
2    POLYGON ((0.1 0.1, 0.2 0.1, 0.2 0.2, 0.1 0.1))
3                        LINESTRING (0 0, 1 1, 0 1)
4                                       POINT (0 1)
dtype: geometry

>>> bbq.st_area(series)
0    61821689.855985
1    494563347.88721
2    61821689.855841
3                0.0
4                0.0
dtype: Float64

Use round() to round the output areas to the nearest ten million:

 >>> bbq.st_area(series).round(-7)
0     60000000.0
1    490000000.0
2     60000000.0
3            0.0
4            0.0
dtype: Float64
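As a sanity check on the magnitudes above, the area of the first triangle can be estimated from its spherical excess. This is a sketch under two assumptions not stated in this reference: the Earth is treated as a perfect sphere, and its radius is taken as 6,371,010 m. The helper names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_010.0  # assumed sphere radius, not taken from this reference


def to_unit_vector(lon_deg, lat_deg):
    """Convert a (longitude, latitude) pair in degrees to a 3D unit vector."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def spherical_triangle_area(p1, p2, p3, radius=EARTH_RADIUS_M):
    """Area of a geodesic triangle: spherical excess times radius squared."""
    a, b, c = (to_unit_vector(*p) for p in (p1, p2, p3))
    b_cross_c = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    # tan(E / 2) = |a . (b x c)| / (1 + a.b + b.c + c.a)
    excess = 2.0 * math.atan2(abs(dot(a, b_cross_c)), 1.0 + dot(a, b) + dot(b, c) + dot(c, a))
    return excess * radius ** 2


area = spherical_triangle_area((0.0, 0.0), (0.1, 0.1), (0.0, 0.1))
print(area)
```

Under these assumptions the estimate should land close to the first value in the st_area output, which illustrates that areas are measured on the Earth's surface rather than in planar coordinate units.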

Parameter

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

st_buffer

st_buffer(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
    buffer_radius: float,
    num_seg_quarter_circle: float = 8.0,
    use_spheroid: bool = False,
) -> bigframes.series.Series

Computes a GEOGRAPHY that represents all points whose distance from the input GEOGRAPHY is less than or equal to buffer_radius meters.

Note: BigQuery's Geography functions, like st_buffer, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

 >>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Point

>>> series = bigframes.geopandas.GeoSeries(
...         [
...             Point(0, 0),
...             Point(1, 1),
...         ]
... )
>>> series
0    POINT (0 0)
1    POINT (1 1)
dtype: geometry

>>> buffer = bbq.st_buffer(series, 100)
>>> bbq.st_area(buffer) > 0
0    True
1    True
dtype: boolean

Parameters

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

buffer_radius

float

The distance in meters.

num_seg_quarter_circle

float, optional

Specifies the number of segments that are used to approximate a quarter circle. The default value is 8.0.

use_spheroid

bool, optional

Determines how this function measures distance. If use_spheroid is False, the function measures distance on the surface of a perfect sphere. The use_spheroid parameter currently only supports the value False. The default value of use_spheroid is False.
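To get a feel for num_seg_quarter_circle, the buffer around a single point can be pictured as a regular polygon. The sketch below is a planar approximation that assumes the polygon's vertices lie exactly on the circle of radius buffer_radius; how BigQuery actually places the vertices is not stated here, and `regular_polygon_area` is a hypothetical helper.

```python
import math


def regular_polygon_area(radius_m, segments_per_quarter_circle=8.0):
    """Planar area of a regular polygon inscribed in a circle of the given radius."""
    n = 4.0 * segments_per_quarter_circle  # total number of segments around the circle
    return 0.5 * n * radius_m ** 2 * math.sin(2.0 * math.pi / n)


polygon_area = regular_polygon_area(100.0)  # default of 8 segments per quarter circle
circle_area = math.pi * 100.0 ** 2
print(polygon_area, circle_area)
```

With the default of 8 segments per quarter circle, the inscribed 32-sided polygon covers a little over 99% of the true circle; larger values tighten the approximation at the cost of more vertices.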

st_centroid

st_centroid(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
) -> bigframes.series.Series

Computes the geometric centroid of a GEOGRAPHY type.

For POINT and MULTIPOINT types, this is the arithmetic mean of the input coordinates. For LINESTRING and POLYGON types, this is the center of mass. For GEOMETRYCOLLECTION types, this is the center of mass of the collection's elements.

Note: BigQuery's Geography functions, like st_centroid, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

 >>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point

>>> series = bigframes.geopandas.GeoSeries(
...         [
...             Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
...             LineString([(0, 0), (1, 1), (0, 1)]),
...             Point(0, 1),
...         ]
... )
>>> series
0              POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1                        LINESTRING (0 0, 1 1, 0 1)
2                                       POINT (0 1)
dtype: geometry

>>> bbq.st_centroid(series)
0    POINT (0.03333 0.06667)
1    POINT (0.49998 0.70712)
2                  POINT (0 1)
dtype: geometry

Parameter

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

st_convexhull

st_convexhull(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
) -> bigframes.series.Series

Computes the convex hull of a GEOGRAPHY type.

The convex hull is the smallest convex set that contains all of the points in the input GEOGRAPHY.

Note: BigQuery's Geography functions, like st_convexhull, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

 >>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> from shapely.geometry import Polygon, LineString, Point

>>> series = bigframes.geopandas.GeoSeries(
...         [
...             Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]),
...             LineString([(0, 0), (1, 1), (0, 1)]),
...             Point(0, 1),
...         ]
... )
>>> series
0              POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1                        LINESTRING (0 0, 1 1, 0 1)
2                                       POINT (0 1)
dtype: geometry

>>> bbq.st_convexhull(series)
0    POLYGON ((0 0, 0.1 0.1, 0 0.1, 0 0))
1          POLYGON ((0 0, 1 1, 0 1, 0 0))
2                                POINT (0 1)
dtype: geometry

Parameter

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

st_difference

st_difference(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
    other: typing.Union[
        bigframes.series.Series,
        bigframes.geopandas.geoseries.GeoSeries,
        shapely.geometry.base.BaseGeometry,
    ],
) -> bigframes.series.Series

Returns a GEOGRAPHY that represents the point set difference of geography_1 (series) and geography_2 (other). The result therefore consists of the part of geography_1 that doesn't intersect with geography_2.

If geography_1 is completely contained in geography_2, then st_difference returns an empty GEOGRAPHY.

Note: BigQuery's Geography functions, like st_difference, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.geopandas
>>> from shapely.geometry import Polygon, LineString, Point

We can check two GeoSeries against each other, row by row:

 >>> s1 = bigframes.geopandas.GeoSeries(
...    [
...        Polygon([(0, 0), (2, 2), (0, 2)]),
...        Polygon([(0, 0), (2, 2), (0, 2)]),
...        LineString([(0, 0), (2, 2)]),
...        LineString([(2, 0), (0, 2)]),
...        Point(0, 1),
...    ],
... )
>>> s2 = bigframes.geopandas.GeoSeries(
...    [
...        Polygon([(0, 0), (1, 1), (0, 1)]),
...        LineString([(1, 0), (1, 3)]),
...        LineString([(2, 0), (0, 2)]),
...        Point(1, 1),
...        Point(0, 1),
...    ],
...    index=range(1, 6),
... )

>>> s1
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry

>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

>>> bbq.st_difference(s1, s2)
0                                               None
1    POLYGON ((0.99954 1, 2 2, 0 2, 0 1, 0.99954 1))
2                   LINESTRING (0 0, 1 1.00046, 2 2)
3                           GEOMETRYCOLLECTION EMPTY
4                                        POINT (0 1)
5                                               None
dtype: geometry

Additionally, we can compute the difference between each geometry in a GeoSeries and a single shapely geometry:

 >>> polygon = Polygon([(0, 0), (10, 0), (10, 10), (0, 0)])
>>> bbq.st_difference(s1, polygon)
0    POLYGON ((1.97082 2.00002, 0 2, 0 0, 1.97082 2...
1    POLYGON ((1.97082 2.00002, 0 2, 0 0, 1.97082 2...
2                             GEOMETRYCOLLECTION EMPTY
3                    LINESTRING (0.99265 1.00781, 0 2)
4                                          POINT (0 1)
dtype: geometry

Parameters

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

other

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries 
shapely.Geometry

The series or geometric object to subtract from the geography objects in series.

st_distance

st_distance(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
    other: typing.Union[
        bigframes.series.Series,
        bigframes.geopandas.geoseries.GeoSeries,
        shapely.geometry.base.BaseGeometry,
    ],
    *,
    use_spheroid: bool = False
) -> bigframes.series.Series

Returns the shortest distance in meters between two non-empty GEOGRAPHY objects.

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.geopandas
>>> from shapely.geometry import Polygon, LineString, Point

We can check two GeoSeries against each other, row by row.

 >>> s1 = bigframes.geopandas.GeoSeries(
...    [
...        Point(0, 0),
...        Point(0.00001, 0),
...        Point(0.00002, 0),
...    ],
... )
>>> s2 = bigframes.geopandas.GeoSeries(
...    [
...        Point(0.00001, 0),
...        Point(0.00003, 0),
...        Point(0.00005, 0),
...    ],
... )

>>> bbq.st_distance(s1, s2, use_spheroid=True)
0    1.113195
1     2.22639
2    3.339585
dtype: Float64

We can also calculate the distance between each geometry and a single shapely geometry:

 >>> bbq.st_distance(s2, Point(0.00001, 0))
0         0.0
1    2.223902
2    4.447804
dtype: Float64
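The sphere-based distances in the last example can be approximated with the haversine formula. This sketch assumes a perfect sphere of radius 6,371,010 m, a value chosen here as an assumption and not stated in this reference; `great_circle_distance_m` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6_371_010.0  # assumed sphere radius, not taken from this reference


def great_circle_distance_m(p1, p2, radius=EARTH_RADIUS_M):
    """Haversine distance in meters between two (longitude, latitude) points in degrees."""
    lon1, lat1 = map(math.radians, p1)
    lon2, lat2 = map(math.radians, p2)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2.0 * radius * math.asin(math.sqrt(h))


other = (0.00001, 0.0)
points = [(0.00001, 0.0), (0.00003, 0.0), (0.00005, 0.0)]
distances = [great_circle_distance_m(p, other) for p in points]
print(distances)
```

The spheroid-based results in the earlier example are slightly larger because the WGS84 equatorial radius exceeds the mean sphere radius assumed here.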

Parameters

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

other

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries 
shapely.Geometry

The series or geometric object to measure the distance to, in meters, from the geography objects in series.

use_spheroid

bool, optional, default False

Determines how this function measures distance. If use_spheroid is False, the function measures distance on the surface of a perfect sphere. If use_spheroid is True, the function measures distance on the surface of the WGS84 spheroid (https://cloud.google.com/bigquery/docs/geospatial-data). The default value of use_spheroid is False.

st_intersection

st_intersection(
    series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries],
    other: typing.Union[
        bigframes.series.Series,
        bigframes.geopandas.geoseries.GeoSeries,
        shapely.geometry.base.BaseGeometry,
    ],
) -> bigframes.series.Series

Returns a GEOGRAPHY that represents the point set intersection of the two input GEOGRAPHYs. Thus, every point in the intersection appears in both geography_1 (series) and geography_2 (other).

Note: BigQuery's Geography functions, like st_intersection, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.geopandas
>>> from shapely.geometry import Polygon, LineString, Point

We can check two GeoSeries against each other, row by row.

 >>> s1 = bigframes.geopandas.GeoSeries(
...    [
...        Polygon([(0, 0), (2, 2), (0, 2)]),
...        Polygon([(0, 0), (2, 2), (0, 2)]),
...        LineString([(0, 0), (2, 2)]),
...        LineString([(2, 0), (0, 2)]),
...        Point(0, 1),
...    ],
... )
>>> s2 = bigframes.geopandas.GeoSeries(
...    [
...        Polygon([(0, 0), (1, 1), (0, 1)]),
...        LineString([(1, 0), (1, 3)]),
...        LineString([(2, 0), (0, 2)]),
...        Point(1, 1),
...        Point(0, 1),
...    ],
...    index=range(1, 6),
... )

>>> s1
0    POLYGON ((0 0, 2 2, 0 2, 0 0))
1    POLYGON ((0 0, 2 2, 0 2, 0 0))
2             LINESTRING (0 0, 2 2)
3             LINESTRING (2 0, 0 2)
4                       POINT (0 1)
dtype: geometry

>>> s2
1    POLYGON ((0 0, 1 1, 0 1, 0 0))
2             LINESTRING (1 0, 1 3)
3             LINESTRING (2 0, 0 2)
4                       POINT (1 1)
5                       POINT (0 1)
dtype: geometry

>>> bbq.st_intersection(s1, s2)
0                                    None
1    POLYGON ((0 0, 0.99954 1, 0 1, 0 0))
2                       POINT (1 1.00046)
3                   LINESTRING (2 0, 0 2)
4                GEOMETRYCOLLECTION EMPTY
5                                    None
dtype: geometry

We can also compute the intersection of each geometry with a single shapely geometry:

 >>> bbq.st_intersection(s1, Polygon([(0, 0), (1, 1), (0, 1)]))
0    POLYGON ((0 0, 0.99954 1, 0 1, 0 0))
1    POLYGON ((0 0, 0.99954 1, 0 1, 0 0))
2             LINESTRING (0 0, 0.99954 1)
3                GEOMETRYCOLLECTION EMPTY
4                             POINT (0 1)
dtype: geometry

Parameters

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

other

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries 
shapely.Geometry

The series or geometric object to intersect with the geography objects in series.

st_isclosed

st_isclosed(series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries]) -> bigframes.series.Series

Returns TRUE for a non-empty Geography, where each element in the Geography has an empty boundary.

Note: BigQuery's Geography functions, like st_isclosed, interpret the geometry data type as a point set on the Earth's surface. A point set is a set of points, lines, and polygons on the WGS84 reference spheroid, with geodesic edges. See: https://cloud.google.com/bigquery/docs/geospatial-data

Examples:

 >>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> from shapely.geometry import Point, LineString, Polygon

>>> series = bigframes.geopandas.GeoSeries(
...     [
...         Point(0, 0),  # Point
...         LineString([(0, 0), (1, 1)]),  # Open LineString
...         LineString([(0, 0), (1, 1), (0, 1), (0, 0)]),  # Closed LineString
...         Polygon([(0, 0), (1, 1), (0, 1), (0, 0)]),
...         None,
...     ]
... )
>>> series
0                        POINT (0 0)
1              LINESTRING (0 0, 1 1)
2    LINESTRING (0 0, 1 1, 0 1, 0 0)
3     POLYGON ((0 0, 1 1, 0 1, 0 0))
4                               None
dtype: geometry

>>> bbq.st_isclosed(series)
0     True
1    False
2     True
3     False
4     <NA>
dtype: boolean
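
The rule behind these results can be sketched in plain Python, without a BigQuery session. The `is_closed` helper and the `(kind, coords)` pairs below are hypothetical stand-ins for geography values and are not part of bigframes; the sketch only mirrors the outcomes shown above for points, linestrings, and ordinary polygons.

```python
from typing import Sequence, Tuple

Coord = Tuple[float, float]


def is_closed(kind: str, coords: Sequence[Coord]) -> bool:
    """Hypothetical illustration of the st_isclosed rule for simple geometries."""
    if not coords:
        return False  # an empty geography is not closed
    if kind == "point":
        return True  # a point has an empty boundary
    if kind == "linestring":
        # A linestring has an empty boundary only when it ends where it starts.
        return coords[0] == coords[-1]
    if kind == "polygon":
        return False  # an ordinary polygon has a non-empty boundary (its ring)
    raise ValueError(f"unsupported geometry kind: {kind}")


results = [
    is_closed("point", [(0, 0)]),
    is_closed("linestring", [(0, 0), (1, 1)]),
    is_closed("linestring", [(0, 0), (1, 1), (0, 1), (0, 0)]),
    is_closed("polygon", [(0, 0), (1, 1), (0, 1), (0, 0)]),
]
print(results)
```

The sketch omits missing values, which st_isclosed reports as <NA> in the output above.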

Parameter

Name

Description

series

 bigframes.pandas.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

st_length

st_length(series: typing.Union[bigframes.series.Series, bigframes.geopandas.geoseries.GeoSeries], *, use_spheroid: bool = False) -> bigframes.series.Series

Returns the total length in meters of the lines in the input GEOGRAPHY.

If a series element is a point or a polygon, returns zero for that row. If a series element is a collection, returns the length of the lines in the collection; if the collection doesn't contain lines, returns zero.

The optional use_spheroid parameter determines how this function measures distance. If use_spheroid is FALSE, the function measures distance on the surface of a perfect sphere.

The use_spheroid parameter currently only supports the value FALSE. The default value of use_spheroid is FALSE. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/geography_functions#st_length

Examples:

 >>> import bigframes.geopandas
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> from shapely.geometry import Polygon, LineString, Point, GeometryCollection

>>> series = bigframes.geopandas.GeoSeries(
...         [
...             LineString([(0, 0), (1, 0)]),  # Length will be approx 1 degree in meters
...             Polygon([(0.0, 0.0), (0.1, 0.1), (0.0, 0.1)]), # Length is 0
...             Point(0, 1),  # Length is 0
...             GeometryCollection([LineString([(0,0),(0,1)]), Point(1,1)]) # Length of LineString only
...         ]
... )

>>> result = bbq.st_length(series)
>>> result
0    111195.101177
1              0.0
2              0.0
3    111195.101177
dtype: Float64
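
As a rough cross-check of the numbers above, the length of a one-degree line on a perfect sphere can be computed locally with the haversine formula. This is a minimal sketch, not the BigQuery implementation: the helper names are hypothetical, and the sphere radius of 6,371,010 m is an assumption chosen because it reproduces the value in the example output.

```python
import math

# Assumed sphere radius in meters; this value reproduces the output above.
EARTH_RADIUS_M = 6_371_010.0


def great_circle_m(p, q):
    """Haversine distance in meters between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def line_length_m(coords):
    """Sum the great-circle lengths of consecutive segments of a linestring."""
    return sum(great_circle_m(p, q) for p, q in zip(coords, coords[1:]))


one_degree_east = line_length_m([(0, 0), (1, 0)])   # along the equator
one_degree_north = line_length_m([(0, 0), (0, 1)])  # along a meridian
print(round(one_degree_east, 3), round(one_degree_north, 3))
```

On a sphere, one degree along the equator and one degree along a meridian have the same length, which is why rows 0 and 3 of the example agree.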

Parameters

Name

Description

series

 bigframes.series.Series 
 bigframes.geopandas.GeoSeries

A series containing geography objects.

use_spheroid

bool, optional

Determines how this function measures distance. If FALSE (default), measures distance on a perfect sphere. Currently, only FALSE is supported.

st_simplify

st_simplify(geography: bigframes.series.Series, tolerance_meters: float) -> bigframes.series.Series

Returns a simplified version of the input geography.

Parameters

Name

Description

geography

 bigframes.series.Series

A Series containing GEOGRAPHY data.

tolerance_meters

float

A float64 value indicating the tolerance in meters.

struct

struct(value: dataframe.DataFrame) -> series.Series

Takes a DataFrame and converts it into a Series of structs. Each struct entry corresponds to a DataFrame row, and each struct field corresponds to a DataFrame column.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.series as series

>>> srs = series.Series([{"version": 1, "project": "pandas"}, {"version": 2, "project": "numpy"},])
>>> df = srs.struct.explode()
>>> bbq.struct(df)
0    {'project': 'pandas', 'version': 1}
1     {'project': 'numpy', 'version': 2}
dtype: struct<project: string, version: int64>[pyarrow]
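
Conceptually, the conversion packs each row into one record keyed by column name. The plain-Python sketch below illustrates that row-wise packing only; the `columns` dict is a hypothetical stand-in for the DataFrame, and nothing here reflects how bigframes performs the conversion.

```python
# Hypothetical column-oriented data standing in for the DataFrame above.
columns = {"project": ["pandas", "numpy"], "version": [1, 2]}

# One dict per row; each key is a column name, mirroring one struct per row.
structs = [dict(zip(columns, values)) for values in zip(*columns.values())]
print(structs)
```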

Parameter

Name

Description

value

 bigframes.dataframe.DataFrame

The DataFrame to be converted to a Series of structs.

Returns

Type

Description

 bigframes.series.Series

A new Series with struct entries representing rows of the original DataFrame.

to_json

to_json(input: bigframes.series.Series) -> bigframes.series.Series

Converts a series to a JSON value.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series([1, 2, 3])
>>> bbq.to_json(s)
0    1
1    2
2    3
dtype: extension<dbjson<JSONArrowType>>[pyarrow]

>>> s = bpd.Series([{"int": 1, "str": "pandas"}, {"int": 2, "str": "numpy"}])
>>> bbq.to_json(s)
0    {"int":1,"str":"pandas"}
1     {"int":2,"str":"numpy"}
dtype: extension<dbjson<JSONArrowType>>[pyarrow]

Parameter

Name

Description

input

 bigframes.series.Series

The Series to be converted.

to_json_string

to_json_string(input: bigframes.series.Series) -> bigframes.series.Series

Converts a series to a JSON-formatted STRING value.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series([1, 2, 3])
>>> bbq.to_json_string(s)
0    1
1    2
2    3
dtype: string

>>> s = bpd.Series([{"int": 1, "str": "pandas"}, {"int": 2, "str": "numpy"}])
>>> bbq.to_json_string(s)
0    {"int":1,"str":"pandas"}
1     {"int":2,"str":"numpy"}
dtype: string
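
For comparison, the same compact JSON text can be produced locally with the standard library json module. This is only an analogy for the output format shown above, under the assumption that compact separators match it; it is not how bigframes evaluates the function.

```python
import json

rows = [{"int": 1, "str": "pandas"}, {"int": 2, "str": "numpy"}]

# Compact separators drop the spaces that json.dumps inserts by default.
json_strings = [json.dumps(row, separators=(",", ":")) for row in rows]
for text in json_strings:
    print(text)
```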

Parameter

Name

Description

input

 bigframes.series.Series

The Series to be converted.

unix_micros

unix_micros(input: bigframes.series.Series) -> bigframes.series.Series

Converts a timestamp series to Unix epoch microseconds.

Examples:

 >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_micros(s)
0     86400000000
1    172800000000
dtype: Int64

Parameter

Name

Description

input

 bigframes.pandas.Series

A timestamp series.

unix_millis

unix_millis(input: bigframes.series.Series) -> bigframes.series.Series

Converts a timestamp series to Unix epoch milliseconds.

Examples:

 >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_millis(s)
0     86400000
1    172800000
dtype: Int64

Parameter

Name

Description

input

 bigframes.pandas.Series

A timestamp series.

unix_seconds

unix_seconds(input: bigframes.series.Series) -> bigframes.series.Series

Converts a timestamp series to Unix epoch seconds.

Examples:

 >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_seconds(s)
0     86400
1    172800
dtype: Int64

Parameter

Name

Description

input

 bigframes.pandas.Series

A timestamp series.
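
The three unix_* functions differ only in the unit counted since 1970-01-01 00:00:00 UTC. The standard-library sketch below reproduces the first row of each example for 1970-01-02 UTC; `unix_units` is a hypothetical helper used for illustration and is not part of bigframes.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_units(ts: datetime, unit: timedelta) -> int:
    """Whole number of `unit` intervals between the Unix epoch and `ts`."""
    return (ts - EPOCH) // unit


ts = datetime(1970, 1, 2, tzinfo=timezone.utc)
micros = unix_units(ts, timedelta(microseconds=1))
millis = unix_units(ts, timedelta(milliseconds=1))
seconds = unix_units(ts, timedelta(seconds=1))
print(micros, millis, seconds)  # 86400000000 86400000 86400
```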

vector_search

vector_search(
    base_table: str,
    column_to_search: str,
    query: Union[dataframe.DataFrame, series.Series],
    *,
    query_column_to_search: Optional[str] = None,
    top_k: Optional[int] = None,
    distance_type: Optional[Literal["euclidean", "cosine", "dot_product"]] = None,
    fraction_lists_to_search: Optional[float] = None,
    use_brute_force: Optional[bool] = None,
    allow_large_results: Optional[bool] = None
) -> dataframe.DataFrame

Conducts a vector search, which searches embeddings to find semantically similar entities.

This method calls the VECTOR_SEARCH() SQL function. See: https://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions#vector_search

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq

DataFrame embeddings for which to find nearest neighbors. The ARRAY<FLOAT64> column is used as the search query:

 >>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2).sort_values("id")
  query_id  embedding  id my_embedding  distance
0      dog    [1. 2.]   1      [1. 2.]       0.0
1      cat  [3.  5.2]   2      [2. 4.]   1.56205
0      dog    [1. 2.]   4    [1.  3.2]       1.2
1      cat  [3.  5.2]   5    [5.  5.4]  2.009975
<BLANKLINE>
[4 rows x 5 columns]

Series embeddings for which to find nearest neighbors:

 >>> search_query = bpd.Series([[1.0, 2.0], [3.0, 5.2]],
...                            index=["dog", "cat"],
...                            name="embedding")
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2,
...             use_brute_force=True).sort_values("id")
     embedding  id my_embedding  distance
dog    [1. 2.]   1      [1. 2.]       0.0
cat  [3.  5.2]   2      [2. 4.]   1.56205
dog    [1. 2.]   4    [1.  3.2]       1.2
cat  [3.  5.2]   5    [5.  5.4]  2.009975
<BLANKLINE>
[4 rows x 4 columns]

You can also specify the distance type and the name of the query DataFrame column that contains the embeddings. If you specify query_column_to_search, that column provides the embeddings for which to find nearest neighbors. Otherwise, the column_to_search value is used.

 >>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]],
...                               "another_embedding": [[0.7, 2.2], [3.3, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             distance_type="cosine",
...             query_column_to_search="another_embedding",
...             top_k=2).sort_values("id")
  query_id  embedding another_embedding  id my_embedding  distance
1      cat  [3.  5.2]         [3.3 5.2]   1      [1. 2.]  0.005181
1      cat  [3.  5.2]         [3.3 5.2]   2      [2. 4.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   3    [1.5 7. ]  0.004697
0      dog    [1. 2.]         [0.7 2.2]   4    [1.  3.2]  0.000013
<BLANKLINE>
[4 rows x 6 columns]
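
To make the roles of top_k and distance_type concrete, here is a brute-force nearest-neighbor sketch in plain Python. It is not how VECTOR_SEARCH is implemented. The `BASE_TABLE` rows are an assumption inferred from the id and my_embedding values in the outputs above, and the helper names are hypothetical.

```python
import math

# Assumed base-table rows (id -> my_embedding), inferred from the outputs above.
BASE_TABLE = {
    1: [1.0, 2.0],
    2: [2.0, 4.0],
    3: [1.5, 7.0],
    4: [1.0, 3.2],
    5: [5.0, 5.4],
}


def euclidean(u, v):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine distance: one minus the cosine similarity of the two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def brute_force_search(query, top_k=2, distance=euclidean):
    """Return the top_k (id, distance) pairs closest to the query vector."""
    scored = [(row_id, distance(query, vec)) for row_id, vec in BASE_TABLE.items()]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]


dog = brute_force_search([1.0, 2.0])
cat = brute_force_search([3.0, 5.2])
dog_cosine = brute_force_search([0.7, 2.2], distance=cosine)
print(dog, cat, dog_cosine)
```

Under these assumptions, the Euclidean search returns ids 1 and 4 for "dog" and ids 2 and 5 for "cat", and the cosine search on [0.7, 2.2] returns ids 4 and 3, matching the rows in the examples above.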

Parameters

Name

Description

base_table

str

The table to search for nearest neighbor embeddings.

column_to_search

str

The name of the base table column to search for nearest neighbor embeddings. The column must have a type of ARRAY<FLOAT64>. All elements in the array must be non-NULL.

query

 bigframes.dataframe.DataFrame 
bigframes.series.Series

A Series or DataFrame that provides the embeddings for which to find nearest neighbors.

query_column_to_search

str

Specifies the name of the column in the query that contains the embeddings for which to find nearest neighbors. The column must have a type of ARRAY<FLOAT64>. All elements in the array must be non-NULL, and all values in the column must have the same array dimensions as the values in the column_to_search column. Can only be set when query is a DataFrame.

top_k

int

Specifies the number of nearest neighbors to return. Defaults to 10.

distance_type

str, default "euclidean"

Specifies the type of metric to use to compute the distance between two vectors. Possible values are "euclidean", "cosine", and "dot_product". Defaults to "euclidean".

fraction_lists_to_search

float, range in [0.0, 1.0]

Specifies the percentage of lists to search. Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage. It is only used when a vector index is also used. You can only specify fraction_lists_to_search when use_brute_force is set to False.

use_brute_force

bool

Determines whether to use brute force search by skipping the vector index if one is available. Defaults to False.

allow_large_results

bool, optional

Whether to allow large query results. If True, the query results can be larger than the maximum response size. Defaults to bpd.options.compute.allow_large_results.
