Package bigquery (1.34.0)

API documentation for bigquery package.

Packages Functions

approx_top_count

  approx_top_count(series: bigframes.series.Series, number: int) -> bigframes.series.Series

Returns the approximate top elements of expression as an array of STRUCTs. The number parameter specifies the number of elements returned.

Each STRUCT contains two fields. The first field (named value) contains an input value. The second field (named count) contains an INT64 specifying the number of times the value was returned.

Returns NULL if there are zero input rows.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None
>>> s = bpd.Series(["apple", "apple", "pear", "pear", "pear", "banana"])
>>> bbq.approx_top_count(s, number=2)
[{'value': 'pear', 'count': 3}, {'value': 'apple', 'count': 2}] 
Parameters
Name
Description
series

The Series with any data type that the GROUP BY clause supports.

number

An integer specifying the number of top elements to return.
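
To make the result shape concrete, here is a minimal pure-Python sketch that builds the same kind of value/count records with collections.Counter. It is an illustration only: it counts exactly and in memory, whereas the function above is approximate, and the helper name top_count_exact is hypothetical rather than part of bigframes.

```python
from collections import Counter


def top_count_exact(values, number):
    """Return the `number` most frequent values as value/count records.

    Hypothetical helper: an exact, in-memory stand-in used only to show
    the shape of the result.
    """
    if not values:
        return None  # mirrors "Returns NULL if there are zero input rows"
    counts = Counter(values)
    return [{"value": v, "count": c} for v, c in counts.most_common(number)]


result = top_count_exact(["apple", "apple", "pear", "pear", "pear", "banana"], 2)
print(result)
```
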

array_agg

  array_agg(obj: groupby.SeriesGroupBy | groupby.DataFrameGroupBy) -> series.Series | dataframe.DataFrame

Group data and create arrays from selected columns, omitting NULLs to avoid BigQuery errors (NULLs not allowed in arrays).

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None 

For a SeriesGroupBy object:

 >>> lst = ['a', 'a', 'b', 'b', 'a']
>>> s = bpd.Series([1, 2, 3, 4, np.nan], index=lst)
>>> bbq.array_agg(s.groupby(level=0))
a    [1. 2.]
b    [3. 4.]
dtype: list<item: double>[pyarrow] 

For a DataFrameGroupBy object:

 >>> l = [[1, 2, 3], [1, None, 4], [2, 1, 3], [1, 2, 2]]
>>> df = bpd.DataFrame(l, columns=["a", "b", "c"])
>>> bbq.array_agg(df.groupby(by=["b"]))
         a      c
b
1.0    [2]    [3]
2.0  [1 1]  [3 2]
<BLANKLINE>
[2 rows x 2 columns] 
Parameter
Name
Description
obj

The GroupBy object to which the function is applied.
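
The grouping behaviour can be pictured with a small pure-Python sketch, assuming None and NaN both stand in for NULL. The helper group_to_arrays is hypothetical and runs locally; it is not how bigframes evaluates the aggregation.

```python
import math
from collections import defaultdict


def group_to_arrays(keys, values):
    """Collect values per key, skipping None and NaN (treated as NULL here).

    A group whose values are all null is left out of this sketch's result.
    """
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        is_null = value is None or (isinstance(value, float) and math.isnan(value))
        if not is_null:
            groups[key].append(value)
    return dict(groups)


arrays = group_to_arrays(
    ["a", "a", "b", "b", "a"],
    [1.0, 2.0, 3.0, 4.0, float("nan")],
)
print(arrays)
```
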

array_length

  array_length(series: bigframes.series.Series) -> bigframes.series.Series

Compute the length of each array element in the Series.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([[1, 2, 8, 3], [], [3, 4]])
>>> bbq.array_length(s)
0    4
1    0
2    2
dtype: Int64 

You can also apply this function directly to Series.

 >>> s.apply(bbq.array_length, by_row=False)
0    4
1    0
2    2
dtype: Int64 
Parameter
Name
Description
series

A Series whose values are arrays.

array_to_string

  array_to_string(series: bigframes.series.Series, delimiter: str) -> bigframes.series.Series

Converts array elements within a Series into delimited strings.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([["H", "i", "!"], ["Hello", "World"], np.nan, [], ["Hi"]])
>>> bbq.array_to_string(s, delimiter=", ")
0         H, i, !
1    Hello, World
2
3
4              Hi
dtype: string
Parameters
Name
Description
series

A Series containing arrays.

delimiter

The string used to separate array elements.

create_vector_index

  create_vector_index(
      table_id: str,
      column_name: str,
      *,
      replace: bool = False,
      index_name: typing.Optional[str] = None,
      distance_type="cosine",
      stored_column_names: typing.Collection[str] = (),
      index_type: str = "ivf",
      ivf_options: typing.Optional[typing.Mapping] = None,
      tree_ah_options: typing.Optional[typing.Mapping] = None,
      session: typing.Optional[bigframes.session.Session] = None,
  ) -> None

Creates a new vector index on a column of a table.

This method calls the CREATE VECTOR INDEX DDL statement (https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_vector_index_statement).

json_extract

  json_extract(input: bigframes.series.Series, json_path: str) -> bigframes.series.Series

Extracts a JSON value and converts it to a SQL JSON-formatted STRING or JSON value. This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> bbq.json_extract(s, json_path="$.class")
0    {"students":[{"id":5},{"id":12}]}
dtype: string 
Parameters
Name
Description
input

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

The JSON path identifying the data that you want to obtain from the input.
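
As a rough local analogue of the path lookup, the sketch below resolves a simple dotted path with Python's json module and returns compact JSON text, matching the formatting seen in the example output. The helper extract_json is hypothetical and supports only "$" and "$.key.key" paths; bracket syntax and escaping are out of scope.

```python
import json


def extract_json(text, json_path):
    """Resolve a dotted path such as "$.class" and return compact JSON text.

    Hypothetical helper: returns None when a key along the path is missing.
    """
    node = json.loads(text)
    keys = [k for k in json_path.split(".")[1:] if k]
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return json.dumps(node, separators=(",", ":"))


doc = '{"class": {"students": [{"id": 5}, {"id": 12}]}}'
extracted = extract_json(doc, "$.class")
print(extracted)
```
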

json_extract_array

  json_extract_array(input: bigframes.series.Series, json_path: str = "$") -> bigframes.series.Series

Extracts a JSON array and converts it to a SQL array of JSON-formatted STRING or JSON values. This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": [{"name": "apple"}, {"name": "cherry"}]}',
...   '{"fruits": [{"name": "guava"}, {"name": "grapes"}]}'
... ])
>>> bbq.json_extract_array(s, "$.fruits")
0    ['{"name":"apple"}' '{"name":"cherry"}']
1    ['{"name":"guava"}' '{"name":"grapes"}']
dtype: list<item: string>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": {"color": "red",   "names": ["apple","cherry"]}}',
...   '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_array(s, "$.fruits.names")
0    ['"apple"' '"cherry"']
1    ['"guava"' '"grapes"']
dtype: list<item: string>[pyarrow] 
Parameters
Name
Description
input

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

The JSON path identifying the data that you want to obtain from the input.

json_extract_string_array

  json_extract_string_array(
      input: bigframes.series.Series,
      json_path: str = "$",
      value_dtype: typing.Optional[
          typing.Union[
              pandas.core.arrays.boolean.BooleanDtype,
              pandas.core.arrays.floating.Float64Dtype,
              pandas.core.arrays.integer.Int64Dtype,
              pandas.core.arrays.string_.StringDtype,
              pandas.core.dtypes.dtypes.ArrowDtype,
              geopandas.array.GeometryDtype,
              typing.Literal[
                  "boolean",
                  "Float64",
                  "Int64",
                  "int64[pyarrow]",
                  "string",
                  "string[pyarrow]",
                  "timestamp[us, tz=UTC][pyarrow]",
                  "timestamp[us][pyarrow]",
                  "date32[day][pyarrow]",
                  "time64[us][pyarrow]",
                  "decimal128(38, 9)[pyarrow]",
                  "decimal256(76, 38)[pyarrow]",
                  "binary[pyarrow]",
              ],
          ]
      ] = None,
  ) -> bigframes.series.Series

Extracts a JSON array and converts it to a SQL array of STRING values. A value_dtype can be provided to further coerce the data type of the values in the array. This function uses single quotes and brackets to escape invalid JSONPath characters in JSON keys.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series(['[1, 2, 3]', '[4, 5]'])
>>> bbq.json_extract_string_array(s)
0    ['1' '2' '3']
1        ['4' '5']
dtype: list<item: string>[pyarrow]

>>> bbq.json_extract_string_array(s, value_dtype='Int64')
0    [1 2 3]
1      [4 5]
dtype: list<item: int64>[pyarrow]

>>> s = bpd.Series([
...   '{"fruits": {"color": "red",   "names": ["apple","cherry"]}}',
...   '{"fruits": {"color": "green", "names": ["guava", "grapes"]}}'
... ])
>>> bbq.json_extract_string_array(s, "$.fruits.names")
0    ['apple' 'cherry']
1    ['guava' 'grapes']
dtype: list<item: string>[pyarrow] 
Parameters
Name
Description
input

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path

The JSON path identifying the data that you want to obtain from the input.

value_dtype

The data type to which the array values are coerced. Must be a data type supported by BigFrames DataFrame.
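
The difference between json_extract_array and json_extract_string_array shows up in how string elements are rendered. The following pure-Python sketch mimics both renderings, plus an integer coercion, with the standard json module. It only imitates the outputs shown in the examples and is not the library's implementation.

```python
import json

doc = '{"fruits": {"color": "red", "names": ["apple", "cherry"]}}'
names = json.loads(doc)["fruits"]["names"]

# json_extract_array-style: each element stays JSON-formatted (strings keep quotes).
as_json = [json.dumps(item, separators=(",", ":")) for item in names]

# json_extract_string_array-style: string elements are unquoted.
as_strings = [item if isinstance(item, str) else json.dumps(item) for item in names]

# Coercing string elements afterwards, similar in spirit to value_dtype="Int64".
as_ints = [int(item) for item in ["1", "2", "3"]]

print(as_json, as_strings, as_ints)
```
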

json_set

  json_set(input: bigframes.series.Series, json_path_value_pairs: typing.Sequence[typing.Tuple[str, typing.Any]]) -> bigframes.series.Series

Produces a new JSON value within a Series by inserting or replacing values at specified paths.

Examples:
 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import numpy as np
>>> bpd.options.display.progress_bar = None

>>> s = bpd.read_gbq("SELECT JSON '{\"a\": 1}' AS data")["data"]
>>> bbq.json_set(s, json_path_value_pairs=[("$.a", 100), ("$.b", "hi")])
0    {"a":100,"b":"hi"}
Name: data, dtype: dbjson
Parameters
Name
Description
input

The Series containing JSON data (as native JSON objects or JSON-formatted strings).

json_path_value_pairs

Pairs of JSON path and the new value to insert/replace.
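
The insert-or-replace behaviour of the path/value pairs can be sketched locally with a dictionary update. The helper set_json_values is hypothetical and handles only top-level "$.key" paths; nested paths and arrays are not covered.

```python
import json


def set_json_values(text, path_value_pairs):
    """Apply ("$.key", value) pairs in order and return compact JSON text.

    Hypothetical helper: handles only top-level "$.key" paths.
    """
    data = json.loads(text)
    for path, value in path_value_pairs:
        if not path.startswith("$."):
            raise ValueError(f"unsupported path in this sketch: {path!r}")
        data[path[2:]] = value  # replaces an existing key or inserts a new one
    return json.dumps(data, separators=(",", ":"))


updated = set_json_values('{"a": 1}', [("$.a", 100), ("$.b", "hi")])
print(updated)
```
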

parse_json

  parse_json(input: bigframes.series.Series) -> bigframes.series.Series

Converts a series with a JSON-formatted STRING value to a JSON value.

Examples:
 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series(['{"class": {"students": [{"id": 5}, {"id": 12}]}}'])
>>> s
0    {"class": {"students": [{"id": 5}, {"id": 12}]}}
dtype: string
>>> bbq.parse_json(s)
0    {"class":{"students":[{"id":5},{"id":12}]}}
dtype: dbjson 
Parameter
Name
Description
input

The Series containing JSON-formatted strings.
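
A local analogy, using only Python's json module: parsing turns the text into a structured value, and re-serializing it compactly yields the whitespace-free form shown in the example output. This is a sketch of the idea, not of how BigQuery stores JSON values.

```python
import json

raw = '{"class": {"students": [{"id": 5}, {"id": 12}]}}'

# After parsing, the value is structured data rather than text.
parsed = json.loads(raw)

# Compact serialization drops the insignificant whitespace.
canonical = json.dumps(parsed, separators=(",", ":"))
print(canonical)
```
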

sql_scalar

  sql_scalar(sql_template: str, columns: typing.Sequence[bigframes.series.Series]) -> bigframes.series.Series

Create a Series from a SQL template.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import pandas as pd
>>> import pyarrow as pa
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series(["1.5", "2.5", "3.5"])
>>> s = s.astype(pd.ArrowDtype(pa.decimal128(38, 9)))
>>> bbq.sql_scalar("ROUND({0}, 0, 'ROUND_HALF_EVEN')", [s])
0    2.000000000
1    2.000000000
2    4.000000000
dtype: decimal128(38, 9)[pyarrow] 
Parameters
Name
Description
sql_template

A SQL format string with Python-style {0} placeholders for each of the Series objects in columns.

columns

Series objects representing the column inputs to the sql_template. Must contain at least one Series.
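
The sketch below shows the two ideas behind the example, under stated assumptions: that the placeholders behave like positional fields in Python's str.format, as the parameter description suggests, and that ROUND_HALF_EVEN is banker's rounding. The column name price is made up for illustration, and the rounding is done locally with the decimal module rather than in BigQuery.

```python
from decimal import Decimal, ROUND_HALF_EVEN

template = "ROUND({0}, 0, 'ROUND_HALF_EVEN')"

# Positional placeholders are filled in order; `price` is a made-up column name.
sql = template.format("`price`")
print(sql)

# Banker's rounding applied locally to the same inputs as the example.
rounded = [
    Decimal(text).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
    for text in ["1.5", "2.5", "3.5"]
]
print(rounded)
```

Ties go to the nearest even integer, which is why both 1.5 and 2.5 round to 2.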

struct

  struct(value: dataframe.DataFrame) -> series.Series

Takes a DataFrame and converts it into a Series of structs, with each struct entry corresponding to a DataFrame row and each struct field corresponding to a DataFrame column.

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> import bigframes.series as series
>>> bpd.options.display.progress_bar = None

>>> srs = series.Series([{"version": 1, "project": "pandas"}, {"version": 2, "project": "numpy"},])
>>> df = srs.struct.explode()
>>> bbq.struct(df)
0    {'project': 'pandas', 'version': 1}
1     {'project': 'numpy', 'version': 2}
dtype: struct<project: string, version: int64>[pyarrow]

Parameter
Name
Description
value

The DataFrame to be converted to a Series of structs.

Returns

bigframes.series.Series: A new Series with struct entries representing rows of the original DataFrame.
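
The row-to-struct mapping can be pictured with plain Python containers: each row becomes one dictionary whose keys are the column names. This is a local sketch with made-up in-memory data, not the bigframes implementation.

```python
columns = {
    "project": ["pandas", "numpy"],
    "version": [1, 2],
}

# One dict per row; each key is a column name, mirroring struct fields.
structs = [dict(zip(columns, row)) for row in zip(*columns.values())]
print(structs)
```
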

unix_micros

  unix_micros(input: bigframes.series.Series) -> bigframes.series.Series

Converts a timestamp series to Unix epoch microseconds.

Examples:

 >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_micros(s)
0     86400000000
1    172800000000
dtype: Int64 
Parameter
Name
Description
input

A timestamp series.

unix_millis

  unix_millis(input: bigframes.series.Series) -> bigframes.series.Series

Converts a timestamp series to Unix epoch milliseconds.

Examples:

 >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_millis(s)
0     86400000
1    172800000
dtype: Int64 
Parameter
Name
Description
input

A timestamp series.

unix_seconds

  unix_seconds(input: bigframes.series.Series) -> bigframes.series.Series

Converts a timestamp series to Unix epoch seconds.

Examples:

 >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None

>>> s = bpd.Series([pd.Timestamp("1970-01-02", tz="UTC"), pd.Timestamp("1970-01-03", tz="UTC")])
>>> bbq.unix_seconds(s)
0     86400
1    172800
dtype: Int64 
Parameter
Name
Description
input

A timestamp series.
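
The three Unix conversions differ only in the unit. The sketch below reproduces the arithmetic for a single timezone-aware datetime using the standard library. The helper names are hypothetical, and the choice of floor division for dropping finer precision is an assumption of this sketch.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_micros(ts):
    """Whole microseconds between the Unix epoch and a timezone-aware datetime."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def to_unix_millis(ts):
    return to_unix_micros(ts) // 1_000  # floor division drops sub-millisecond precision


def to_unix_seconds(ts):
    return to_unix_micros(ts) // 1_000_000  # floor division drops sub-second precision


ts = datetime(1970, 1, 2, tzinfo=timezone.utc)
print(to_unix_micros(ts), to_unix_millis(ts), to_unix_seconds(ts))
```
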

vector_search

  vector_search(
      base_table: str,
      column_to_search: str,
      query: Union[dataframe.DataFrame, series.Series],
      *,
      query_column_to_search: Optional[str] = None,
      top_k: Optional[int] = None,
      distance_type: Optional[Literal["euclidean", "cosine", "dot_product"]] = None,
      fraction_lists_to_search: Optional[float] = None,
      use_brute_force: Optional[bool] = None,
  ) -> dataframe.DataFrame

Conduct vector search which searches embeddings to find semantically similar entities.

This method calls the VECTOR_SEARCH() SQL function (https://cloud.google.com/bigquery/docs/reference/standard-sql/search_functions#vector_search).

Examples:

 >>> import bigframes.pandas as bpd
>>> import bigframes.bigquery as bbq
>>> bpd.options.display.progress_bar = None 

DataFrame embeddings for which to find nearest neighbors. The ARRAY<FLOAT64> column is used as the search query:

 >>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2).sort_values("id")
  query_id  embedding  id my_embedding  distance
0      dog    [1. 2.]   1      [1. 2.]       0.0
1      cat  [3.  5.2]   2      [2. 4.]   1.56205
0      dog    [1. 2.]   4    [1.  3.2]       1.2
1      cat  [3.  5.2]   5    [5.  5.4]  2.009975
<BLANKLINE>
[4 rows x 5 columns] 

Series embeddings for which to find nearest neighbors:

 >>> search_query = bpd.Series([[1.0, 2.0], [3.0, 5.2]],
...                            index=["dog", "cat"],
...                            name="embedding")
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             top_k=2,
...             use_brute_force=True).sort_values("id")
     embedding  id my_embedding  distance
dog    [1. 2.]   1      [1. 2.]       0.0
cat  [3.  5.2]   2      [2. 4.]   1.56205
dog    [1. 2.]   4    [1.  3.2]       1.2
cat  [3.  5.2]   5    [5.  5.4]  2.009975
<BLANKLINE>
[4 rows x 4 columns] 

You can specify the name of the embedding column in the query DataFrame and the distance type. If you specify query_column_to_search, that column supplies the embeddings for which to find nearest neighbors. Otherwise, the column_to_search value is used.

 >>> search_query = bpd.DataFrame({"query_id": ["dog", "cat"],
...                               "embedding": [[1.0, 2.0], [3.0, 5.2]],
...                               "another_embedding": [[0.7, 2.2], [3.3, 5.2]]})
>>> bbq.vector_search(
...             base_table="bigframes-dev.bigframes_tests_sys.base_table",
...             column_to_search="my_embedding",
...             query=search_query,
...             distance_type="cosine",
...             query_column_to_search="another_embedding",
...             top_k=2)
  query_id  embedding another_embedding  id my_embedding  distance
1      cat  [3.  5.2]         [3.3 5.2]   2      [2. 4.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   4    [1.  3.2]  0.000013
1      cat  [3.  5.2]         [3.3 5.2]   1      [1. 2.]  0.005181
0      dog    [1. 2.]         [0.7 2.2]   3    [1.5 7. ]  0.004697
<BLANKLINE>
[4 rows x 6 columns] 
Parameters
Name
Description
base_table

The table to search for nearest neighbor embeddings.

column_to_search

The name of the base table column to search for nearest neighbor embeddings. The column must have a type of ARRAY<FLOAT64>. All elements in the array must be non-NULL.

query

A Series or DataFrame that provides the embeddings for which to find nearest neighbors.

query_column_to_search

Specifies the name of the column in the query that contains the embeddings for which to find nearest neighbors. The column must have a type of ARRAY<FLOAT64>. All elements in the array must be non-NULL, and all values in the column must have the same array dimensions as the values in the column_to_search column. Can only be set when query is a DataFrame.

top_k

Specifies the number of nearest neighbors to return. Defaults to 10.

distance_type

Specifies the type of metric to use to compute the distance between two vectors. Possible values are "euclidean", "cosine" and "dot_product". Defaults to "euclidean".

fraction_lists_to_search

Specifies the percentage of lists to search. Specifying a higher percentage leads to higher recall and slower performance, and the converse is true when specifying a lower percentage. It is only used when a vector index is also used. You can only specify fraction_lists_to_search when use_brute_force is set to False.

use_brute_force

Determines whether to use brute force search by skipping the vector index if one is available. Defaults to False.
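
To show what the distance column represents, here is a brute-force nearest-neighbor sketch in pure Python. The base rows are reconstructed from the example outputs above (id and my_embedding), and the helper names are hypothetical. It runs locally and does not call BigQuery or use a vector index.

```python
import math

# Base rows reconstructed from the example outputs: id -> my_embedding.
base = {1: [1.0, 2.0], 2: [2.0, 4.0], 3: [1.5, 7.0], 4: [1.0, 3.2], 5: [5.0, 5.4]}


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def top_k(query, k, distance=euclidean):
    """Return the k (id, distance) pairs closest to `query`, nearest first."""
    scored = [(row_id, distance(query, vec)) for row_id, vec in base.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


print(top_k([1.0, 2.0], 2))  # query "dog"
print(top_k([3.0, 5.2], 2))  # query "cat"
```

With Euclidean distance and k=2, this selects the same ids as the first two examples; passing distance=cosine_distance switches the metric.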
