Session(
    context: typing.Optional[bigframes._config.bigquery_options.BigQueryOptions] = None,
    clients_provider: typing.Optional[bigframes.session.clients.ClientsProvider] = None,
)
Establishes a BigQuery connection to capture a group of job activities related to DataFrames.
Parameters
context
bigframes._config.bigquery_options.BigQueryOptions
Configuration adjusting how to connect to BigQuery and related APIs. Note that some options are ignored if clients_provider is set.
clients_provider
bigframes.session.clients.ClientsProvider
An object providing client library objects.
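For illustration, a minimal sketch of constructing a Session with an explicit context. It assumes Session and BigQueryOptions are re-exported at the bigframes package root; the project and location values are placeholders, not library defaults:
>>> import bigframes
>>> # Placeholder project and location; adjust to your own resources.
>>> options = bigframes.BigQueryOptions(project="my-project", location="US")
>>> session = bigframes.Session(context=options)
>>> # The session exposes the read_* methods documented below, e.g.:
>>> # df = session.read_gbq("bigquery-public-data.ml_datasets.penguins")
>>> session.close()  # no-op; temporary resources are deleted after 7 days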
Properties
bqclient
API documentation for bqclient property.
bqconnectionclient
API documentation for bqconnectionclient property.
bqstoragereadclient
API documentation for bqstoragereadclient property.
cloudfunctionsclient
API documentation for cloudfunctionsclient property.
resourcemanagerclient
API documentation for resourcemanagerclient property.
Methods
close
close()
No-op. Temporary resources are deleted after 7 days.
read_csv
read_csv(
    filepath_or_buffer: typing.Union[str, typing.IO[bytes]],
    *,
    sep: typing.Optional[str] = ",",
    header: typing.Optional[int] = 0,
    names: typing.Optional[typing.Union[typing.MutableSequence[typing.Any], numpy.ndarray[typing.Any, typing.Any], typing.Tuple[typing.Any, ...], range]] = None,
    index_col: typing.Optional[typing.Union[int, str, typing.Sequence[typing.Union[str, int]], typing.Literal[False]]] = None,
    usecols: typing.Optional[typing.Union[typing.MutableSequence[str], typing.Tuple[str, ...], typing.Sequence[int], pandas.core.series.Series, pandas.core.indexes.base.Index, numpy.ndarray[typing.Any, typing.Any], typing.Callable[[typing.Any], bool]]] = None,
    dtype: typing.Optional[typing.Dict] = None,
    engine: typing.Optional[typing.Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]] = None,
    encoding: typing.Optional[str] = None,
    **kwargs
) -> bigframes.dataframe.DataFrame
Loads DataFrame from comma-separated values (csv) file locally or from Cloud Storage.
The CSV file data will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> df = bpd.read_csv(filepath_or_buffer=gcs_path)
>>> df.head(2)
name post_abbr
0 Alabama AL
1 Alaska AK
<BLANKLINE>
[2 rows x 2 columns]
filepath_or_buffer
str
A local or Google Cloud Storage (gs://) path with engine="bigquery", otherwise passed to pandas.read_csv.
sep
Optional[str], default ","
The separator for fields in a CSV file. For the BigQuery engine, the separator can be any ISO-8859-1 single-byte character. To use a character in the range 128-255, you must encode the character as UTF-8. Both engines support sep="\t" to specify the tab character as separator. The default engine supports having any number of spaces as separator by specifying sep="\s+". Separators longer than 1 character are interpreted as regular expressions by the default engine. The BigQuery engine only supports single-character separators.
header
Optional[int], default 0
Row number to use as the column names.
- None: Instructs autodetect that there are no headers and data should be read starting from the first row.
- 0: If using engine="bigquery", autodetect tries to detect headers in the first row. If they are not detected, the row is read as data. Otherwise data is read starting from the second row. When using the default engine, pandas assumes the first row contains column names unless the names argument is specified. If names is provided, then the first row is ignored, the second row is read as data, and column names are inferred from names.
- N > 0: If using engine="bigquery", autodetect skips N rows and tries to detect headers in row N+1. If headers are not detected, row N+1 is just skipped. Otherwise row N+1 is used to extract column names for the detected schema. When using the default engine, pandas will skip N rows and assume row N+1 contains column names unless the names argument is specified. If names is provided, row N+1 will be ignored, row N+2 will be read as data, and column names are inferred from names.
names
default None
A list of column names to use. If the file contains a header row and you want to pass this parameter, then header=0 should be passed as well so the first (header) row is ignored. Only to be used with the default engine.
index_col
default None
Column(s) to use as the row labels of the DataFrame, either given as string name or column index. index_col=False can be used with the default engine only to enforce that the first column is not used as the index. Using a column index instead of a column name is only supported with the default engine. The BigQuery engine only supports having a single column name as the index_col. Neither engine supports having a multi-column index.
usecols
default None
List of column names to use. The BigQuery engine only supports having a list of string column names. Column indices and callable functions are only supported with the default engine. Using the default engine, the column names in usecols can be defined to correspond to column names provided with the names parameter (ignoring the document's header row of column names). The order of the column indices/names in usecols is ignored with the default engine. The order of the column names provided with the BigQuery engine will be consistent in the resulting dataframe. If using a callable function with the default engine, only column names that evaluate to True by the callable function will be in the resulting dataframe.
dtype
data type for data or columns
Data type for data or columns. Only to be used with default engine.
engine
Optional[str], default None
Type of engine to use. If engine="bigquery" is specified, then BigQuery's load API will be used. Otherwise, the engine will be passed to pandas.read_csv.
encoding
Optional[str], default None
The character encoding of the data. The default encoding is UTF-8 for both engines. The default engine accepts a wide range of encodings; refer to the Python documentation for a comprehensive list: https://docs.python.org/3/library/codecs.html#standard-encodings. The BigQuery engine only supports UTF-8 and ISO-8859-1.
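To illustrate the engine parameter, a hedged sketch that loads the same public CSV through BigQuery's load API; the explicit sep and header values mirror the documented defaults and are shown only for clarity:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> # engine="bigquery" routes the load through BigQuery instead of pandas.read_csv.
>>> df = bpd.read_csv(
...     gcs_path,
...     engine="bigquery",
...     sep=",",     # the BigQuery engine requires a single-byte separator
...     header=0,    # autodetect headers in the first row
... )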
read_gbq
read_gbq(
    query_or_table: str,
    *,
    index_col: typing.Union[typing.Iterable[str], str] = (),
    columns: typing.Iterable[str] = (),
    max_results: typing.Optional[int] = None,
    filters: typing.Iterable[typing.Union[typing.Tuple[str, typing.Literal["in", "not in", "<", "<=", "==", "!=", ">=", ">"], typing.Any], typing.Iterable[typing.Tuple[str, typing.Literal["in", "not in", "<", "<=", "==", "!=", ">=", ">"], typing.Any]]]] = (),
    use_cache: bool = True,
    col_order: typing.Iterable[str] = ()
) -> bigframes.dataframe.DataFrame
Loads a DataFrame from BigQuery.
BigQuery tables are an unordered, unindexed data source. By default, the DataFrame will have an arbitrary index and ordering.
Set the index_col argument to one or more columns to choose an index. The resulting DataFrame is sorted by the index columns. For the best performance, ensure the index columns don't contain duplicate values. If the query has no natural unique index column, add GENERATE_UUID() AS rowindex in your SQL and set index_col='rowindex' for the best performance.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
If the input is a table ID:
>>> df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
Preserve ordering in a query input.
>>> df = bpd.read_gbq('''
... SELECT
... -- Instead of an ORDER BY clause on the query, use
... -- ROW_NUMBER() to create an ordered DataFrame.
... ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
... AS rowindex,
...
... pitcherFirstName,
... pitcherLastName,
... AVG(pitchSpeed) AS averagePitchSpeed
... FROM `bigquery-public-data.baseball.games_wide`
... WHERE year = 2016
... GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
pitcherFirstName pitcherLastName averagePitchSpeed
rowindex
1 Albertin Chapman 96.514113
2 Zachary Britton 94.591039
<BLANKLINE>
[2 rows x 3 columns]
Reading data with columns
and filters
parameters:
>>> columns = ['pitcherFirstName', 'pitcherLastName', 'year', 'pitchSpeed']
>>> filters = [('year', '==', 2016), ('pitcherFirstName', 'in', ['John', 'Doe']), ('pitcherLastName', 'in', ['Gant'])]
>>> df = bpd.read_gbq(
... "bigquery-public-data.baseball.games_wide",
... columns=columns,
... filters=filters,
... )
>>> df.head(1)
pitcherFirstName pitcherLastName year pitchSpeed
0 John Gant 2016 82
<BLANKLINE>
[1 rows x 4 columns]
query_or_table
str
A SQL string to be executed or a BigQuery table to be read. The table must be specified in the format of project.dataset.tablename or dataset.tablename.
index_col
Iterable[str] or str
Name of result column(s) to use for index in results DataFrame.
columns
Iterable[str]
List of BigQuery column names in the desired order for results DataFrame.
max_results
Optional[int], default None
If set, limit the maximum number of rows to fetch from the query results.
filters
Iterable[Union[Tuple, Iterable[Tuple]]], default ()
To filter out data. Filter syntax: [[(column, op, val), …],…] where op is [==, >, >=, <, <=, !=, in, not in]. The innermost tuples are transposed into a set of filters applied through an AND operation. The outer Iterable combines these sets of filters through an OR operation. A single Iterable of tuples can also be used, meaning that no OR operation between set of filters is to be conducted.
use_cache
bool, default True
Whether to cache the query inputs. Defaults to True.
col_order
Iterable[str]
Alias for columns, retained for backwards compatibility.
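To illustrate the OR-of-ANDs form described under filters, a hedged sketch building on the baseball example above; the specific filter values are illustrative assumptions:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> filters = [
...     # First AND group: 2016 games pitched by a John.
...     [("year", "==", 2016), ("pitcherFirstName", "in", ["John"])],
...     # Second AND group, combined with the first via OR: any pitcher named Gant.
...     [("pitcherLastName", "in", ["Gant"])],
... ]
>>> df = bpd.read_gbq(
...     "bigquery-public-data.baseball.games_wide",
...     columns=["pitcherFirstName", "pitcherLastName", "year", "pitchSpeed"],
...     filters=filters,
... )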
read_gbq_function
read_gbq_function(
    function_name: str
)
Loads a BigQuery function from BigQuery.
Then it can be applied to a DataFrame or Series.
BigQuery Utils provides many public functions under the bqutil project on Google Cloud Platform (see https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs#using-the-udfs). You can check out Community UDFs to use community-contributed functions (see https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs/community#community-udfs).
Examples:
Use the cw_lower_case_ascii_only function from Community UDFs (https://github.com/GoogleCloudPlatform/bigquery-utils/blob/master/udfs/community/cw_lower_case_ascii_only.sqlx).
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.DataFrame({'id': [1, 2, 3], 'name': ['AURÉLIE', 'CÉLESTINE', 'DAPHNÉ']})
>>> df
id name
0 1 AURÉLIE
1 2 CÉLESTINE
2 3 DAPHNÉ
<BLANKLINE>
[3 rows x 2 columns]
>>> func = bpd.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")
>>> df1 = df.assign(new_name=df['name'].apply(func))
>>> df1
id name new_name
0 1 AURÉLIE aurÉlie
1 2 CÉLESTINE cÉlestine
2 3 DAPHNÉ daphnÉ
<BLANKLINE>
[3 rows x 3 columns]
function_name
str
The function's name in BigQuery in the format project_id.dataset_id.function_name, or dataset_id.function_name to load from the default project, or function_name to load from the default project and the dataset associated with the current session.
Returns
callable
A function object similar to one created by the remote_function decorator, including the bigframes_remote_function property, but not including the bigframes_cloud_function property.
read_gbq_model
read_gbq_model
(
model_name
:
str
)
Loads a BigQuery ML model from BigQuery.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Read an existing BigQuery ML model.
>>> model_name = "bigframes-dev.bqml_tutorial.penguins_model"
>>> model = bpd.read_gbq_model(model_name)
model_name
str
The model's name in BigQuery in the format project_id.dataset_id.model_id, or just dataset_id.model_id to load from the default project.
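A hedged sketch of using the loaded model; it assumes the model above is a BigQuery ML regression model that bigframes exposes with a predict method, and that the public penguins table matches its feature schema:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> model = bpd.read_gbq_model("bigframes-dev.bqml_tutorial.penguins_model")
>>> # Score a handful of rows; predict() is assumed to return a DataFrame of predictions.
>>> features = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins").dropna().head(5)
>>> predictions = model.predict(features)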
read_gbq_query
read_gbq_query(
    query: str,
    *,
    index_col: typing.Union[typing.Iterable[str], str] = (),
    columns: typing.Iterable[str] = (),
    max_results: typing.Optional[int] = None,
    use_cache: bool = True,
    col_order: typing.Iterable[str] = ()
) -> bigframes.dataframe.DataFrame
Turn a SQL query into a DataFrame.
Note: Because the results are written to a temporary table, ordering by ORDER BY is not preserved. A unique index_col is recommended. Use row_number() over () if there is no natural unique index or you want to preserve ordering.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Simple query input:
>>> df = bpd.read_gbq_query('''
... SELECT
... pitcherFirstName,
... pitcherLastName,
... pitchSpeed,
... FROM `bigquery-public-data.baseball.games_wide`
... ''')
Preserve ordering in a query input.
>>> df = bpd.read_gbq_query('''
... SELECT
... -- Instead of an ORDER BY clause on the query, use
... -- ROW_NUMBER() to create an ordered DataFrame.
... ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
... AS rowindex,
...
... pitcherFirstName,
... pitcherLastName,
... AVG(pitchSpeed) AS averagePitchSpeed
... FROM `bigquery-public-data.baseball.games_wide`
... WHERE year = 2016
... GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
pitcherFirstName pitcherLastName averagePitchSpeed
rowindex
1 Albertin Chapman 96.514113
2 Zachary Britton 94.591039
<BLANKLINE>
[2 rows x 3 columns]
See also: Session.read_gbq.
read_gbq_table
read_gbq_table(
    query: str,
    *,
    index_col: typing.Union[typing.Iterable[str], str] = (),
    columns: typing.Iterable[str] = (),
    max_results: typing.Optional[int] = None,
    use_cache: bool = True,
    col_order: typing.Iterable[str] = ()
) -> bigframes.dataframe.DataFrame
Turn a BigQuery table into a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
Read a whole table, with arbitrary ordering or ordering corresponding to the primary key(s).
>>> df = bpd.read_gbq_table("bigquery-public-data.ml_datasets.penguins")
See also: Session.read_gbq.
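A minimal sketch that narrows the same table to a few columns and caps the number of rows fetched; the column names are taken from the penguins table and should be treated as an assumption:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> df = bpd.read_gbq_table(
...     "bigquery-public-data.ml_datasets.penguins",
...     columns=["species", "island", "body_mass_g"],
...     max_results=100,
... )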
read_json
read_json(
    path_or_buf: typing.Union[str, typing.IO[bytes]],
    *,
    orient: typing.Literal["split", "records", "index", "columns", "values", "table"] = "columns",
    dtype: typing.Optional[typing.Dict] = None,
    encoding: typing.Optional[str] = None,
    lines: bool = False,
    engine: typing.Literal["ujson", "pyarrow", "bigquery"] = "ujson",
    **kwargs
) -> bigframes.dataframe.DataFrame
Convert a JSON string to DataFrame object.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://bigframes-dev-testing/sample1.json"
>>> df = bpd.read_json(path_or_buf=gcs_path, lines=True, orient="records")
>>> df.head(2)
id name
0 1 Alice
1 2 Bob
<BLANKLINE>
[2 rows x 2 columns]
path_or_buf
a valid JSON str, path object or file-like object
A local or Google Cloud Storage (gs://) path with engine="bigquery", otherwise passed to pandas.read_json.
orient
str, optional
If engine="bigquery", orient only supports "records". Indication of expected JSON string format. Compatible JSON strings can be produced by to_json() with a corresponding orient value. The set of possible orients is:
- 'split': dict like {index -> [index], columns -> [columns], data -> [values]}
- 'records': list like [{column -> value}, ... , {column -> value}]
- 'index': dict like {index -> {column -> value}}
- 'columns': dict like {column -> {index -> value}}
- 'values': just the values array
dtype
bool or dict, default None
If True, infer dtypes; if a dict of column to dtype, then use those; if False, then don't infer dtypes at all (applies only to the data). For all orient values except 'table', the default is True.
encoding
str, default is 'utf-8'
The encoding to use to decode py3 bytes.
lines
bool, default False
Read the file as a json object per line. If using engine="bigquery", lines only supports True.
engine
{{"ujson", "pyarrow", "bigquery"}}, default "ujson"
Type of engine to use. If engine="bigquery"
is specified, then BigQuery's load API will be used. Otherwise, the engine will be passed to pandas.read_json
.
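To illustrate the BigQuery engine for JSON, a hedged sketch that reloads the same newline-delimited file through BigQuery's load API; per the parameter notes above, it sets lines=True and orient="records", the only values that engine accepts:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://bigframes-dev-testing/sample1.json"
>>> # The BigQuery engine expects newline-delimited JSON in "records" orientation.
>>> df = bpd.read_json(
...     gcs_path,
...     engine="bigquery",
...     lines=True,
...     orient="records",
... )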
read_pandas
read_pandas(
    pandas_dataframe: pandas.core.frame.DataFrame,
) -> bigframes.dataframe.DataFrame
Loads DataFrame from a pandas DataFrame.
The pandas DataFrame will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples:
>>> import bigframes.pandas as bpd
>>> import pandas as pd
>>> bpd.options.display.progress_bar = None
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> pandas_df = pd.DataFrame(data=d)
>>> df = bpd.read_pandas(pandas_df)
>>> df
col1 col2
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
pandas_dataframe
pandas.DataFrame
a pandas DataFrame object to be loaded.
read_parquet
read_parquet(
    path: typing.Union[str, typing.IO[bytes]]
) -> bigframes.dataframe.DataFrame
Load a Parquet object from the file path (local or Cloud Storage), returning a DataFrame.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"
>>> df = bpd.read_parquet(path=gcs_path)
path
str
Local or Cloud Storage path to Parquet file.
read_pickle
read_pickle(
    filepath_or_buffer: FilePath | ReadPickleBuffer,
    compression: CompressionOptions = "infer",
    storage_options: StorageOptions = None,
)
Load pickled BigFrames object (or any object) from file.
Examples:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> gcs_path = "gs://bigframes-dev-testing/test_pickle.pkl"
>>> df = bpd.read_pickle(filepath_or_buffer=gcs_path)
filepath_or_buffer
str, path object, or file-like object
String, path object (implementing os.PathLike[str]), or file-like object implementing a binary readlines() function. Also accepts URL. URL is not limited to S3 and GCS.
compression
str or dict, default 'infer'
For on-the-fly decompression of on-disk data. If 'infer' and 'filepath_or_buffer' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). If using 'zip' or 'tar', the ZIP file must contain only one data file to be read in. Set to None for no decompression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'tar'} and other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdDecompressor or tarfile.TarFile, respectively. As an example, the following could be passed for Zstandard decompression using a custom compression dictionary compression={'method': 'zstd', 'dict_data': my_compression_dict}.
storage_options
dict, default None
Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with “s3://”, and “gcs://”) the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details, and for more examples on storage options refer here.
remote_function
remote_function(
    input_types: typing.List[type],
    output_type: type,
    dataset: typing.Optional[str] = None,
    bigquery_connection: typing.Optional[str] = None,
    reuse: bool = True,
    name: typing.Optional[str] = None,
    packages: typing.Optional[typing.Sequence[str]] = None,
)
Decorator to turn a user defined function into a BigQuery remote function. Check out the code samples at: https://cloud.google.com/bigquery/docs/remote-functions#bigquery-dataframes.
Make sure the following is set up before using this API:
- Have the below APIs enabled for your project:
  - BigQuery Connection API
  - Cloud Functions API
  - Cloud Run API
  - Cloud Build API
  - Artifact Registry API
  - Cloud Resource Manager API
  This can be done from the cloud console (change PROJECT_ID to yours):
  https://console.cloud.google.com/apis/enableflow?apiid=bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,cloudbuild.googleapis.com,artifactregistry.googleapis.com,cloudresourcemanager.googleapis.com&project=PROJECT_ID
  Or from the gcloud CLI:
  $ gcloud services enable bigqueryconnection.googleapis.com cloudfunctions.googleapis.com run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com cloudresourcemanager.googleapis.com
- Have the following IAM roles enabled for you:
  - BigQuery Data Editor (roles/bigquery.dataEditor)
  - BigQuery Connection Admin (roles/bigquery.connectionAdmin)
  - Cloud Functions Developer (roles/cloudfunctions.developer)
  - Service Account User (roles/iam.serviceAccountUser) on the service account PROJECT_NUMBER-compute@developer.gserviceaccount.com
  - Storage Object Viewer (roles/storage.objectViewer)
  - Project IAM Admin (roles/resourcemanager.projectIamAdmin) (only required if the BigQuery connection being used is not pre-created and is created dynamically with user credentials)
- Either the user has setIamPolicy privilege on the project, or a BigQuery connection is pre-created with the necessary IAM role set:
  - To create a connection, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#create_a_connection
  - To set up IAM, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#grant_permission_on_function
  Alternatively, the IAM could also be set up via the gcloud CLI:
  $ gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:CONNECTION_SERVICE_ACCOUNT_ID" --role="roles/run.invoker"
input_types
list(type)
List of input data types in the user defined function.
output_type
type
Data type of the output in the user defined function.
dataset
str, Optional
Dataset in which to create a BigQuery remote function. It should be in <project_id>.<dataset_name> or <dataset_name> format. If this parameter is not provided then the session dataset id is used.
bigquery_connection
str, Optional
Name of the BigQuery connection. You should either have the connection already created in the location you have chosen, or you should have the Project IAM Admin role to enable the service to create the connection for you if you need it. If this parameter is not provided then the BigQuery connection from the session is used.
reuse
bool, Optional
Reuse the remote function if it already exists. True by default, which will result in reusing an existing remote function and corresponding cloud function (if any) that was previously created for the same udf. Setting it to False forces creation of a unique remote function. If the required remote function does not exist, it is created irrespective of this parameter.
name
str, Optional
Explicit name of the persisted BigQuery remote function. Use it with caution, because two users working in the same project and dataset could overwrite each other's remote functions if they use the same persistent name.
packages
str[], Optional
Explicit name of the external package dependencies. Each dependency is added to the requirements.txt as is, and can be of the form supported in https://pip.pypa.io/en/stable/reference/requirements-file-format/.
Returns
callable
A decorator for the user defined function. The decorated function gets two attributes: bigframes_cloud_function (the Google Cloud Function deployed for the user defined code) and bigframes_remote_function (the BigQuery remote function capable of calling into bigframes_cloud_function).
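A minimal sketch of decorating a user defined function and applying it to a Series; the connection name is a placeholder, the session must already satisfy the prerequisites above, and running it deploys real cloud resources:
>>> import bigframes.pandas as bpd
>>> bpd.options.display.progress_bar = None
>>> # "bigframes-rf-conn" is a placeholder BigQuery connection name.
>>> @bpd.remote_function([float], float, bigquery_connection="bigframes-rf-conn")
... def square(x):
...     return x * x
>>> df = bpd.DataFrame({"x": [1.0, 2.0, 3.0]})
>>> result = df["x"].apply(square)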