Session(
    context: typing.Optional[bigframes._config.bigquery_options.BigQueryOptions] = None,
    clients_provider: typing.Optional[bigframes.session.clients.ClientsProvider] = None,
)
Establishes a BigQuery connection to capture a group of job activities related to DataFrames.
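For example, a Session can be constructed with an explicit BigQueryOptions context. This is a minimal sketch: the project and location values are placeholders, and the classes are imported from the module paths shown in the signature above.
>>> from bigframes._config.bigquery_options import BigQueryOptions  # doctest: +SKIP
>>> from bigframes.session import Session  # doctest: +SKIP
>>> session = Session(context=BigQueryOptions(project="my-project", location="US"))  # doctest: +SKIP
>>> df = session.read_gbq("bigquery-public-data.ml_datasets.penguins")  # doctest: +SKIP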
Properties
MultiIndex
Constructs a MultiIndex.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas">bigframes.pandas</xref>.MultiIndex
for full documentation.
bqclient
API documentation for bqclient
property.
bqconnectionclient
API documentation for bqconnectionclient
property.
bqconnectionmanager
API documentation for bqconnectionmanager
property.
bqstoragereadclient
API documentation for bqstoragereadclient
property.
bytes_processed_sum
The sum of all bytes processed by BigQuery jobs using this session.
cloudfunctionsclient
API documentation for cloudfunctionsclient
property.
objects
API documentation for objects
property.
options
Options for configuring BigQuery DataFrames.
Included for compatibility between bpd and Session.
resourcemanagerclient
API documentation for resourcemanagerclient
property.
session_id
API documentation for session_id
property.
slot_millis_sum
The sum of all slot time used by BigQuery jobs in this session.
Methods
DataFrame
DataFrame(*args, **kwargs)
Constructs a DataFrame.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas.DataFrame">bigframes.pandas.DataFrame</xref>
for full documentation.
Index
Index(*args, **kwargs)
Constructs an Index.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas.Index">bigframes.pandas.Index</xref>
for full documentation.
Series
Series(*args, **kwargs)
Constructs a Series.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas.Series">bigframes.pandas.Series</xref>
for full documentation.
__del__
__del__()
Automatic cleanup of internal resources.
__enter__
__enter__()
Enter the runtime context of the Session object.
See With Statement Context Managers for more details.
__exit__
__exit__(*_)
Exit the runtime context of the Session object.
See With Statement Context Managers for more details.
close
close()
Delete resources that were created with this session's session_id. This includes BigQuery tables, remote functions and cloud functions serving the remote functions.
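A short sketch of explicit cleanup versus the context-manager form described above. It assumes bigframes.connect() is available to create a Session; the table name reuses the public dataset used elsewhere on this page.
>>> import bigframes
>>> session = bigframes.connect()  # doctest: +SKIP
>>> df = session.read_gbq("bigquery-public-data.ml_datasets.penguins")  # doctest: +SKIP
>>> session.close()  # doctest: +SKIP
>>> # Or use the with-statement form, which is expected to perform the same cleanup on exit:
>>> with bigframes.connect() as session:  # doctest: +SKIP
...     df = session.read_gbq("bigquery-public-data.ml_datasets.penguins")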
cut
cut(*args, **kwargs) -> bigframes.series.Series
Cuts a BigQuery DataFrames object.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas.cut">bigframes.pandas.cut</xref>
for full documentation.
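A minimal sketch of binning a Series through the compatible bpd.cut entry point; the values and bin count are arbitrary.
>>> import bigframes.pandas as bpd
>>> s = bpd.Series([1, 5, 9, 13])
>>> binned = bpd.cut(s, bins=2)  # doctest: +SKIP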
deploy_remote_function
deploy_remote_function(func, **kwargs)
Orchestrates the creation of a BigQuery remote function that deploys immediately.
This method ensures that the remote function is created and available for use in BigQuery as soon as this call is made.
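A hedged sketch of the deploy-immediately flow. It assumes the keyword arguments mirror those of remote_function below and uses the same cloud_function_service_account="default" placeholder as the other examples on this page.
>>> import bigframes.pandas as bpd
>>> def add_one(x: int) -> int:
...     return x + 1
>>> session = bpd.get_global_session()  # doctest: +SKIP
>>> add_one_remote = session.deploy_remote_function(  # doctest: +SKIP
...     add_one,
...     cloud_function_service_account="default",
... )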
deploy_udf
deploy_udf(func, **kwargs)
Orchestrates the creation of a BigQuery UDF that deploys immediately.
This method ensures that the UDF is created and available for use in BigQuery as soon as this call is made.
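A hedged sketch, assuming the keyword arguments mirror those of udf below; the dataset and function names are placeholders.
>>> import bigframes.pandas as bpd
>>> def add_one(x: int) -> int:
...     return x + 1
>>> session = bpd.get_global_session()  # doctest: +SKIP
>>> add_one_udf = session.deploy_udf(  # doctest: +SKIP
...     add_one,
...     dataset="my_dataset",
...     name="add_one",
... )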
from_glob_path
from_glob_path(path: str, *, connection: Optional[str] = None, name: Optional[str] = None) -> dataframe.DataFrame
Create a BigFrames DataFrame that contains a BigFrames Blob column from a global wildcard path. This operation creates a temporary BQ Object Table under the hood and requires bigquery.connections.delegate permission or BigQuery Connection Admin role. If you have an existing BQ Object Table, use read_gbq_object_table().
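A minimal sketch; the Cloud Storage wildcard path and column name are placeholders.
>>> import bigframes.pandas as bpd
>>> session = bpd.get_global_session()  # doctest: +SKIP
>>> df = session.from_glob_path(  # doctest: +SKIP
...     "gs://my-bucket/images/*",
...     name="image_blob",
... )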
read_arrow
read_arrow(pa_table: pyarrow.lib.Table) -> bigframes.dataframe.DataFrame
Load a PyArrow Table to a BigQuery DataFrames DataFrame.
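A minimal sketch using a small in-memory PyArrow table.
>>> import pyarrow as pa
>>> import bigframes.pandas as bpd
>>> pa_table = pa.table({"id": [1, 2], "name": ["Alice", "Bob"]})
>>> session = bpd.get_global_session()  # doctest: +SKIP
>>> df = session.read_arrow(pa_table)  # doctest: +SKIP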
read_csv
read_csv(
    filepath_or_buffer: str | IO["bytes"],
    *,
    sep: Optional[str] = ",",
    header: Optional[int] = 0,
    names: Optional[Union[MutableSequence[Any], np.ndarray[Any, Any], Tuple[Any, ...], range]] = None,
    index_col: Optional[Union[int, str, Sequence[Union[str, int]], bigframes.enums.DefaultIndexKind, Literal[False]]] = None,
    usecols: Optional[Union[MutableSequence[str], Tuple[str, ...], Sequence[int], pandas.Series, pandas.Index, np.ndarray[Any, Any], Callable[[Any], bool]]] = None,
    dtype: Optional[Dict] = None,
    engine: Optional[Literal["c", "python", "pyarrow", "python-fwf", "bigquery"]] = None,
    encoding: Optional[str] = None,
    write_engine: constants.WriteEngineType = "default",
    **kwargs
) -> dataframe.DataFrame
Loads data from a comma-separated values (csv) file into a DataFrame.
The CSV file data will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples: >>> import bigframes.pandas as bpd
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
>>> df = bpd.read_csv(filepath_or_buffer=gcs_path)
>>> df.head(2)
name post_abbr
0 Alabama AL
1 Alaska AK
<BLANKLINE>
[2 rows x 2 columns]
read_gbq
Loads a DataFrame from BigQuery.
BigQuery tables are an unordered, unindexed data source. To add support for pandas compatibility, the following indexing options are supported via the index_col parameter:
- (Empty iterable, default) A default index. Behavior may change. Explicitly set index_col if your application makes use of specific index values. If a table has primary key(s), those are used as the index, otherwise a sequential index is generated.
- (<xref uid="bigframes.enums.DefaultIndexKind.SEQUENTIAL_INT64">bigframes.enums.DefaultIndexKind.SEQUENTIAL_INT64</xref>) Add an arbitrary sequential index and ordering. Warning: This uses an analytic windowed operation that prevents filtering push down. Avoid using on large clustered or partitioned tables.
- (Recommended) Set the index_col argument to one or more columns. Unique values for the row labels are recommended. Duplicate labels are possible, but note that joins on a non-unique index can duplicate rows via pandas-compatible outer join behavior.
For the best performance, select GENERATE_UUID() AS rowindex in your SQL and set index_col='rowindex'. Examples:
>>> import bigframes.pandas as bpd
If the input is a table ID:
>>> df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
Read table path with wildcard suffix and filters:
>>> df = bpd.read_gbq_table("bigquery-public-data.noaa_gsod.gsod19*", filters=[("_table_suffix", ">=", "30"), ("_table_suffix", "<=", "39")])
Preserve ordering in a query input.
>>> df = bpd.read_gbq('''
... SELECT
... -- Instead of an ORDER BY clause on the query, use
... -- ROW_NUMBER() to create an ordered DataFrame.
... ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
... AS rowindex,
...
... pitcherFirstName,
... pitcherLastName,
... AVG(pitchSpeed) AS averagePitchSpeed
... FROM `bigquery-public-data.baseball.games_wide`
... WHERE year = 2016
... GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
pitcherFirstName pitcherLastName averagePitchSpeed
rowindex
1 Albertin Chapman 96.514113
2 Zachary Britton 94.591039
<BLANKLINE>
[2 rows x 3 columns]
Reading data with columns and filters parameters:
>>> columns = ['pitcherFirstName', 'pitcherLastName', 'year', 'pitchSpeed']
>>> filters = [('year', '==', 2016), ('pitcherFirstName', 'in', ['John', 'Doe']), ('pitcherLastName', 'in', ['Gant']), ('pitchSpeed', '>', 94)]
>>> df = bpd.read_gbq(
... "bigquery-public-data.baseball.games_wide",
... columns=columns,
... filters=filters,
... )
>>> df.head(1)
pitcherFirstName pitcherLastName year pitchSpeed
0 John Gant 2016 95
<BLANKLINE>
[1 rows x 4 columns]
Exceptions
ValueError - If both columns and col_order are specified.
ValueError - If configuration is specified when directly reading from a table.
read_gbq_function
read_gbq_function(function_name: str, is_row_processor: bool = False)
Loads a BigQuery function from BigQuery.
Then it can be applied to a DataFrame or Series.
BigQuery Utils provides many public functions under the bqutil project on Google Cloud Platform
(see: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs#using-the-udfs).
You can check out Community UDFs to use community-contributed functions
(see: https://github.com/GoogleCloudPlatform/bigquery-utils/tree/master/udfs/community#community-udfs). Examples:
Use the cw_lower_case_ascii_only function from Community UDFs.
>>> import bigframes.pandas as bpd
>>> func = bpd.read_gbq_function("bqutil.fn.cw_lower_case_ascii_only")
You can run it on scalar input. Usually you would do so to verify that it works as expected before applying to all values in a Series.
>>> func('AURÉLIE')
'aurÉlie'
You can apply it to a BigQuery DataFrames Series.
>>> df = bpd.DataFrame({'id': [1, 2, 3], 'name': ['AURÉLIE', 'CÉLESTINE', 'DAPHNÉ']})
>>> df
id name
0 1 AURÉLIE
1 2 CÉLESTINE
2 3 DAPHNÉ
<BLANKLINE>
[3 rows x 2 columns]
>>> df1 = df.assign(new_name=df['name'].apply(func))
>>> df1
id name new_name
0 1 AURÉLIE aurÉlie
1 2 CÉLESTINE cÉlestine
2 3 DAPHNÉ daphnÉ
<BLANKLINE>
[3 rows x 3 columns]
You can even use a function with multiple inputs. For example, cw_regexp_replace_5 from Community UDFs.
>>> func = bpd.read_gbq_function("bqutil.fn.cw_regexp_replace_5")
>>> func('TestStr123456', 'Str', 'Cad$', 1, 1)
'TestCad$123456'
>>> df = bpd.DataFrame({
... "haystack" : ["TestStr123456", "TestStr123456Str", "TestStr123456Str"],
... "regexp" : ["Str", "Str", "Str"],
... "replacement" : ["Cad$", "Cad$", "Cad$"],
... "offset" : [1, 1, 1],
... "occurrence" : [1, 2, 1]
... })
>>> df
haystack regexp replacement offset occurrence
0 TestStr123456 Str Cad$ 1 1
1 TestStr123456Str Str Cad$ 1 2
2 TestStr123456Str Str Cad$ 1 1
<BLANKLINE>
[3 rows x 5 columns]
>>> df.apply(func, axis=1)
0 TestCad$123456
1 TestStr123456Cad$
2 TestCad$123456Str
dtype: string
Another use case is to define your own remote function and use it later. For example, define the remote function:
>>> @bpd.remote_function(cloud_function_service_account="default") # doctest: +SKIP
... def tenfold(num: int) -> float:
... return num * 10
Then, read back the deployed BQ remote function:
>>> tenfold_ref = bpd.read_gbq_function( # doctest: +SKIP
... tenfold.bigframes_remote_function,
... )
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
a b c
0 1 3 5
1 2 4 6
<BLANKLINE>
[2 rows x 3 columns]
>>> df['a'].apply(tenfold_ref) # doctest: +SKIP
0 10.0
1 20.0
Name: a, dtype: Float64
It also supports row processing by using is_row_processor=True. Please note, row processor implies that the function has only one input parameter.
>>> @bpd.remote_function(cloud_function_service_account="default") # doctest: +SKIP
... def row_sum(s: pd.Series) -> float:
... return s['a'] + s['b'] + s['c']
>>> row_sum_ref = bpd.read_gbq_function( # doctest: +SKIP
... row_sum.bigframes_remote_function,
... is_row_processor=True,
... )
>>> df = bpd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
>>> df
a b c
0 1 3 5
1 2 4 6
<BLANKLINE>
[2 rows x 3 columns]
>>> df.apply(row_sum_ref, axis=1) # doctest: +SKIP
0 9.0
1 12.0
dtype: Float64
Returns
collections.abc.Callable, similar to the one produced by the remote_function decorator, including the bigframes_remote_function property, but not including the bigframes_cloud_function property.
read_gbq_model
read_gbq_model(model_name: str)
Loads a BigQuery ML model from BigQuery.
Examples:
Read an existing BigQuery ML model.
>>> import bigframes.pandas as bpd
>>> model_name = "bigframes-dev.bqml_tutorial.penguins_model"
>>> model = bpd.read_gbq_model(model_name)
read_gbq_object_table
read_gbq_object_table(object_table: str, *, name: Optional[str] = None) -> dataframe.DataFrame
Read an existing object table to create a BigFrames Blob DataFrame. Use the connection of the object table for the connection of the blob. This function doesn't retrieve the object table data. If you want to read the data, use read_gbq() instead.
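A minimal sketch; the object table ID is a placeholder.
>>> import bigframes.pandas as bpd
>>> session = bpd.get_global_session()  # doctest: +SKIP
>>> df = session.read_gbq_object_table("my-project.my_dataset.my_object_table")  # doctest: +SKIP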
read_gbq_query
Turn a SQL query into a DataFrame.
Note: Because the results are written to a temporary table, ordering by ORDER BY is not preserved. A unique index_col is recommended. Use row_number() over () if there is no natural unique index or you want to preserve ordering.
Examples:
Simple query input:
>>> import bigframes.pandas as bpd
>>> df = bpd.read_gbq_query('''
... SELECT
... pitcherFirstName,
... pitcherLastName,
... pitchSpeed,
... FROM `bigquery-public-data.baseball.games_wide`
... ''')
Preserve ordering in a query input.
>>> df = bpd.read_gbq_query('''
... SELECT
... -- Instead of an ORDER BY clause on the query, use
... -- ROW_NUMBER() to create an ordered DataFrame.
... ROW_NUMBER() OVER (ORDER BY AVG(pitchSpeed) DESC)
... AS rowindex,
...
... pitcherFirstName,
... pitcherLastName,
... AVG(pitchSpeed) AS averagePitchSpeed
... FROM `bigquery-public-data.baseball.games_wide`
... WHERE year = 2016
... GROUP BY pitcherFirstName, pitcherLastName
... ''', index_col="rowindex")
>>> df.head(2)
pitcherFirstName pitcherLastName averagePitchSpeed
rowindex
1 Albertin Chapman 96.514113
2 Zachary Britton 94.591039
<BLANKLINE>
[2 rows x 3 columns]
See also: Session.read_gbq.
Exceptions
ValueError - If both columns and col_order are specified.
Returns
bigframes.pandas.DataFrame or pandas.Series - If dry_run is True, a pandas.Series containing query statistics is returned.
read_gbq_table
Turn a BigQuery table into a DataFrame.
Examples:
Read a whole table, with arbitrary ordering or ordering corresponding to the primary key(s).
>>> import bigframes.pandas as bpd
>>> df = bpd.read_gbq_table("bigquery-public-data.ml_datasets.penguins")
See also: Session.read_gbq.
Exceptions
ValueError - If both columns and col_order are specified.
Returns
bigframes.pandas.DataFrame or pandas.Series - If dry_run is True, a pandas.Series containing table statistics is returned.
read_gbq_table_streaming
read_gbq_table_streaming(table: str) -> streaming_dataframe.StreamingDataFrame
Turn a BigQuery table into a StreamingDataFrame.
Examples:
>>> import bigframes.streaming as bst
>>> sdf = bst.read_gbq_table("bigquery-public-data.ml_datasets.penguins")
read_json
read_json(
    path_or_buf: str | IO["bytes"],
    *,
    orient: Literal["split", "records", "index", "columns", "values", "table"] = "columns",
    dtype: Optional[Dict] = None,
    encoding: Optional[str] = None,
    lines: bool = False,
    engine: Literal["ujson", "pyarrow", "bigquery"] = "ujson",
    write_engine: constants.WriteEngineType = "default",
    **kwargs
) -> dataframe.DataFrame
Convert a JSON string to DataFrame object.
Examples: >>> import bigframes.pandas as bpd
>>> gcs_path = "gs://bigframes-dev-testing/sample1.json"
>>> df = bpd.read_json(path_or_buf=gcs_path, lines=True, orient="records")
>>> df.head(2)
id name
0 1 Alice
1 2 Bob
<BLANKLINE>
[2 rows x 2 columns]
Exceptions
ValueError - lines is only valid when orient is records.
read_pandas
Loads DataFrame from a pandas DataFrame.
The pandas DataFrame will be persisted as a temporary BigQuery table, which can be automatically recycled after the Session is closed.
Examples: >>> import pandas as pd
>>> import bigframes.pandas as bpd
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> pandas_df = pd.DataFrame(data=d)
>>> df = bpd.read_pandas(pandas_df)
>>> df
col1 col2
0 1 3
1 2 4
<BLANKLINE>
[2 rows x 2 columns]
ValueError
read_parquet
read_parquet(
    path: str | IO["bytes"],
    *,
    engine: str = "auto",
    write_engine: constants.WriteEngineType = "default",
) -> dataframe.DataFrame
Load a Parquet object from the file path (local or Cloud Storage), returning a DataFrame.
Examples: >>> import bigframes.pandas as bpd
>>> gcs_path = "gs://cloud-samples-data/bigquery/us-states/us-states.parquet"
>>> df = bpd.read_parquet(path=gcs_path, engine="bigquery")
read_pickle
read_pickle(
    filepath_or_buffer: FilePath | ReadPickleBuffer,
    compression: CompressionOptions = "infer",
    storage_options: StorageOptions = None,
    *,
    write_engine: constants.WriteEngineType = "default",
)
Load pickled BigFrames object (or any object) from file.
Examples: >>> import bigframes.pandas as bpd
>>> gcs_path = "gs://bigframes-dev-testing/test_pickle.pkl"
>>> df = bpd.read_pickle(filepath_or_buffer=gcs_path)
remote_function
remote_function(
    input_types: typing.Union[None, type, typing.Sequence[type]] = None,
    output_type: typing.Optional[type] = None,
    dataset: typing.Optional[str] = None,
    *,
    bigquery_connection: typing.Optional[str] = None,
    reuse: bool = True,
    name: typing.Optional[str] = None,
    packages: typing.Optional[typing.Sequence[str]] = None,
    cloud_function_service_account: str,
    cloud_function_kms_key_name: typing.Optional[str] = None,
    cloud_function_docker_repository: typing.Optional[str] = None,
    max_batching_rows: typing.Optional[int] = 1000,
    cloud_function_timeout: typing.Optional[int] = 600,
    cloud_function_max_instances: typing.Optional[int] = None,
    cloud_function_vpc_connector: typing.Optional[str] = None,
    cloud_function_vpc_connector_egress_settings: typing.Optional[typing.Literal["all", "private-ranges-only", "unspecified"]] = None,
    cloud_function_memory_mib: typing.Optional[int] = 1024,
    cloud_function_ingress_settings: typing.Literal["all", "internal-only", "internal-and-gclb"] = "internal-only",
    cloud_build_service_account: typing.Optional[str] = None,
)
Decorator to turn a user defined function into a BigQuery remote function. Check out the code samples at: https://cloud.google.com/bigquery/docs/remote-functions#bigquery-dataframes.
See https://cloud.google.com/functions/docs/securing/function-identity.
- Have the below APIs enabled for your project:
- BigQuery Connection API
- Cloud Functions API
- Cloud Run API
- Cloud Build API
- Artifact Registry API
- Cloud Resource Manager API
This can be done from the cloud console (change PROJECT_ID to yours): https://console.cloud.google.com/apis/enableflow?apiid=bigqueryconnection.googleapis.com,cloudfunctions.googleapis.com,run.googleapis.com,cloudbuild.googleapis.com,artifactregistry.googleapis.com,cloudresourcemanager.googleapis.com&project=PROJECT_ID
Or from the gcloud CLI:
$ gcloud services enable bigqueryconnection.googleapis.com cloudfunctions.googleapis.com run.googleapis.com cloudbuild.googleapis.com artifactregistry.googleapis.com cloudresourcemanager.googleapis.com
- Have the following IAM roles enabled for you:
- BigQuery Data Editor (roles/bigquery.dataEditor)
- BigQuery Connection Admin (roles/bigquery.connectionAdmin)
- Cloud Functions Developer (roles/cloudfunctions.developer)
- Service Account User (roles/iam.serviceAccountUser) on the service account PROJECT_NUMBER-compute@developer.gserviceaccount.com
- Storage Object Viewer (roles/storage.objectViewer)
- Project IAM Admin (roles/resourcemanager.projectIamAdmin) (Only required if the bigquery connection being used is not pre-created and is created dynamically with user credentials.)
- Either the user has setIamPolicy privilege on the project, or a BigQuery connection is pre-created with the necessary IAM role set:
- To create a connection, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#create_a_connection
- To set up IAM, follow https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions#grant_permission_on_function
Alternatively, the IAM could also be set up via the gcloud CLI:
$ gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:CONNECTION_SERVICE_ACCOUNT_ID" --role="roles/run.invoker".
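Once the prerequisites above are in place, usage follows the same decorator pattern shown earlier under read_gbq_function. This is a minimal sketch; the dataset and connection are left to the session's defaults, and the function body is illustrative.
>>> import bigframes.pandas as bpd
>>> @bpd.remote_function(cloud_function_service_account="default")  # doctest: +SKIP
... def add_one(x: int) -> int:
...     return x + 1
>>> s = bpd.Series([1, 2, 3])
>>> s.apply(add_one)  # doctest: +SKIP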
Returns
collections.abc.Callable, with the following properties: bigframes_cloud_function - the Google Cloud Function deployed for the user defined code; bigframes_remote_function - the BigQuery remote function capable of calling into bigframes_cloud_function.
to_datetime
to_datetime(*args, **kwargs) -> typing.Union[pandas._libs.tslibs.timestamps.Timestamp, datetime.datetime, bigframes.series.Series]
Converts a BigQuery DataFrames object to datetime dtype.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas.to_datetime">bigframes.pandas.to_datetime</xref>
for full documentation.
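A minimal sketch converting a Series of date strings; the values are arbitrary.
>>> import bigframes.pandas as bpd
>>> s = bpd.Series(["2021-01-01", "2021-01-02"])
>>> bpd.to_datetime(s)  # doctest: +SKIP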
to_timedelta
to_timedelta(*args, **kwargs)
Converts a BigQuery DataFrames object to timedelta/duration dtype.
Included for compatibility between bpd and Session.
See <xref uid="bigframes.pandas.to_timedelta">bigframes.pandas.to_timedelta</xref>
for full documentation.
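A minimal sketch converting integer seconds; the values and unit are arbitrary.
>>> import bigframes.pandas as bpd
>>> s = bpd.Series([1, 2, 3])
>>> bpd.to_timedelta(s, unit="s")  # doctest: +SKIP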
udf
udf(
    *,
    input_types: typing.Union[None, type, typing.Sequence[type]] = None,
    output_type: typing.Optional[type] = None,
    dataset: str,
    bigquery_connection: typing.Optional[str] = None,
    name: str,
    packages: typing.Optional[typing.Sequence[str]] = None,
    max_batching_rows: typing.Optional[int] = None,
    container_cpu: typing.Optional[float] = None,
    container_memory: typing.Optional[str] = None,
)
Decorator to turn a Python user defined function (UDF) into a BigQuery managed user-defined function.
Examples: >>> import datetime
>>> import bigframes.pandas as bpd
Turning an arbitrary python function into a BigQuery managed python udf:
>>> bq_name = datetime.datetime.now().strftime("bigframes_%Y%m%d%H%M%S%f")
>>> @bpd.udf(dataset="bigfranes_testing", name=bq_name) # doctest: +SKIP
... def minutes_to_hours(x: int) -> float:
... return x/60
>>> minutes = bpd.Series([0, 30, 60, 90, 120])
>>> minutes
0 0
1 30
2 60
3 90
4 120
dtype: Int64
>>> hours = minutes.apply(minutes_to_hours) # doctest: +SKIP
>>> hours # doctest: +SKIP
0 0.0
1 0.5
2 1.0
3 1.5
4 2.0
dtype: Float64
To turn a user defined function with external package dependencies into a BigQuery managed python udf, you would provide the names of the packages (optionally with the package version) via the packages param.
>>> bq_name = datetime.datetime.now().strftime("bigframes_%Y%m%d%H%M%S%f")
>>> @bpd.udf( # doctest: +SKIP
... dataset="bigfranes_testing",
... name=bq_name,
... packages=["cryptography"]
... )
... def get_hash(input: str) -> str:
... from cryptography.fernet import Fernet
...
... # handle missing value
... if input is None:
... input = ""
...
... key = Fernet.generate_key()
... f = Fernet(key)
... return f.encrypt(input.encode()).decode()
>>> names = bpd.Series(["Alice", "Bob"])
>>> hashes = names.apply(get_hash) # doctest: +SKIP
You can clean up the BigQuery functions created above using the BigQuery client from the BigQuery DataFrames session:
>>> session = bpd.get_global_session() # doctest: +SKIP
>>> session.bqclient.delete_routine(minutes_to_hours.bigframes_bigquery_function) # doctest: +SKIP
>>> session.bqclient.delete_routine(get_hash.bigframes_bigquery_function) # doctest: +SKIP
Returns
collections.abc.Callable, with the bigframes_bigquery_function property - the BigQuery managed function deployed for the user defined code.