Deploy and apply a remote function using BigQuery DataFrames

Use the BigQuery DataFrames API to deploy a Python function as a Cloud Function and use it as a remote function.

Code sample

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
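For example, you can create Application Default Credentials locally with the gcloud CLI (`gcloud auth application-default login`). The short check below is an illustrative sketch, not part of the original sample; it uses `google.auth.default()` to confirm that credentials can be discovered before you run the sample:

# Verify that Application Default Credentials are discoverable.
# If this raises DefaultCredentialsError, run:
#   gcloud auth application-default login
import google.auth

credentials, detected_project = google.auth.default()
print(f"ADC found; default project: {detected_project}")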

import bigframes.pandas as bpd

# Set BigQuery DataFrames options
bpd.options.bigquery.project = your_gcp_project_id  # Replace with your GCP project ID.
bpd.options.bigquery.location = "US"

# BigQuery DataFrames gives you the ability to turn your custom scalar
# functions into a BigQuery remote function. It requires the GCP project to
# be set up appropriately and the user to have sufficient privileges to use
# them. You can find more details about the usage and the requirements via
# the `help` command.
help(bpd.remote_function)

# Read a table and inspect the column of interest.
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
df["body_mass_g"].head(10)

# Define a custom function, and specify the intent to turn it into a remote
# function. It requires a BigQuery connection. If the connection is not
# already created, BigQuery DataFrames will attempt to create one, assuming
# the necessary APIs and IAM permissions are set up in the project. In our
# examples we let the default connection `bigframes-default-connection` be
# used. We also set `reuse=False` to make sure we don't step over someone
# else creating a remote function in the same project from the exact same
# source code at the same time. Let's try a `pandas`-like use case in which
# we want to apply a user-defined scalar function to every value in a
# `Series`; more specifically, bucketize the `body_mass_g` value of the
# penguins, which is a real number, into a category, which is a string.
@bpd.remote_function(
    reuse=False,
    cloud_function_service_account="default",
)
def get_bucket(num: float) -> str:
    if not num:
        return "NA"
    boundary = 4000
    return "at_or_above_4000" if num >= boundary else "below_4000"

# Then we can apply the remote function on the `Series` of interest via the
# `apply` API and store the result in a new column in the DataFrame.
df = df.assign(body_mass_bucket=df["body_mass_g"].apply(get_bucket))

# This will add a new column `body_mass_bucket` in the DataFrame. You can
# preview the original value and the bucketized value side by side.
df[["body_mass_g", "body_mass_bucket"]].head(10)

# The above operation was possible by doing all the computation in the
# cloud. For that, a Google Cloud Function is deployed by serializing the
# user code, and a BigQuery remote function is created to call the Cloud
# Function via its HTTP endpoint on the data in the DataFrame.

# The BigQuery remote function created to support the BigQuery DataFrames
# remote function can be located via the `bigframes_remote_function`
# property set on the remote function object.
print(f"Created BQ remote function: {get_bucket.bigframes_remote_function}")

# The Cloud Function can be located via another property,
# `bigframes_cloud_function`, set on the remote function object.
print(f"Created cloud function: {get_bucket.bigframes_cloud_function}")

# Warning: The deployed Cloud Function may be visible to other users with
# sufficient privilege in the project, so be careful about having any
# sensitive data in the code that will be deployed as a remote function.

# Let's continue with other potential use cases of remote functions. Say we
# consider the `species`, `island` and `sex` of the penguins sensitive
# information and want to redact them by replacing each value with an opaque
# encrypted token. Let's define another scalar custom function and decorate
# it as a remote function. The custom function in this example has an
# external package dependency, which can be specified via the `packages`
# parameter.
@bpd.remote_function(
    reuse=False,
    packages=["cryptography"],
    cloud_function_service_account="default",
)
def get_hash(input: str) -> str:
    from cryptography.fernet import Fernet

    # Handle missing value.
    if input is None:
        input = ""

    key = Fernet.generate_key()
    f = Fernet(key)
    return f.encrypt(input.encode()).decode()

# We can use this remote function via another `pandas`-like API, `map`,
# which can be applied to a DataFrame.
df_redacted = df[["species", "island", "sex"]].map(get_hash)
df_redacted.head(10)
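The remote function created above is a regular BigQuery routine, so it can also be invoked from SQL outside of BigQuery DataFrames. The following is a hedged sketch, not part of the original sample; it assumes `bigframes_remote_function` returns the fully qualified routine ID (`project.dataset.routine`):

# Call the deployed remote function directly from BigQuery SQL.
sql = f"""
SELECT
  body_mass_g,
  `{get_bucket.bigframes_remote_function}`(body_mass_g) AS body_mass_bucket
FROM `bigquery-public-data.ml_datasets.penguins`
LIMIT 10
"""
bpd.read_gbq(sql)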
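Note that the sample leaves cloud artifacts deployed in the project: the BigQuery remote functions (routines) and the Cloud Functions backing them. If you want to remove them when you are done, here is a minimal cleanup sketch; it assumes the bigframes session exposes `bqclient` (a BigQuery client) and `cloudfunctionsclient` (a Cloud Functions client), as in recent versions of the library:

# Optional cleanup of the artifacts created above.
session = bpd.get_global_session()
for rf in (get_bucket, get_hash):
    # Delete the BigQuery remote function (a routine in a BigQuery dataset).
    session.bqclient.delete_routine(rf.bigframes_remote_function)
    # Delete the Cloud Function that backs it.
    session.cloudfunctionsclient.delete_function(name=rf.bigframes_cloud_function)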
 

What's next

To search and filter code samples for other Google Cloud products, see the Google Cloud sample browser.
