Manage BigQuery DataFrames sessions and I/O

This document explains how to manage sessions and perform input and output (I/O) operations when you use BigQuery DataFrames. You will learn how to create and use sessions, work with in-memory data, and read from and write to files and BigQuery tables.

BigQuery sessions

BigQuery DataFrames uses a local session object internally to manage metadata. Each DataFrame and Series object connects to a session, each session connects to a location, and each query in a session runs in the location where you created the session. Use the following code sample to manually create a session and use it to load data:

    import bigframes
    import bigframes.pandas as bpd

    # Create session object
    context = bigframes.BigQueryOptions(
        project=YOUR_PROJECT_ID,
        location=YOUR_LOCATION,
    )
    session = bigframes.Session(context)

    # Load a BigQuery table into a dataframe
    df1 = session.read_gbq("bigquery-public-data.ml_datasets.penguins")

    # Create a dataframe with local data:
    df2 = bpd.DataFrame({"my_col": [1, 2, 3]}, session=session)

You can't combine data from multiple session instances, even if you initialize them with the same settings. The following code sample shows that trying to combine data from different session instances causes an error:

    import bigframes
    import bigframes.pandas as bpd

    context = bigframes.BigQueryOptions(
        location=YOUR_LOCATION,
        project=YOUR_PROJECT_ID,
    )
    session1 = bigframes.Session(context)
    session2 = bigframes.Session(context)

    series1 = bpd.Series([1, 2, 3, 4, 5], session=session1)
    series2 = bpd.Series([1, 2, 3, 4, 5], session=session2)

    try:
        series1 + series2
    except ValueError as e:
        print(e)  # Error message: Cannot use combine sources from multiple sessions
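If you do need to combine such data, one workaround is to materialize one object locally and then load it into the other session. The following sketch assumes the data is small enough to fit in memory; it uses the same to_pandas() and read_pandas() methods shown elsewhere in this document:

    # Materialize series2 locally, then load it into session1.
    # This copies the data, so it's only practical for small results.
    series2_local = series2.to_pandas()
    series2_in_s1 = session1.read_pandas(series2_local)

    # Both operands now belong to session1, so this works
    result = series1 + series2_in_s1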

Global session

BigQuery DataFrames provides a default global session that you can access with the bigframes.pandas.get_global_session() method. In Colab, you must provide a project ID for the bigframes.pandas.options.bigquery.project attribute before you use the session. You can also set a location with the bigframes.pandas.options.bigquery.location attribute, which defaults to the US multi-region.

The following code sample shows how to set options for the global session:

    import bigframes.pandas as bpd

    # Set project ID for the global session
    bpd.options.bigquery.project = YOUR_PROJECT_ID

    # Update the global default session location
    bpd.options.bigquery.location = YOUR_LOCATION

To reset the global session's location or project, close the current session by running the bigframes.pandas.close_session() method.
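For example, the following sketch closes the global session and then configures a new location; the next operation that needs the global session creates one with the updated settings. The europe-west2 value is just an example:

    import bigframes.pandas as bpd

    # Close the current global session
    bpd.close_session()

    # Set a new location; the next global session picks it up
    bpd.options.bigquery.location = "europe-west2"  # example location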

Many BigQuery DataFrames built-in functions use the global session by default. The following code sample shows how built-in functions use the global session:

    # The following two statements are essentially the same
    df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
    df = bpd.get_global_session().read_gbq("bigquery-public-data.ml_datasets.penguins")

In-memory data

You can create DataFrame and Series objects from built-in Python or NumPy data structures, similar to how you create objects with pandas. Use the following code sample to create an object:

    import numpy as np
    import bigframes.pandas as bpd

    s = bpd.Series([1, 2, 3])

    # Create a dataframe with Python dict
    df = bpd.DataFrame(
        {
            "col_1": [1, 2, 3],
            "col_2": [4, 5, 6],
        }
    )

    # Create a series with NumPy
    s = bpd.Series(np.arange(10))

To convert pandas objects to BigQuery DataFrames objects, use the read_pandas() method or the DataFrame constructor, as shown in the following code sample:

    import numpy as np
    import pandas as pd
    import bigframes.pandas as bpd

    pd_df = pd.DataFrame(np.random.randn(4, 2))

    # Convert pandas dataframe to BigQuery DataFrame with read_pandas()
    df_1 = bpd.read_pandas(pd_df)

    # Convert pandas dataframe to BigQuery DataFrame with the dataframe constructor
    df_2 = bpd.DataFrame(pd_df)

To load BigQuery DataFrames data into local memory, use the to_pandas() method, as shown in the following code sample:

    import bigframes.pandas as bpd

    bf_df = bpd.DataFrame({"my_col": [1, 2, 3]})
    # Returns a pandas DataFrame
    bf_df.to_pandas()

    bf_s = bpd.Series([1, 2, 3])
    # Returns a pandas Series
    bf_s.to_pandas()

Cost estimation with the dry_run parameter

Loading a large amount of data can take significant time and resources. To see how much data a query would process before you run it, pass the dry_run=True parameter to the to_pandas() call. Use the following code sample to perform a dry run:

    import bigframes.pandas as bpd

    df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")

    # Returns a pandas Series with dry run stats
    df.to_pandas(dry_run=True)
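The dry run returns a pandas Series of job statistics instead of downloading any data, so you can inspect the statistics before deciding to materialize the full result. The following sketch prints the statistics and gates the real call on a hypothetical byte threshold; the exact field name for bytes processed is an assumption and may differ by library version, so check the printed Series index in your environment:

    # Inspect the dry run statistics; no table data is downloaded
    stats = df.to_pandas(dry_run=True)
    print(stats)

    # Hypothetical gating pattern; "totalBytesProcessed" is an assumed
    # field name -- verify it against the printed statistics above
    if stats.get("totalBytesProcessed", 0) < 10**9:
        local_df = df.to_pandas()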

Read and write files

You can read data from compatible files into a BigQuery DataFrames DataFrame. These files can be on your local machine or in Cloud Storage. Use the following code sample to read data from a CSV file:

    import bigframes.pandas as bpd

    # Read a CSV file from Cloud Storage
    df = bpd.read_csv("gs://cloud-samples-data/bigquery/us-states/us-states.csv")
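The same reader also accepts local paths, and BigQuery DataFrames provides analogous readers for other common formats, such as read_parquet() and read_json(). A minimal sketch, using hypothetical file paths:

    import bigframes.pandas as bpd

    # Read a CSV file from the local machine (hypothetical path)
    df_local = bpd.read_csv("data/my_data.csv")

    # Read a Parquet file from Cloud Storage (hypothetical path)
    df_parquet = bpd.read_parquet(f"gs://{YOUR_BUCKET}/my_data.parquet")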

To save a DataFrame to a local file or a Cloud Storage file, use the to_csv() method, as shown in the following code sample:

    import bigframes.pandas as bpd

    df = bpd.DataFrame({"my_col": [1, 2, 3]})

    # Write a dataframe to a CSV file in Cloud Storage
    df.to_csv(f"gs://{YOUR_BUCKET}/myfile*.csv")

Read and write BigQuery tables

To create a BigQuery DataFrames DataFrame from a BigQuery table reference, use the bigframes.pandas.read_gbq() function, as shown in the following code sample:

    import bigframes.pandas as bpd

    df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
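The read_gbq() function also accepts optional parameters that limit what is read; for example, the columns parameter selects a subset of columns, and index_col sets the DataFrame index. The following sketch shows the columns parameter:

    import bigframes.pandas as bpd

    # Read only the columns you need from the table
    df = bpd.read_gbq(
        "bigquery-public-data.ml_datasets.penguins",
        columns=["species", "island", "body_mass_g"],
    )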

To read data into BigQuery DataFrames with a SQL string, pass the string to the read_gbq() function, as shown in the following code sample:

    import bigframes.pandas as bpd

    sql = """
    SELECT species, island, body_mass_g
    FROM bigquery-public-data.ml_datasets.penguins
    WHERE sex = 'MALE'
    """
    df = bpd.read_gbq(sql)

To save a DataFrame object to a BigQuery table, use its to_gbq() method. The following code sample shows how:

    import bigframes.pandas as bpd

    df = bpd.DataFrame({"my_col": [1, 2, 3]})

    df.to_gbq(f"{YOUR_PROJECT_ID}.{YOUR_DATASET_ID}.{YOUR_TABLE_NAME}")
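If the destination table already exists, you can control what happens with the if_exists parameter of to_gbq(). The following sketch replaces the existing table contents; other supported values include "fail" and "append":

    # Replace the table contents if the destination table already exists
    df.to_gbq(
        f"{YOUR_PROJECT_ID}.{YOUR_DATASET_ID}.{YOUR_TABLE_NAME}",
        if_exists="replace",
    )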
