Explore query results in notebooks


You can explore BigQuery query results by using Colab Enterprise notebooks in BigQuery.

In this tutorial, you query data from a BigQuery public dataset and explore the query results in a notebook.

Objectives

  • Create and run a query in BigQuery.
  • Explore query results in a notebook.

Costs

This tutorial uses a dataset available through the Google Cloud Public Datasets Program. Google pays for the storage of these datasets and provides public access to the data. You incur charges for the queries that you perform on the data. For more information, see BigQuery pricing.

Before you begin

  1. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Go to project selector

  2. Verify that billing is enabled for your Google Cloud project.

  3. Enable the BigQuery API.

    Enable the API

    For new projects, BigQuery is automatically enabled.

Set the default region for code assets

If this is the first time you are creating a code asset, you should set the default region for code assets. You can't change the region for a code asset after it is created.

All code assets in BigQuery Studio use the same default region. To set the default region for code assets, follow these steps:

  1. Go to the BigQuery page.

    Go to BigQuery

  2. In the Explorer pane, find the project in which you have enabled code assets.

  3. Click View actions next to the project, and then click Change my default code region.

  4. For Region, select the region that you want to use for code assets.

  5. Click Select.

For a list of supported regions, see BigQuery Studio locations.

Required permissions

To create and run notebooks, you need the following Identity and Access Management (IAM) roles:

Open query results in a notebook

You can run a SQL query and then use a notebook to explore the data. This approach is useful if you want to modify the data in BigQuery before working with it, or if you need only a subset of the fields in the table.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the Type to search field, enter bigquery-public-data.

    If the project is not shown, enter bigquery in the search field, and then click Search to all projects to match the search string with the existing projects.

  3. Select bigquery-public-data > ml_datasets > penguins.

  4. For the penguins table, click View actions, and then click Query.

  5. Add an asterisk (*) for field selection to the generated query, so that it reads like the following example:

     SELECT
       *
     FROM
       `bigquery-public-data.ml_datasets.penguins`
     LIMIT
       1000;

  6. Click Run.

  7. In the Query results section, click Explore data, and then click Explore with Python notebook.
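If you need only a subset of the fields, you can name the columns explicitly in the query instead of using an asterisk, so that the notebook DataFrame contains only those columns. The following variation is one possible sketch; the column names are taken from the penguins table as used in the code sample later in this tutorial:

```sql
SELECT
  species,
  culmen_length_mm,
  culmen_depth_mm,
  flipper_length_mm,
  body_mass_g
FROM
  `bigquery-public-data.ml_datasets.penguins`
LIMIT
  1000;
```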

Prepare the notebook for use

Prepare the notebook for use by connecting to a runtime and setting application default values.

  1. In the notebook header, click Connect to connect to the default runtime.
  2. In the Setup code block, click Run cell.

Explore the data

  1. To load the penguins data into a BigQuery DataFrame and show the results, click Run cell in the code block in the Result set loaded from BigQuery job as a DataFrame section.
  2. To get descriptive statistics for the data, click Run cell in the code block in the Show descriptive statistics using describe() section.
  3. Optional: Use other Python functions or packages to explore and analyze the data.
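The describe() cell computes per-column summary statistics. Because bigframes.pandas mirrors the pandas API, the calls behave the way they do on a local pandas DataFrame. The following sketch uses a small synthetic sample of penguin-like rows (the values are illustrative, not taken from the real dataset) to show what those cells compute:

```python
import pandas as pd

# Synthetic penguin-like rows for illustration only; the real tutorial
# data comes from bigquery-public-data.ml_datasets.penguins.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
        "body_mass_g": [3700.0, 3800.0, 5000.0, 5100.0],
        "flipper_length_mm": [181.0, 186.0, 217.0, 220.0],
    }
)

# describe() returns count, mean, std, min, quartiles, and max for each
# numeric column (non-numeric columns such as species are excluded):
stats = sample.describe()
print(stats.loc["mean", "body_mass_g"])

# Mean body mass per species, heaviest first -- the same groupby pattern
# the tutorial's larger code sample applies to the full BigQuery DataFrame:
per_species = (
    sample["body_mass_g"]
    .groupby(sample["species"])
    .mean()
    .sort_values(ascending=False)
)
```

On a BigQuery DataFrame, the same calls run as BigQuery jobs rather than locally, but the results have the same shape.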

The following code sample shows how to use bigframes.pandas to analyze data and bigframes.ml to create a linear regression model from the penguins data in a BigQuery DataFrame:

  import bigframes.pandas as bpd

  # Load data from BigQuery
  query_or_table = "bigquery-public-data.ml_datasets.penguins"
  bq_df = bpd.read_gbq(query_or_table)

  # Inspect one of the columns (or series) of the DataFrame:
  bq_df["body_mass_g"]

  # Compute the mean of this series:
  average_body_mass = bq_df["body_mass_g"].mean()
  print(f"average_body_mass: {average_body_mass}")

  # Find the heaviest species using the groupby operation to calculate the
  # mean body_mass_g:
  (
      bq_df["body_mass_g"]
      .groupby(by=bq_df["species"])
      .mean()
      .sort_values(ascending=False)
      .head(10)
  )

  # Create the Linear Regression model
  from bigframes.ml.linear_model import LinearRegression

  # Filter down to the data we want to analyze
  adelie_data = bq_df[bq_df.species == "Adelie Penguin (Pygoscelis adeliae)"]

  # Drop the columns we don't care about
  adelie_data = adelie_data.drop(columns=["species"])

  # Drop rows with nulls to get our training data
  training_data = adelie_data.dropna()

  # Pick feature columns and label column
  X = training_data[
      [
          "island",
          "culmen_length_mm",
          "culmen_depth_mm",
          "flipper_length_mm",
          "sex",
      ]
  ]
  y = training_data[["body_mass_g"]]

  model = LinearRegression(fit_intercept=False)
  model.fit(X, y)
  model.score(X, y)

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

The easiest way to eliminate billing is to delete the Google Cloud project that you created for this tutorial.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next
