Use BigQuery ML to predict penguin weight

In this tutorial, you use a linear regression model in BigQuery ML to predict the weight of a penguin based on the penguin's demographic information. A linear regression is a type of regression model that generates a continuous value from a linear combination of input features.
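To make "a linear combination of input features" concrete, the following is a minimal sketch of how such a model turns feature values into one continuous prediction. The feature values, weights, and intercept are hypothetical numbers chosen only for illustration; they are not taken from the penguins dataset or from a trained BigQuery ML model.

```python
def predict_linear(features, weights, intercept):
    """Return intercept + sum(weight_i * feature_i) for a single example."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return intercept + sum(w * x for w, x in zip(weights, features))


# Hypothetical feature values for one penguin (for example, two body measurements).
example_features = [200.0, 40.0]
# Hypothetical coefficients and intercept, not learned from real data.
example_weights = [50.0, 5.0]
example_intercept = -6000.0

predicted_weight = predict_linear(example_features, example_weights, example_intercept)
print(predicted_weight)  # 4200.0
```

Training a linear regression model amounts to choosing the weights and intercept that make predictions like this one as close as possible to the observed values.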

This tutorial uses the bigquery-public-data.ml_datasets.penguins dataset.

Objectives

In this tutorial, you will perform the following tasks:

Create a linear regression model.
Evaluate the model.
Make predictions by using the model.

Costs

This tutorial uses billable components of Google Cloud, including the following:

BigQuery
BigQuery ML

For more information on BigQuery costs, see the BigQuery pricing page.

For more information on BigQuery ML costs, see BigQuery ML pricing.

Before you begin

In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

Go to project selector
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery API.

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Enable the API

Required permissions

To create the model using BigQuery ML, you need the following IAM permissions:

bigquery.jobs.create
bigquery.models.create
bigquery.models.getData
bigquery.models.updateData
bigquery.models.updateMetadata

To run inference, you need the following permissions:

bigquery.models.getData on the model
bigquery.jobs.create

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

In the Google Cloud console, go to the BigQuery page.

Go to the BigQuery page
In the Explorer pane, click your project name.
Click View actions > Create dataset.
On the Create dataset page, do the following:
- For Dataset ID, enter bqml_tutorial.
- For Location type, select Multi-region, and then select US.
- Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk --dataset command.

Create a dataset named bqml_tutorial with the data location set to US.

bq mk --dataset \
  --location=US \
  --description "BigQuery ML tutorial dataset." \
  bqml_tutorial

Confirm that the dataset was created:
```
bq ls
```

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}
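As a rough sketch of what a datasets.insert request might look like, the following Python snippet assembles the REST URL and JSON body without sending anything. The project ID my-project is a hypothetical placeholder, and the location field is an assumption added here to match the US location used in the Console and bq steps; the request body shown above omits it. Sending the request would also require an authenticated HTTP client, which this sketch leaves out.

```python
import json


def build_dataset_insert_request(project_id, dataset_id, location="US"):
    """Build the URL and JSON body for a BigQuery datasets.insert call.

    Nothing is sent over the network; this only assembles the request parts.
    """
    url = (
        "https://bigquery.googleapis.com/bigquery/v2/"
        f"projects/{project_id}/datasets"
    )
    body = {
        "datasetReference": {"datasetId": dataset_id},
        # Assumption: pin the dataset to the same location as the other tabs.
        "location": location,
    }
    return url, json.dumps(body, indent=2)


# "my-project" is a hypothetical project ID used only for illustration.
request_url, request_body = build_dataset_insert_request("my-project", "bqml_tutorial")
print("POST", request_url)
print(request_body)
```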

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)
