Perform classification with a boosted trees model
This tutorial teaches you how to use a boosted trees classifier model to predict the income range of individuals based on their demographic data.
The model predicts whether a value falls into one of two categories, in this
case whether an individual's annual income falls above or below $50,000.
Sign in to your Google Cloud account. If you're new to
Google Cloud, create an account to evaluate how our products perform in
real-world scenarios. New customers also get $300 in free credits to
run, test, and deploy workloads.
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
Roles required to select or create a project
Select a project: Selecting a project doesn't require a specific
IAM role—you can select any project that you've been
granted a role on.
Create a project: To create a project, you need the Project Creator role
(roles/resourcemanager.projectCreator), which contains the
resourcemanager.projects.create permission. Learn how to grant roles.
BigQuery is automatically enabled in new projects.
To activate BigQuery in a pre-existing project, go to
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant roles.
The model you create in this tutorial predicts the income bracket for census
respondents, based on the following features:
Age
Type of work performed
Marital status
Level of education
Occupation
Hours worked per week
The education column isn't included in the training data, because
the education and education_num columns both express the respondent's level
of education in different formats.
You separate the data into training, evaluation, and prediction sets by creating
a new dataframe column that is derived from the functional_weight column.
Eighty percent of the data is used for training the model, and the remaining
twenty percent of the data is used for evaluation and prediction.
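The split logic can be sketched locally in pandas. This is only an illustration, assuming the split keys off the last digit of functional_weight (MOD 10: digits 0–7 for training, 8 for evaluation, 9 for prediction, which yields roughly 80/10/10); the functional_weight values below are invented:

```python
import pandas as pd

# Toy stand-in for the census table; functional_weight values are invented.
df = pd.DataFrame({
    "functional_weight": [77516, 83311, 215646, 234721, 338409,
                          284582, 160187, 209642, 45788, 159449],
})

# Bucket each row by the last digit of functional_weight:
# digits 0-7 -> training, 8 -> evaluation, 9 -> prediction.
digit = df["functional_weight"] % 10
df["dataframe"] = digit.map(
    lambda d: "training" if d < 8 else ("evaluation" if d == 8 else "prediction")
)

counts = df["dataframe"].value_counts().to_dict()
print(counts)
```

Because the bucketing is deterministic, every run assigns the same rows to the same set, which keeps the training, evaluation, and prediction sets disjoint and reproducible.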
SQL
To prepare your sample data, create a view to
contain the training data. This view is used by the CREATE MODEL statement
later in this tutorial.
Run the query that prepares the sample data:
In the Google Cloud console, go to the BigQuery page.
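The preparation query might look like the following sketch. It selects the feature columns listed earlier plus the income_bracket label from the public census_adult_income table, and derives the dataframe column from functional_weight; the bqml_tutorial dataset name is an assumption, so substitute your own:

```sql
CREATE OR REPLACE VIEW
  `bqml_tutorial.input_data` AS
SELECT
  age,
  workclass,
  marital_status,
  education_num,
  occupation,
  hours_per_week,
  income_bracket,
  CASE
    WHEN MOD(functional_weight, 10) < 8 THEN 'training'
    WHEN MOD(functional_weight, 10) = 8 THEN 'evaluation'
    WHEN MOD(functional_weight, 10) = 9 THEN 'prediction'
  END AS dataframe
FROM
  `bigquery-public-data.ml_datasets.census_adult_income`
```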
Create a boosted trees model to predict census respondents' income bracket, and
train it on the census data. The query takes about 30 minutes to complete.
SQL
Follow these steps to create the model:
In the Google Cloud console, go to the BigQuery page.
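The CREATE MODEL statement might look like the following sketch, which mirrors the hyperparameters used in the Python example later in this tutorial (one parallel tree, hist tree method, 0.85 subsample); the bqml_tutorial dataset name and the input_data view name are assumptions:

```sql
CREATE MODEL `bqml_tutorial.tree_model`
OPTIONS (
  MODEL_TYPE = 'BOOSTED_TREE_CLASSIFIER',
  BOOSTER_TYPE = 'GBTREE',
  NUM_PARALLEL_TREE = 1,
  MAX_ITERATIONS = 1,  -- For a more accurate model, try 50 iterations.
  TREE_METHOD = 'HIST',
  SUBSAMPLE = 0.85,
  INPUT_LABEL_COLS = ['income_bracket'])
AS
SELECT * EXCEPT (dataframe)
FROM `bqml_tutorial.input_data`
WHERE dataframe = 'training';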
After the query completes, the tree_model model can be accessed through the Explorer pane. Because
the query uses a CREATE MODEL statement to create a model, you don't see
query results.
```python
from bigframes.ml import ensemble

# input_data is defined in an earlier step.
training_data = input_data[input_data["dataframe"] == "training"]
X = training_data.drop(columns=["income_bracket", "dataframe"])
y = training_data["income_bracket"]

# Create and train the model.
tree_model = ensemble.XGBClassifier(
    n_estimators=1,
    booster="gbtree",
    tree_method="hist",
    max_iterations=1,  # For a more accurate model, try 50 iterations.
    subsample=0.85,
)
tree_model.fit(X, y)

tree_model.to_gbq(
    your_model_id,  # For example: "your-project.bqml_tutorial.tree_model"
    replace=True,
)
```
Evaluate the model
SQL
Follow these steps to evaluate the model:
In the Google Cloud console, go to the BigQuery page.
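The evaluation query might look like the following sketch, which runs ML.EVALUATE over the rows reserved for evaluation; the bqml_tutorial dataset name and the input_data view name are assumptions:

```sql
SELECT *
FROM ML.EVALUATE(
  MODEL `bqml_tutorial.tree_model`,
  (SELECT * FROM `bqml_tutorial.input_data` WHERE dataframe = 'evaluation'));
```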
```python
# Select the model you'll use for predictions. `read_gbq_model` loads model
# data from BigQuery, but you could also use the `tree_model` object
# from the previous step.
tree_model = bpd.read_gbq_model(
    your_model_id,  # For example: "your-project.bqml_tutorial.tree_model"
)

# input_data is defined in an earlier step.
evaluation_data = input_data[input_data["dataframe"] == "evaluation"]
X = evaluation_data.drop(columns=["income_bracket", "dataframe"])
y = evaluation_data["income_bracket"]

# The score() method evaluates how the model performs compared to the
# actual data. The output DataFrame matches that of ML.EVALUATE().
score = tree_model.score(X, y)
score.peek()
# Output:
#    precision    recall  accuracy  f1_score  log_loss   roc_auc
# 0   0.671924  0.578804  0.839429  0.621897  0.344054  0.887335
```
The evaluation metrics indicate good model performance, in particular
the fact that the roc_auc score is greater than 0.8.
For more information about the evaluation metrics, see Output.
Use the model to predict classifications
SQL
Follow these steps to predict data with the model:
In the Google Cloud console, go to the BigQuery page.
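The prediction query might look like the following sketch, which runs ML.PREDICT over the rows reserved for prediction; the bqml_tutorial dataset name and the input_data view name are assumptions:

```sql
SELECT *
FROM ML.PREDICT(
  MODEL `bqml_tutorial.tree_model`,
  (SELECT * FROM `bqml_tutorial.input_data` WHERE dataframe = 'prediction'));
```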
```python
# Select the model you'll use for predictions. `read_gbq_model` loads model
# data from BigQuery, but you could also use the `tree_model` object
# from previous steps.
tree_model = bpd.read_gbq_model(
    your_model_id,  # For example: "your-project.bqml_tutorial.tree_model"
)

# input_data is defined in an earlier step.
prediction_data = input_data[input_data["dataframe"] == "prediction"]

predictions = tree_model.predict(prediction_data)
predictions.peek()
# Output:
#   predicted_income_bracket  predicted_income_bracket_probs.label  predicted_income_bracket_probs.prob
#                      <=50K                                  >50K                  0.05183430016040802
#                                                            <=50K                  0.94816571474075317
#                      <=50K                                  >50K                  0.00365859130397439
#                                                            <=50K                  0.99634140729904175
#                      <=50K                                  >50K                 0.037775970995426178
#                                                            <=50K                  0.96222406625747681
```
The predicted_income_bracket column contains the predicted value from the model.
The predicted_income_bracket_probs.label column shows the two labels that the
model had to choose between, and the predicted_income_bracket_probs.prob
column shows the probability of the given label being the correct one.
To avoid incurring charges to your Google Cloud account for the resources used in this
tutorial, either delete the project that contains the resources, or keep the project and
delete the individual resources.
You can delete the project you created.
Or you can keep the project and delete the dataset.
Delete your dataset
Deleting your project removes all datasets and all tables in the project. If you
prefer to reuse the project, you can delete the dataset you created in this
tutorial:
If necessary, open the BigQuery page in the
Google Cloud console.
Last updated 2026-04-01 UTC.