Use BigQuery ML to predict penguin weight
In this tutorial, you use a linear regression model in BigQuery ML to predict the weight of a penguin based on the
penguin's demographic information. A linear regression is a type of regression
model that generates a continuous value from a linear combination of input
features.
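As a minimal sketch of what "a linear combination of input features" means, the following Python snippet computes a prediction as a bias plus a weighted sum. The feature names, weights, and bias are hypothetical values chosen for illustration; they are not the values that BigQuery ML learns in this tutorial.

```python
def predict_body_mass(features, weights, bias):
    """Return bias + sum(weight_i * feature_i) for one example."""
    return bias + sum(weights[name] * value for name, value in features.items())


# Hypothetical weights and bias, for illustration only (not trained values).
weights = {"flipper_length_mm": 50.0, "culmen_depth_mm": 10.0}
bias = -6000.0

# One hypothetical penguin.
penguin = {"flipper_length_mm": 200.0, "culmen_depth_mm": 18.0}

prediction = predict_body_mass(penguin, weights, bias)
print(prediction)  # 4180.0
```

Training a linear regression model amounts to choosing the weights and the bias so that predictions like this one are close to the observed values.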
In the Google Cloud console, on the project selector page,
select or create a Google Cloud project.
Roles required to select or create a project
Select a project: Selecting a project doesn't require a specific
IAM role—you can select any project that you've been
granted a role on.
Create a project: To create a project, you need the Project Creator role
(roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant
roles.
To enable APIs, you need the Service Usage Admin IAM
role (roles/serviceusage.serviceUsageAdmin), which
contains the serviceusage.services.enable permission. Learn how to grant
roles.
To see the results of the model training, you can use the ML.TRAINING_INFO function,
or you can view the statistics in the Google Cloud console. In this
tutorial, you use the Google Cloud console.
A machine learning algorithm builds a model by examining many examples and
attempting to find a model that minimizes loss. This process is called empirical
risk minimization.
Loss is the penalty for a bad prediction. It is a number indicating
how bad the model's prediction was on a single example. If the model's
prediction is perfect, the loss is zero; otherwise, the loss is greater. The
goal of training a model is to find a set of weights and biases that have low
loss, on average, across all examples.
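The idea of per-example loss and average loss can be sketched in a few lines of Python. This example uses squared error as the loss, which is the usual choice for linear regression; the body mass values are made up for illustration.

```python
def squared_loss(predicted, actual):
    """Loss for a single example: zero when the prediction is perfect."""
    return (predicted - actual) ** 2


def mean_squared_error(predictions, actuals):
    """Average loss across all examples."""
    losses = [squared_loss(p, a) for p, a in zip(predictions, actuals)]
    return sum(losses) / len(losses)


# Made-up body masses in grams, for illustration only.
actual = [3750.0, 3800.0, 3250.0]
perfect = [3750.0, 3800.0, 3250.0]
rough = [3650.0, 3900.0, 3250.0]

mse_perfect = mean_squared_error(perfect, actual)
mse_rough = mean_squared_error(rough, actual)
print(mse_perfect)  # 0.0
print(mse_rough)    # about 6666.67
```

Training searches for the weights and bias that make this average as small as possible on the training data.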
See the model training statistics that were generated when you ran the CREATE MODEL query:
In the left pane, click Explorer:
In the Explorer pane, expand your project and click Datasets.
Click the bqml_tutorial dataset.
Click the Models tab.
To open the model information pane, click penguins_model.
Click the Training tab, and then click Table. The results should look
similar to the following:
The Training Data Loss column represents the loss metric calculated
after the model is trained on the training dataset. Since you performed a
linear regression, this column shows the mean squared error value. A normal_equation optimization strategy is automatically used for this training, so only one
iteration is required to converge to the final model. For more information
on setting the model optimization strategy, see optimize_strategy.
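To illustrate why a normal-equation approach needs only one iteration, here is a minimal sketch of the closed-form least-squares solution for a single input feature. It shows the general idea only and is not BigQuery ML's implementation; the data points are made up and happen to lie exactly on a line.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up flipper lengths (mm) and body masses (g), for illustration only.
xs = [180.0, 190.0, 200.0, 210.0]
ys = [3500.0, 4000.0, 4500.0, 5000.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 50.0 -5500.0
```

The weights come directly from sums over the data, with no repeated update steps, which is why the training statistics show a single iteration. Iterative strategies such as gradient descent instead refine the weights over many passes.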
Evaluate the model
After creating the model, evaluate its performance by using the ML.EVALUATE function or the score BigQuery DataFrames function, which compare the predicted values generated by the model against the actual data.
Because you performed a linear regression, the results include the following
columns:
mean_absolute_error
mean_squared_error
mean_squared_log_error
median_absolute_error
r2_score
explained_variance
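As a rough guide to reading the four error-based columns, the following sketch computes them with their commonly used definitions. It assumes that these definitions match what the evaluation reports; the sample values are made up.

```python
import math
import statistics


def error_metrics(actuals, predictions):
    """Compute common error-based regression metrics."""
    errors = [a - p for a, p in zip(actuals, predictions)]
    n = len(errors)
    # Log errors use log(1 + value), so all values must be greater than -1.
    log_errors = [math.log1p(a) - math.log1p(p) for a, p in zip(actuals, predictions)]
    return {
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "mean_squared_error": sum(e ** 2 for e in errors) / n,
        "mean_squared_log_error": sum(e ** 2 for e in log_errors) / n,
        "median_absolute_error": statistics.median([abs(e) for e in errors]),
    }


# Made-up body masses in grams, for illustration only.
actuals = [3000.0, 4000.0, 5000.0]
predictions = [3100.0, 3900.0, 5000.0]

metrics = error_metrics(actuals, predictions)
for name, value in metrics.items():
    print(name, value)
```

For all four metrics, lower values indicate predictions that are closer to the actual data. The absolute and median metrics are in the units of the label (grams here), while the squared metrics are in squared units.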
An important metric in the evaluation results is the R² score.
The R² score is a statistical measure that determines if the linear
regression predictions approximate the actual data. A value of 0 indicates
that the model explains none of the variability of the response data around the
mean. A value of 1 indicates that the model explains all the variability of
the response data around the mean.
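The two boundary values can be checked with a small sketch that uses the standard definition of the coefficient of determination: one minus the ratio of the residual sum of squares to the total sum of squares. The sample values are made up for illustration.

```python
def r2_score(actuals, predictions):
    """Return 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actuals) / len(actuals)
    residual_ss = sum((a - p) ** 2 for a, p in zip(actuals, predictions))
    total_ss = sum((a - mean_actual) ** 2 for a in actuals)
    return 1.0 - residual_ss / total_ss


# Made-up body masses in grams, for illustration only.
actuals = [3000.0, 4000.0, 5000.0]

# A model that always predicts the mean explains none of the variability.
r2_mean_only = r2_score(actuals, [4000.0, 4000.0, 4000.0])

# A model that predicts every value exactly explains all of it.
r2_perfect = r2_score(actuals, actuals)

print(r2_mean_only)  # 0.0
print(r2_perfect)    # 1.0
```

Under this definition, a model that performs worse than always predicting the mean produces a negative score.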
You can also look at the model's information pane in the Google Cloud console
to view the evaluation metrics:
Use the model to predict outcomes
Now that you have evaluated your model, the next step is to use it to predict
an outcome. You can run the ML.PREDICT function or the predict BigQuery DataFrames function on the model to predict the body mass in grams of all penguins that reside on
the Biscoe Islands.
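As a hedged sketch, the following Python helper assembles an ML.PREDICT query of the kind described above. The helper name is hypothetical. The source table `bigquery-public-data.ml_datasets.penguins` and the filter `island = 'Biscoe'` are assumptions about the data used in this tutorial; the model path combines the bqml_tutorial dataset and penguins_model names shown earlier.

```python
def build_predict_query(model, source_table, island):
    """Build an ML.PREDICT query string (hypothetical helper, illustration only).

    Values are interpolated directly into the SQL text, so pass only
    trusted constants; use query parameters for user-supplied input.
    """
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{source_table}` WHERE island = '{island}'))"
    )


query = build_predict_query(
    model="bqml_tutorial.penguins_model",
    source_table="bigquery-public-data.ml_datasets.penguins",  # assumed table
    island="Biscoe",  # assumed column value
)
print(query)

# One possible way to run it, assuming the google-cloud-bigquery package
# is installed and credentials are configured:
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(query).result()
```

For a linear regression model, the prediction typically appears in an output column named after the label with a predicted_ prefix, alongside the input columns.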
For linear regression models, Shapley values are used to generate feature
attribution values for each feature in the model. The output includes
the top three feature attributions per row of the penguins table because top_k_features was set to 3. These attributions are sorted by
the absolute value of the attribution in descending order. In all examples, the
feature sex contributed the most to the overall prediction.
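The ranking step can be sketched as follows. For a linear model with independent features, the Shapley value of a feature relative to a baseline reduces to the feature's weight times its deviation from the baseline value. The weights, baseline, feature names, and the numeric encoding of sex below are hypothetical; how BigQuery ML chooses the baseline and handles categorical features is not shown here.

```python
def top_k_attributions(weights, baseline, row, k):
    """Return the k largest attributions by absolute value, descending."""
    attributions = {
        name: weights[name] * (row[name] - baseline[name]) for name in weights
    }
    ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Hypothetical model weights and baseline (average) feature values.
weights = {
    "flipper_length_mm": 40.0,
    "culmen_depth_mm": -20.0,
    "culmen_length_mm": 5.0,
    "is_male": 900.0,  # hypothetical numeric encoding of sex
}
baseline = {
    "flipper_length_mm": 200.0,
    "culmen_depth_mm": 17.0,
    "culmen_length_mm": 44.0,
    "is_male": 0.5,
}
row = {
    "flipper_length_mm": 210.0,
    "culmen_depth_mm": 18.0,
    "culmen_length_mm": 45.0,
    "is_male": 1.0,
}

top_three = top_k_attributions(weights, baseline, row, k=3)
print(top_three)
# [('is_male', 450.0), ('flipper_length_mm', 400.0), ('culmen_depth_mm', -20.0)]
```

Sorting by absolute value keeps large negative contributions in the list, which is why an attribution of -20.0 ranks ahead of one of 5.0.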
To avoid incurring charges to your Google Cloud account for the resources used in this
tutorial, either delete the project that contains the resources, or keep the project and
delete the individual resources.
You can delete the project you created.
Or you can keep the project and delete the dataset.
Delete your dataset
Deleting your project removes all datasets and all tables in the project. If you
prefer to reuse the project, you can delete the dataset you created in this
tutorial:
If necessary, open the BigQuery page in the
Google Cloud console.
Last updated 2026-05-20 UTC.