BigQuery ML model evaluation overview
This document describes how BigQuery ML supports machine learning (ML) model evaluation.
Overview of model evaluation
You can use ML model evaluation metrics for the following purposes:
- To assess the quality of the fit between the model and the data.
- To compare different models.
- To predict, in the context of model selection, how accurately you can expect each model to perform on a specific dataset.
Supervised and unsupervised learning model evaluations work differently:
- For supervised learning models, model evaluation is well-defined. An evaluation set, which is data that the model hasn't seen during training, is typically held out from the training set and then used to evaluate model performance. We recommend that you don't use the training set for evaluation: a model can fit its training data closely yet still generalize poorly to new data, an outcome known as overfitting, and evaluating against the training set can't detect this.
- For unsupervised learning models, model evaluation is less defined and typically varies from model to model. Because unsupervised learning models don't reserve an evaluation set, the evaluation metrics are calculated using the whole input dataset.
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.
Model evaluation offerings
BigQuery ML provides the following functions to calculate evaluation metrics for ML models:
Regression models (boosted trees regressor, random forest regressor, DNN regressor, wide-and-deep regressor, AutoML Tables regressor)
ML.EVALUATE metrics:
- mean absolute error
- mean squared error
- mean squared log error
- median absolute error
- r2 score
- explained variance
Classification models (boosted trees classifier, random forest classifier, DNN classifier, wide-and-deep classifier, AutoML Tables classifier)
ML.EVALUATE metrics:
- precision
- recall
- accuracy
- F1 score
- log loss
- roc auc
ML.ROC_CURVE metrics (only applies to binary classification models):
- recall
- false positive rate
- true positives
- false positives
- true negatives
- false negatives
Matrix factorization models (explicit feedback)
ML.EVALUATE metrics:
- mean absolute error
- mean squared error
- mean squared log error
- median absolute error
- r2 score
- explained variance
Matrix factorization models (implicit feedback)
ML.EVALUATE metrics:
- mean average precision
- mean squared error
- normalized discounted cumulative gain
- average rank
Autoencoder models
ML.EVALUATE metrics:
- mean absolute error
- mean squared error
- mean squared log error
Time series models (ARIMA_PLUS)
ML.EVALUATE metrics (this function requires new data as input):
- mean absolute error
- mean squared error
- mean absolute percentage error
- symmetric mean absolute percentage error
ML.ARIMA_EVALUATE metrics (this function doesn't require new data as input):
- log_likelihood
- AIC
- variance
ML.ARIMA_EVALUATE also reports other information about seasonality, holiday effects, and spikes-and-dips outliers.
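To show how these evaluation functions are invoked, the following is a minimal sketch (the model name `mydataset.my_classifier` is hypothetical) that evaluates a binary classification model and retrieves its ROC curve:

```sql
-- Hypothetical model name; substitute your own.
-- Aggregate metrics: precision, recall, accuracy, F1 score, log loss, roc auc.
SELECT *
FROM ML.EVALUATE(MODEL `mydataset.my_classifier`);

-- Per-threshold metrics: recall, false positive rate, true positives,
-- false positives, true negatives, false negatives.
-- ML.ROC_CURVE only applies to binary classification models.
SELECT *
FROM ML.ROC_CURVE(MODEL `mydataset.my_classifier`);
```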
Automatic evaluation in CREATE MODEL statements
BigQuery ML supports automatic evaluation during model creation. Depending on the model type, the data split training options, and whether you're using hyperparameter tuning, the evaluation metrics are calculated against the reserved evaluation dataset, the reserved test dataset, or the entire input dataset.
- For k-means, PCA, autoencoder, and ARIMA_PLUS models, BigQuery ML uses all of the input data as training data, and evaluation metrics are calculated against the entire input dataset.
- For linear and logistic regression, boosted tree, random forest, DNN, wide-and-deep, and matrix factorization models, evaluation metrics are calculated against the dataset that's specified by the CREATE MODEL data split options, such as DATA_SPLIT_METHOD and DATA_SPLIT_EVAL_FRACTION. When you train these types of models using hyperparameter tuning, the DATA_SPLIT_TEST_FRACTION option also helps define the dataset that the evaluation metrics are calculated against. For more information, see Data split.
- For AutoML Tables models, see how data splits are used for training and evaluation.
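As an illustration of the data split options, the following sketch (dataset, table, and column names are hypothetical) trains a linear regression model while reserving a random 20% of the input data for evaluation:

```sql
-- Hypothetical names; data_split_method and data_split_eval_fraction
-- control which rows are held out for evaluation.
CREATE OR REPLACE MODEL `mydataset.my_regressor`
OPTIONS (
  model_type = 'LINEAR_REG',
  input_label_cols = ['label'],
  data_split_method = 'RANDOM',
  data_split_eval_fraction = 0.2
) AS
SELECT * FROM `mydataset.training_data`;
```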
To get evaluation metrics calculated during model creation, use evaluation functions such as ML.EVALUATE on the model with no input data specified. For an example, see ML.EVALUATE with no input data specified.
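For instance, a minimal sketch (the model name is hypothetical) that retrieves the metrics computed during model creation:

```sql
-- With no input data specified, ML.EVALUATE returns the metrics that were
-- calculated during model creation against the reserved evaluation dataset.
SELECT *
FROM ML.EVALUATE(MODEL `mydataset.my_regressor`);
```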
Evaluation with a new dataset
After model creation, you can specify new datasets for evaluation. To provide
a new dataset, use evaluation functions like ML.EVALUATE
on the model with
input data specified. For an example, see ML.EVALUATE
with a custom threshold and input data
.
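The following is a minimal sketch (model and table names are hypothetical) that evaluates a classification model against a new table, using a custom classification threshold:

```sql
-- Hypothetical names; the threshold struct only applies to
-- binary classification models.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.my_classifier`,
  TABLE `mydataset.new_data`,
  STRUCT(0.55 AS threshold));
```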