# BigQuery ML model evaluation overview
This document describes how BigQuery ML supports machine learning (ML)
model evaluation.
## Overview of model evaluation
You can use ML model evaluation metrics for the following purposes:

- To assess the quality of the fit between the model and the data.
- To compare different models.
- To predict how accurately you can expect each model to perform on a specific
  dataset, in the context of model selection.
Supervised and unsupervised learning model evaluations work differently:
- For supervised learning models, model evaluation is well-defined. An
  evaluation set, which is data that the model hasn't seen during training, is
  typically excluded from the training set and then used to evaluate model
  performance. We recommend that you don't use the training set for
  evaluation, because doing so causes the model to perform poorly when
  generalizing its predictions to new data. This outcome is known as
  *overfitting*.
- For unsupervised learning models, model evaluation is less defined and
  typically varies from model to model. Because unsupervised learning models
  don't reserve an evaluation set, the evaluation metrics are calculated using
  the whole input dataset.

For information about the supported SQL statements and functions for each
model type, see
[End-to-end user journey for each model](/bigquery/docs/e2e-journey).
## Model evaluation offerings
BigQuery ML provides the following functions to calculate
evaluation metrics for ML models:

- `ML.EVALUATE`: calculates evaluation metrics for a model. You can use this
  function with most model types.
- `ML.CONFUSION_MATRIX`: returns a confusion matrix for classification
  models.
- `ML.ROC_CURVE`: returns metrics for different threshold values for binary
  classification models.
- `ML.ARIMA_EVALUATE`: calculates evaluation metrics for time series models.
  It also reports other information about seasonality, holiday effects,
  and spikes-and-dips outliers. This function doesn't require new data as
  input.
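For example, for a binary classification model you might retrieve a confusion
matrix and threshold-based metrics with queries like the following. This is a
minimal sketch; the model `mydataset.my_classifier` and the table
`mydataset.eval_data` are hypothetical.

```sql
-- Confusion matrix, calculated against the evaluation data that was
-- reserved when the model was created. Model name is hypothetical.
SELECT *
FROM ML.CONFUSION_MATRIX(MODEL `mydataset.my_classifier`);

-- ROC metrics at thresholds 0.4, 0.5, and 0.6, calculated against
-- new input data. Table name is hypothetical.
SELECT *
FROM ML.ROC_CURVE(
  MODEL `mydataset.my_classifier`,
  TABLE `mydataset.eval_data`,
  GENERATE_ARRAY(0.4, 0.6, 0.1));
```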
## Automatic evaluation in `CREATE MODEL` statements
BigQuery ML supports automatic evaluation during model creation.
Depending on the model type, the data split training options, and whether you're
using hyperparameter tuning, the evaluation metrics are calculated against
the reserved evaluation dataset, the reserved test dataset, or the entire input
dataset:
- For k-means, PCA, autoencoder, and ARIMA_PLUS models, BigQuery ML
  uses all of the input data as training data, and evaluation metrics are
  calculated against the entire input dataset.
- For linear and logistic regression, boosted tree, random forest, DNN,
  Wide-and-deep, and matrix factorization models, evaluation metrics are
  calculated against the dataset that's specified by the following
  `CREATE MODEL` options:

  - [`DATA_SPLIT_METHOD`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-glm#data_split_method)
  - [`DATA_SPLIT_EVAL_FRACTION`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-glm#data_split_eval_fraction)
  - [`DATA_SPLIT_COL`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-glm#data_split_col)

  When you train these types of models using hyperparameter tuning, the
  [`DATA_SPLIT_TEST_FRACTION`](/bigquery/docs/reference/standard-sql/bigqueryml-hyperparameter-tuning#data_split)
  option also helps define the dataset that the evaluation metrics are
  calculated against. For more information, see
  [Data split](/bigquery/docs/reference/standard-sql/bigqueryml-hyperparameter-tuning#data_split).
  A sketch of a `CREATE MODEL` statement that sets these options follows
  this list.

- For AutoML Tables models, see
  [how data splits are used](/automl-tables/docs/prepare#how_data_splits_are_used)
  for training and evaluation.
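The following example shows how the data split options shape the evaluation
dataset. It's a minimal sketch; the model name `mydataset.sample_model`, the
table `mydataset.training_data`, and its `label` column are hypothetical.

```sql
-- A minimal sketch: trains a logistic regression model and reserves a
-- random 20% of the input data as the evaluation set. Evaluation metrics
-- are then calculated against that reserved 20%.
CREATE OR REPLACE MODEL `mydataset.sample_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  input_label_cols = ['label'],
  data_split_method = 'RANDOM',
  data_split_eval_fraction = 0.2
) AS
SELECT * FROM `mydataset.training_data`;
```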
To get evaluation metrics calculated during model creation, use evaluation
functions such as `ML.EVALUATE` on the model with no input data specified.
For an example, see
[`ML.EVALUATE` with no input data specified](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_with_no_input_data_specified).
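For instance, a query like the following returns the metrics that
BigQuery ML calculated when the model was created. This is a minimal
sketch; the model name is hypothetical.

```sql
-- With no input data argument, ML.EVALUATE returns the evaluation
-- metrics calculated during model creation. Model name is hypothetical.
SELECT *
FROM ML.EVALUATE(MODEL `mydataset.sample_model`);
```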
## Evaluation with a new dataset
After model creation, you can specify new datasets for evaluation. To provide
a new dataset, use evaluation functions like `ML.EVALUATE` on the model with
input data specified. For an example, see
[`ML.EVALUATE` with a custom threshold and input data](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_with_a_custom_threshold_and_input_data).
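For instance, a query like the following evaluates the model against a new
table; for a binary classification model, you can also pass a custom
classification threshold. This is a minimal sketch; the model and table names
are hypothetical.

```sql
-- Evaluates the model against new input data. The optional threshold
-- applies only to binary classification models. The model and table
-- names are hypothetical.
SELECT *
FROM ML.EVALUATE(
  MODEL `mydataset.sample_model`,
  TABLE `mydataset.new_data`,
  STRUCT(0.55 AS threshold));
```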
## What's next
For more information about supported SQL statements and functions for models
that support evaluation, see
[`ML.EVALUATE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate)
and [End-to-end user journey for each model](/bigquery/docs/e2e-journey).
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-09-04 UTC."],[[["\u003cp\u003eBigQuery ML supports model evaluation to assess model-data fit, compare models, and predict model performance on new datasets.\u003c/p\u003e\n"],["\u003cp\u003eSupervised learning models utilize a separate evaluation set to prevent overfitting, while unsupervised learning models use the entire input dataset for evaluation.\u003c/p\u003e\n"],["\u003cp\u003eBigQuery ML offers a variety of \u003ccode\u003eML.EVALUATE\u003c/code\u003e functions to calculate evaluation metrics for supervised and unsupervised models such as regressions, classifications, and clustering, each providing model-specific results.\u003c/p\u003e\n"],["\u003cp\u003eModel evaluation can be done automatically during model creation using reserved evaluation or test datasets based on the model type and chosen data split options, or with new data after creation.\u003c/p\u003e\n"],["\u003cp\u003eSpecific functions such as \u003ccode\u003eML.CONFUSION_MATRIX\u003c/code\u003e and \u003ccode\u003eML.ROC_CURVE\u003c/code\u003e are available for a more granular evaluation, including confusion matrices and metrics for different threshold values, respectively.\u003c/p\u003e\n"]]],[],null,["# BigQuery ML model evaluation overview\n=====================================\n\nThis document describes how BigQuery ML supports machine learning (ML)\nmodel evaluation.\n\nOverview of model evaluation\n----------------------------\n\nYou can use ML model evaluation metrics for the following\npurposes:\n\n- To assess the quality of the fit between the model and the data.\n- To compare different models.\n- To predict how accurately you can expect each model to perform on a specific dataset, in the context of model selection.\n\nSupervised and unsupervised learning model evaluations work differently:\n\n- For supervised learning models, model evaluation is well-defined. An evaluation set, which is data that hasn't been analyzed by the model, is typically excluded from the training set and then used to evaluate model performance. We recommend that you don't use the training set for evaluation because this causes the model to perform poorly when generalizing the prediction results for new data. This outcome is known as *overfitting*.\n- For unsupervised learning models, model evaluation is less defined and typically varies from model to model. 
Because unsupervised learning models don't reserve an evaluation set, the evaluation metrics are calculated using the whole input dataset.\n\nFor information about the supported SQL statements and functions for each\nmodel type, see\n[End-to-end user journey for each model](/bigquery/docs/e2e-journey).\n\nModel evaluation offerings\n--------------------------\n\nBigQuery ML provides the following functions to calculate\nevaluation metrics for ML models:\n\nAutomatic evaluation in `CREATE MODEL` statements\n-------------------------------------------------\n\nBigQuery ML supports automatic evaluation during model creation.\nDepending on the model type, the data split training options, and whether you're\nusing hyperparameter tuning, the evaluation metrics are calculated upon\nthe reserved evaluation dataset, the reserved test dataset, or the entire input\ndataset.\n\n- For k-means, PCA, autoencoder, and ARIMA_PLUS models, BigQuery ML\n uses all of the input data as training data, and evaluation metrics are\n calculated against the entire input dataset.\n\n- For linear and logistic regression, boosted tree, random forest, DNN,\n Wide-and-deep, and matrix factorization models, evaluation metrics are\n calculated against the dataset that's specified by the following\n `CREATE MODEL` options:\n\n - [`DATA_SPLIT_METHOD`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-glm#data_split_method)\n - [`DATA_SPLIT_EVAL_FRACTION`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-glm#data_split_eval_fraction)\n - [`DATA_SPLIT_COL`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-glm#data_split_col)\n\n When you train these types of models using hyperparameter tuning, the\n [`DATA_SPLIT_TEST_FRACTION`](/bigquery/docs/reference/standard-sql/bigqueryml-hyperparameter-tuning#data_split) option also helps\n define the dataset that the evaluation metrics are calculated against. For\n more information, see\n [Data split](/bigquery/docs/reference/standard-sql/bigqueryml-hyperparameter-tuning#data_split).\n- For AutoML Tables models, see\n [how data splits are used](/automl-tables/docs/prepare#how_data_splits_are_used)\n for training and evaluation.\n\nTo get evaluation metrics calculated during model creation, use evaluation\nfunctions such as `ML.EVALUATE` on the model with no input data specified.\nFor an example, see\n[`ML.EVALUATE` with no input data specified](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_with_no_input_data_specified).\n\nEvaluation with a new dataset\n-----------------------------\n\nAfter model creation, you can specify new datasets for evaluation. To provide\na new dataset, use evaluation functions like `ML.EVALUATE` on the model with\ninput data specified. For an example, see\n[`ML.EVALUATE` with a custom threshold and input data](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-evaluate#mlevaluate_with_a_custom_threshold_and_input_data)."]]