Stay organized with collectionsSave and categorize content based on your preferences.
The ML.IMPUTER function
This document describes theML.IMPUTERfunction, which lets you replaceNULLvalues in a string or numerical expression. You can replaceNULLvalues
with the most frequently used value for string expressions, or themeanormedianvalue for numerical expressions.
When used in theTRANSFORMclause,
the values calculated during training for mean, median, and most frequently
used value are automatically used in prediction.
Syntax
ML.IMPUTER(expression, strategy) OVER()
Arguments
ML.IMPUTERtakes the following arguments:
expression: thenumericalorSTRINGexpression to impute.
strategy: aSTRINGvalue that specifies how to replaceNULLvalues.
Valid values are as follows:
mean: the mean ofexpression. You can only use this value with
numerical expressions.
median: the median ofexpression. You can only use this value with
numerical expressions.
most_frequent: the most frequent value inexpression.
Output
ML.IMPUTERreturns aFLOAT64(for numerical expressions) orSTRING(for string expressions) value that contains the replacement for theNULLvalue.
Examples
Example 1
The following example imputes numerical expressions:
+------+--------+
| f | output |
+------+--------+
| NULL | c |
| NULL | c |
| NULL | c |
| NULL | c |
| a | a |
| a | a |
| b | b |
| b | b |
| c | c |
| c | c |
| c | c |
+------+--------+
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-29 UTC."],[[["\u003cp\u003eThe \u003ccode\u003eML.IMPUTER\u003c/code\u003e function replaces \u003ccode\u003eNULL\u003c/code\u003e values in string or numerical expressions.\u003c/p\u003e\n"],["\u003cp\u003eFor numerical expressions, \u003ccode\u003eNULL\u003c/code\u003e values can be replaced with the mean or median value.\u003c/p\u003e\n"],["\u003cp\u003eFor string expressions, \u003ccode\u003eNULL\u003c/code\u003e values are replaced with the most frequently occurring value.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eML.IMPUTER\u003c/code\u003e function can be used with the \u003ccode\u003eTRANSFORM\u003c/code\u003e clause, allowing the values calculated during training to be automatically applied during prediction.\u003c/p\u003e\n"],["\u003cp\u003eThe function outputs either a \u003ccode\u003eFLOAT64\u003c/code\u003e value for numerical expressions or a \u003ccode\u003eSTRING\u003c/code\u003e value for string expressions, representing the imputed value.\u003c/p\u003e\n"]]],[],null,["# The ML.IMPUTER function\n=======================\n\nThis document describes the `ML.IMPUTER` function, which lets you replace\n`NULL` values in a string or numerical expression. You can replace `NULL` values\nwith the most frequently used value for string expressions, or the\n[mean](https://en.wikipedia.org/wiki/Mean) or\n[median](https://en.wikipedia.org/wiki/Median) value for numerical expressions.\n\nWhen used in the\n[`TRANSFORM` clause](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#transform),\nthe values calculated during training for mean, median, and most frequently\nused value are automatically used in prediction.\n\nSyntax\n------\n\n```sql\nML.IMPUTER(expression, strategy) OVER()\n```\n\nArguments\n---------\n\n`ML.IMPUTER` takes the following arguments:\n\n- `expression`: the [numerical](/bigquery/docs/reference/standard-sql/data-types#numeric_types) or `STRING` expression to impute.\n- `strategy`: a `STRING` value that specifies how to replace `NULL` values. Valid values are as follows:\n - `mean`: the mean of `expression`. You can only use this value with numerical expressions.\n - `median`: the median of `expression`. You can only use this value with numerical expressions.\n - `most_frequent`: the most frequent value in `expression`.\n\nOutput\n------\n\n`ML.IMPUTER` returns a `FLOAT64` (for numerical expressions) or `STRING`\n(for string expressions) value that contains the replacement for the\n`NULL` value.\n\nExamples\n--------\n\n**Example 1**\n\nThe following example imputes numerical expressions: \n\n```sql\nSELECT f, ML.IMPUTER(f, 'mean') OVER () AS output\nFROM\n UNNEST([NULL, -3, -3, -3, 1, 2, 3, 4, 5]) AS f\nORDER BY f;\n```\n\nThe output looks similar to the following: \n\n```\n+------+--------+\n| f | output |\n+------+--------+\n| NULL | 0.75 |\n| -3 | -3.0 |\n| -3 | -3.0 |\n| -3 | -3.0 |\n| 1 | 1.0 |\n| 2 | 2.0 |\n| 3 | 3.0 |\n| 4 | 4.0 |\n| 5 | 5.0 |\n+------+--------+\n```\n\n**Example 2**\n\nThe following example imputes string expressions: \n\n```sql\nSELECT f, ML.IMPUTER(f, 'most_frequent') OVER () AS output\nFROM\n UNNEST([NULL, NULL, NULL, NULL, 'a', 'a', 'b', 'b', 'c', 'c', 'c']) AS f\nORDER BY f;\n```\n\nThe output looks similar to the following: \n\n```\n+------+--------+\n| f | output |\n+------+--------+\n| NULL | c |\n| NULL | c |\n| NULL | c |\n| NULL | c |\n| a | a |\n| a | a |\n| b | b |\n| b | b |\n| c | c |\n| c | c |\n| c | c |\n+------+--------+\n```\n\nWhat's next\n-----------\n\n- For information about feature preprocessing, see [Feature preprocessing overview](/bigquery/docs/preprocess-overview).\n- For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey)."]]