The ML.TRANSFORM function
This document describes the ML.TRANSFORM
function, which you can use
to preprocess feature data. This function processes input data by
applying the data transformations captured in the TRANSFORM
clause
of an existing model. The statistics that were calculated for data
transformation during model training are applied to the input data of the function.
For more information about which models support this function, see End-to-end user journeys for ML models .
Syntax
ML.TRANSFORM( MODEL ` PROJECT_ID . DATASET . MODEL_NAME `, { TABLE ` PROJECT_ID . DATASET . TABLE_NAME ` | ( QUERY_STATEMENT ) } )
Arguments
ML.TRANSFORM
takes the following arguments:
-
PROJECT_ID: the project that contains the resource. -
DATASET: the BigQuery dataset that contains the resource. -
MODEL_NAME: the name of a model. The model must have been created by using aCREATE MODELstatement that includes aTRANSFORMclause to manually preprocess feature data. You can check to see if a model uses aTRANSFORMclause by using thebq showcommand to look at the model's metadata . If the model was trained using aTRANSFORMclause, the model metadata contains a section about the transform columns. The function returns an error if you specify a model that was trained without aTRANSFORMclause. -
TABLE_NAME: the name of the input table that contains the feature data to preprocess.If you specify a value for the
TABLE_NAMEargument, the input column names in the table must match the input column names in the model'sTRANSFORMclause, and their types should be compatible according to BigQuery implicit coercion rules . You can get the input column names and data types from the model's metadata , in the section about the feature columns. -
QUERY_STATEMENT: A query that generates the feature data to preprocess. For the supported SQL syntax of theQUERY_STATEMENTclause, see GoogleSQL query syntax .If you specify a value for the
QUERY_STATEMENTargument, the input column names from the query must match the input column names in the model'sTRANSFORMclause, and their types should be compatible according to BigQuery implicit coercion rules . You can get the input column names and data types from the model's metadata , in the section about the feature columns.
Output
ML.TRANSFORM
returns the columns specified in the model's TRANSFORM
clause.
Example
The following example returns feature data that has been preprocessed by
using the TRANSFORM
clause included in the model named mydataset.mymodel
in your default project.
Create the model that contains the TRANSFORM
clause:
CREATE OR REPLACE MODEL ` mydataset . mymodel ` TRANSFORM ( species , island , ML . MAX_ABS_SCALER ( culmen_length_mm ) OVER () AS culmen_length_mm , ML . MAX_ABS_SCALER ( flipper_length_mm ) OVER () AS flipper_length_mm , sex , body_mass_g ) OPTIONS ( model_type = 'linear_reg' , input_label_cols = [ 'body_mass_g' ]) AS ( SELECT * FROM ` bigquery - public - data . ml_datasets . penguins ` WHERE body_mass_g IS NOT NULL );
Return feature data preprocessed by the model's TRANSFORM
clause:
SELECT * FROM ML . TRANSFORM ( MODEL ` mydataset . mymodel ` , TABLE ` bigquery - public - data . ml_datasets . penguins ` );
The result is similar to the following:
+-------------------------------------+--------+---------------------+---------------------+--------+-----------------+-------------+ | species | island | culmen_length_mm | flipper_length_mm | sex | culmen_depth_mm | body_mass_g | --------------------------------------+--------+ ------------------- +---------------------+--------+-----------------+-------------+ | Adelie Penguin (Pygoscelis adeliae) | Dream | 0.61409395973154368 | 0.79653679653679654 | Female | 18.4 | 3475.0 | | Adelie Penguin (Pygoscelis adeliae) | Dream | 0.66778523489932884 | 0.79653679653679654 | Male | 19.1 | 4650.0 | +-------------------------------------+--------+---------------------+---------------------+--------+-----------------+-------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview .

