The ML.FEATURES_AT_TIME function
This document describes the ML.FEATURES_AT_TIME
function, which lets you use
a point-in-time cutoff for all entities when retrieving features, because
features can have time dependencies if they include time-sensitive data. To
avoid data leakage
,
use point-in-time features when training models and running inference.
Use this function to use the same point-in-time cutoff for all entities when
retrieving features. Use the ML.ENTITY_FEATURES_AT_TIME
function
to retrieve features from multiple points in time for multiple entities.
Syntax
ML.FEATURES_AT_TIME( { TABLE ` PROJECT_ID . DATASET . TABLE_NAME ` | ( QUERY_STATEMENT ) } [, TIME => TIMESTAMP][, NUM_ROWS => INT64][, IGNORE_FEATURE_NULLS => BOOL])
Arguments
ML.FEATURES_AT_TIME
takes the following arguments:
-
PROJECT_ID: the project that contains the table. -
DATASET: the BigQuery dataset that contains the table. -
TABLE_NAME: is the name of the BigQuery table that contains the feature data. The feature table must contain the following columns:-
entity_id: aSTRINGcolumn that contains the ID of the entity related to the features. - One or more feature columns.
-
feature_timestamp: aTIMESTAMPcolumn that identifies when the feature data was last updated.
The column names are case-insensitive. For example, you can use a column named
Entity_IDinstead ofentity_id.The feature table must be in wide format, with one column for each feature.
-
-
QUERY_STATEMENT: aSTRINGvalue that specifies a GoogleSQL query that returns the feature data. This query must return the same columns as theTABLE_NAMEargument. See GoogleSQL query syntax for the supported SQL syntax of theQUERY_STATEMENTclause. -
TIME: aTIMESTAMPvalue that specifies the point in time to use as a cutoff for feature data. Only rows where the value in thefeature_timestampcolumn is equal to or earlier than theTIMEvalue are returned. Defaults to the value of theCURRENT_TIMESTAMPfunction . -
NUM_ROWS: anINT64value that specifies the number of rows to return for each entity ID. Defaults to1. -
IGNORE_FEATURE_NULLS: aBOOLvalue that indicates whether to replace aNULLvalue in a feature column with the feature column value from the row for the same entity that immediately precedes it in time. For example, for the following feature table:+-----------+------+------+--------------------------+ | entity_id | f1 | f2 | feature_timestamp | +-----------+------+------+--------------------------+ | '2' | 5.0 | 8.0 | '2022-06-10 09:00:00+00' | +-----------+------+------+--------------------------+ | '2' | 2.0 | 4.0 | '2022-06-10 12:00:00+00' | +-----------+------+------+--------------------------+ | '2' | 7.0 | NULL | '2022-06-11 10:00:00+00' | +-----------+------+------+--------------------------+
Running this query:
SELECT * FROM ML . FEATURES_AT_TIME ( TABLE mydataset . feature_table , time => '2022-06-11 10:00:00+00' , num_rows => 1 , ignore_feature_nulls => TRUE );
Results in the following output, where the
f2value from the row for entity ID 2 that is timestamped'2022-06-10 12:00:00+00'is substituted for theNULLvalue in the row timestamped'2022-06-11 10:00:00+00':+-----------+------+------+--------------------------+ | entity_id | f1 | f2 | feature_timestamp | +-----------+------+------+--------------------------+ | '2' | 7.0 | 4.0 | '2022-06-11 10:00:00+00' | +-----------+------+------+--------------------------+
If there is no available replacement value, for example, where there is no earlier row for that entity ID, a
NULLvalue is returned.Defaults to
FALSE.
Output
The ML.FEATURES_AT_TIME
function returns the input table rows that meet the
point-in-time cutoff criteria, with the feature_timestamp
column showing the
timestamp that was input in the time
argument.
Examples
Example 1
This example shows a how to retrain a model using only features that were
created or updated before 2023-01-01 12:00:00+00
:
CREATE OR REPLACE ` mydataset . mymodel ` OPTIONS ( WARM_START = TRUE ) AS SELECT * EXCEPT ( feature_timestamp , entity_id ) FROM ML . FEATURES_AT_TIME ( TABLE ` mydataset . feature_table ` , time => '2023-01-01 12:00:00+00' , num_rows => 1 , ignore_feature_nulls => TRUE );
Example 2
This example shows how to get predictions from a model based on features
that were created or updated before 2023-01-01 12:00:00+00
:
SELECT * FROM ML . PREDICT ( MODEL ` mydataset . mymodel ` , ( SELECT * EXCEPT ( feature_timestamp , entity_id ) FROM ML . FEATURES_AT_TIME ( TABLE ` mydataset . feature_table ` , time => '2023-01-01 12:00:00+00' , num_rows => 1 , ignore_feature_nulls => TRUE ) ) );
Example 3
This is a contrived example that you can use to see the output of the function:
WITH feature_table AS ( SELECT * FROM UNNEST ( ARRAY<STRUCT<entity_id STRING , f_1 FLOAT64 , f_2 FLOAT64 , feature_timestamp TIMESTAMP >> [ ( 'id1' , 1 . 0 , 1 . 0 , TIMESTAMP '2022-06-10 12:00:00+00' ), ( 'id2' , 12 . 0 , 24 . 0 , TIMESTAMP '2022-06-11 12:00:00+00' ), ( 'id1' , 11 . 0 , NULL , TIMESTAMP '2022-06-11 12:00:00+00' ), ( 'id1' , 6 . 0 , 12 . 0 , TIMESTAMP '2022-06-11 10:00:00+00' ), ( 'id2' , 2 . 0 , 4 . 0 , TIMESTAMP '2022-06-10 12:00:00+00' ), ( 'id2' , 7 . 0 , NULL , TIMESTAMP '2022-06-11 10:00:00+00' )]) ) SELECT * FROM ML . FEATURES_AT_TIME ( TABLE feature_table , time => '2022-06-12 10:00:00+00' , num_rows => 1 , ignore_feature_nulls => TRUE );

