The ML.FEATURES_AT_TIME function

This document describes the ML.FEATURES_AT_TIME function, which lets you use a point-in-time cutoff for all entities when retrieving features, because features can have time dependencies if they include time-sensitive data. To avoid data leakage , use point-in-time features when training models and running inference.

Use this function to use the same point-in-time cutoff for all entities when retrieving features. Use the ML.ENTITY_FEATURES_AT_TIME function to retrieve features from multiple points in time for multiple entities.

Syntax

ML.FEATURES_AT_TIME(
   { TABLE ` PROJECT_ID 
. DATASET 
. TABLE_NAME 
` | ( QUERY_STATEMENT 
) }
   [, TIME 
=> TIMESTAMP][, NUM_ROWS 
=> INT64][, IGNORE_FEATURE_NULLS 
=> BOOL])

Arguments

ML.FEATURES_AT_TIME takes the following arguments:

  • PROJECT_ID : the project that contains the table.
  • DATASET : the BigQuery dataset that contains the table.
  • TABLE_NAME : is the name of the BigQuery table that contains the feature data. The feature table must contain the following columns:

    • entity_id : a STRING column that contains the ID of the entity related to the features.
    • One or more feature columns.
    • feature_timestamp : a TIMESTAMP column that identifies when the feature data was last updated.

    The column names are case-insensitive. For example, you can use a column named Entity_ID instead of entity_id .

    The feature table must be in wide format, with one column for each feature.

  • QUERY_STATEMENT : a STRING value that specifies a GoogleSQL query that returns the feature data. This query must return the same columns as the TABLE_NAME argument. See GoogleSQL query syntax for the supported SQL syntax of the QUERY_STATEMENT clause.

  • TIME : a TIMESTAMP value that specifies the point in time to use as a cutoff for feature data. Only rows where the value in the feature_timestamp column is equal to or earlier than the TIME value are returned. Defaults to the value of the CURRENT_TIMESTAMP function .

  • NUM_ROWS : an INT64 value that specifies the number of rows to return for each entity ID. Defaults to 1 .

  • IGNORE_FEATURE_NULLS : a BOOL value that indicates whether to replace a NULL value in a feature column with the feature column value from the row for the same entity that immediately precedes it in time. For example, for the following feature table:

    +-----------+------+------+--------------------------+
    | entity_id | f1   | f2   | feature_timestamp        |
    +-----------+------+------+--------------------------+
    | '2'       | 5.0  | 8.0  | '2022-06-10 09:00:00+00' |
    +-----------+------+------+--------------------------+
    | '2'       | 2.0  | 4.0  | '2022-06-10 12:00:00+00' |
    +-----------+------+------+--------------------------+
    | '2'       | 7.0  | NULL | '2022-06-11 10:00:00+00' |
    +-----------+------+------+--------------------------+

    Running this query:

     SELECT 
      
     * 
     FROM 
      
     ML 
     . 
     FEATURES_AT_TIME 
     ( 
      
     TABLE 
      
     mydataset 
     . 
     feature_table 
     , 
      
     time 
      
     => 
      
     '2022-06-11 10:00:00+00' 
     , 
      
     num_rows 
      
     => 
      
     1 
     , 
      
     ignore_feature_nulls 
      
     => 
      
     TRUE 
     ); 
    

    Results in the following output, where the f2 value from the row for entity ID 2 that is timestamped '2022-06-10 12:00:00+00' is substituted for the NULL value in the row timestamped '2022-06-11 10:00:00+00' :

    +-----------+------+------+--------------------------+
    | entity_id | f1   | f2   | feature_timestamp        |
    +-----------+------+------+--------------------------+
    | '2'       | 7.0  | 4.0  | '2022-06-11 10:00:00+00' |
    +-----------+------+------+--------------------------+

    If there is no available replacement value, for example, where there is no earlier row for that entity ID, a NULL value is returned.

    Defaults to FALSE .

Output

The ML.FEATURES_AT_TIME function returns the input table rows that meet the point-in-time cutoff criteria, with the feature_timestamp column showing the timestamp that was input in the time argument.

Examples

Example 1

This example shows a how to retrain a model using only features that were created or updated before 2023-01-01 12:00:00+00 :

 CREATE 
  
 OR 
  
 REPLACE 
  
 ` 
 mydataset 
 . 
 mymodel 
 ` 
  
 OPTIONS 
  
 ( 
 WARM_START 
  
 = 
  
 TRUE 
 ) 
 AS 
 SELECT 
  
 * 
  
 EXCEPT 
  
 ( 
 feature_timestamp 
 , 
  
 entity_id 
 ) 
 FROM 
  
 ML 
 . 
 FEATURES_AT_TIME 
 ( 
  
 TABLE 
  
 ` 
 mydataset 
 . 
 feature_table 
 ` 
 , 
  
 time 
  
 => 
  
 '2023-01-01 12:00:00+00' 
 , 
  
 num_rows 
  
 => 
  
 1 
 , 
  
 ignore_feature_nulls 
  
 => 
  
 TRUE 
 ); 

Example 2

This example shows how to get predictions from a model based on features that were created or updated before 2023-01-01 12:00:00+00 :

 SELECT 
  
 * 
 FROM 
  
 ML 
 . 
 PREDICT 
 ( 
  
 MODEL 
  
 ` 
 mydataset 
 . 
 mymodel 
 ` 
 , 
  
 ( 
  
 SELECT 
  
 * 
  
 EXCEPT 
  
 ( 
 feature_timestamp 
 , 
  
 entity_id 
 ) 
  
 FROM 
  
 ML 
 . 
 FEATURES_AT_TIME 
 ( 
  
 TABLE 
  
 ` 
 mydataset 
 . 
 feature_table 
 ` 
 , 
  
 time 
  
 => 
  
 '2023-01-01 12:00:00+00' 
 , 
  
 num_rows 
  
 => 
  
 1 
 , 
  
 ignore_feature_nulls 
  
 => 
  
 TRUE 
 ) 
  
 ) 
  
 ); 

Example 3

This is a contrived example that you can use to see the output of the function:

 WITH 
  
 feature_table 
  
 AS 
  
 ( 
  
 SELECT 
  
 * 
  
 FROM 
  
 UNNEST 
 ( 
  
 ARRAY<STRUCT<entity_id 
  
 STRING 
 , 
  
 f_1 
  
 FLOAT64 
 , 
  
 f_2 
  
 FLOAT64 
 , 
  
 feature_timestamp 
  
 TIMESTAMP 
 >> 
 [ 
  
 ( 
 'id1' 
 , 
  
 1 
 . 
 0 
 , 
  
 1 
 . 
 0 
 , 
  
 TIMESTAMP 
  
 '2022-06-10 12:00:00+00' 
 ), 
  
 ( 
 'id2' 
 , 
  
 12 
 . 
 0 
 , 
  
 24 
 . 
 0 
 , 
  
 TIMESTAMP 
  
 '2022-06-11 12:00:00+00' 
 ), 
  
 ( 
 'id1' 
 , 
  
 11 
 . 
 0 
 , 
  
 NULL 
 , 
  
 TIMESTAMP 
  
 '2022-06-11 12:00:00+00' 
 ), 
  
 ( 
 'id1' 
 , 
  
 6 
 . 
 0 
 , 
  
 12 
 . 
 0 
 , 
  
 TIMESTAMP 
  
 '2022-06-11 10:00:00+00' 
 ), 
  
 ( 
 'id2' 
 , 
  
 2 
 . 
 0 
 , 
  
 4 
 . 
 0 
 , 
  
 TIMESTAMP 
  
 '2022-06-10 12:00:00+00' 
 ), 
  
 ( 
 'id2' 
 , 
  
 7 
 . 
 0 
 , 
  
 NULL 
 , 
  
 TIMESTAMP 
  
 '2022-06-11 10:00:00+00' 
 )]) 
  
 ) 
 SELECT 
  
 * 
 FROM 
  
 ML 
 . 
 FEATURES_AT_TIME 
 ( 
  
 TABLE 
  
 feature_table 
 , 
  
 time 
  
 => 
  
 '2022-06-12 10:00:00+00' 
 , 
  
 num_rows 
  
 => 
  
 1 
 , 
  
 ignore_feature_nulls 
  
 => 
  
 TRUE 
 ); 
Create a Mobile Website
View Site in Mobile | Classic
Share by: