The ML.LABEL_ENCODER function
This document describes the ML.LABEL_ENCODER function, which you can use to encode a string expression to an INT64 value in [0, <number of categories>].
The encoding vocabulary is sorted alphabetically. NULL values and categories that aren't in the vocabulary are encoded to 0.
When used in the TRANSFORM clause, the vocabulary values calculated during training, along with the top k and frequency threshold values that you specified, are automatically used in prediction.
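As a minimal sketch of the TRANSFORM usage described above, the following statements train a model with a label-encoded feature and then run prediction on raw input. The dataset, model, table, and column names (mydataset.mymodel, mydataset.training_data, mydataset.new_data, category, label) and the argument values are hypothetical placeholders, and the model type is only one of the types that might apply.

```sql
-- Hypothetical names throughout; substitute your own dataset, tables, and columns.
CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    -- Encode the raw string column; the vocabulary is computed during training.
    ML.LABEL_ENCODER(category, 100, 5) OVER () AS category_encoded,
    -- Pass the label column through unchanged.
    label
  )
  OPTIONS (
    model_type = 'logistic_reg',
    input_label_cols = ['label']
  )
AS
SELECT category, label
FROM `mydataset.training_data`;

-- At prediction time, supply the raw string column. The encoding from
-- training is applied automatically, so the query doesn't call
-- ML.LABEL_ENCODER again.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT category FROM `mydataset.new_data`)
);
```

Keeping the encoder inside TRANSFORM means the training-time vocabulary travels with the model, so the prediction query doesn't have to reproduce it.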
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.LABEL_ENCODER(string_expression [, top_k] [, frequency_threshold]) OVER()
ML.LABEL_ENCODER takes the following arguments:
- string_expression: the STRING expression to encode.
- top_k: an INT64 value that specifies the number of categories included in the encoding vocabulary. The function selects the top_k most frequent categories in the data and uses those; categories below this threshold are encoded to 0. This value must be less than 1,000,000 to avoid problems due to high dimensionality. The default value is 32,000.
- frequency_threshold: an INT64 value that limits the categories included in the encoding vocabulary based on category frequency. The function uses categories whose frequency is greater than or equal to frequency_threshold; categories below this threshold are encoded to 0. The default value is 5.
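To illustrate how the optional arguments are passed, the following sketch calls the function three ways on a small inline array. The column aliases and the argument values are illustrative only, and no output is shown.

```sql
SELECT
  f,
  -- Defaults: top_k = 32,000 and frequency_threshold = 5. Per the rule
  -- above, no category in this small array occurs 5 or more times, so
  -- every row would be expected to fall outside the vocabulary.
  ML.LABEL_ENCODER(f) OVER () AS with_defaults,
  -- Only top_k is given; frequency_threshold keeps its default of 5.
  ML.LABEL_ENCODER(f, 3) OVER () AS with_top_k,
  -- Both optional arguments are given explicitly.
  ML.LABEL_ENCODER(f, 3, 1) OVER () AS with_both
FROM UNNEST([NULL, 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd']) AS f
ORDER BY f;
```

Because the arguments are positional, setting frequency_threshold requires passing top_k as well.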
Output
ML.LABEL_ENCODER returns an INT64 value that represents the encoded string expression.
Example
The following example performs label encoding on a set of string expressions. It limits the encoding vocabulary to the two categories that occur the most frequently in the data and that also occur two or more times.
SELECT f, ML.LABEL_ENCODER(f, 2, 2) OVER () AS output
FROM UNNEST([NULL, 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd']) AS f
ORDER BY f;
The output looks similar to the following:
+------+--------+
| f    | output |
+------+--------+
| NULL | 0      |
| a    | 0      |
| b    | 1      |
| b    | 1      |
| c    | 2      |
| c    | 2      |
| c    | 2      |
| d    | 0      |
| d    | 0      |
+------+--------+
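To see why each value is encoded this way, it can help to count how often each category occurs. The query below is a plain aggregation over the same input and doesn't call ML.LABEL_ENCODER; the alias frequency is only an illustrative name.

```sql
-- Count the occurrences of each value in the example input.
SELECT f, COUNT(*) AS frequency
FROM UNNEST([NULL, 'a', 'b', 'b', 'c', 'c', 'c', 'd', 'd']) AS f
GROUP BY f
ORDER BY f;
```

The counts are 1 for NULL, 1 for a, 2 for b, 3 for c, and 2 for d. Category a falls below the frequency threshold of 2, and NULL is always encoded to 0. Categories b, c, and d all meet the threshold, but top_k is 2, so only two of them can enter the vocabulary. In the output above, b and c are kept and d is encoded to 0. The kept categories are then numbered in alphabetical order: b is 1 and c is 2. This document doesn't state how the tie in frequency between b and d is resolved.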
What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each

