The ML.HASH_BUCKETIZE function
This document describes the ML.HASH_BUCKETIZE function, which lets you convert a string expression to a deterministic hash and then bucketize it by the modulo value of that hash.
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)
Arguments
ML.HASH_BUCKETIZE takes the following arguments:
- string_expression: the STRING expression to bucketize.
- hash_bucket_size: an INT64 value that specifies the number of buckets to create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without bucketizing the hashed value.
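To make the effect of hash_bucket_size concrete, the following sketch calls the function twice on the same input: once with 0, which returns only the hash, and once with a positive bucket count. The column aliases hash_only and bucket_of_5 and the bucket count of 5 are illustrative choices, not part of the function's definition, and no output is shown because the values depend on the underlying hash.

```sql
SELECT
  f,
  -- hash_bucket_size = 0: the string is hashed but not reduced to a bucket
  ML.HASH_BUCKETIZE(f, 0) AS hash_only,
  -- hash_bucket_size = 5: the hash is reduced to one of five buckets
  ML.HASH_BUCKETIZE(f, 5) AS bucket_of_5
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```

Because the hash is deterministic, running the query again on the same strings should return the same values in both columns.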
Output
ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.
Example
The following example bucketizes string expressions into three buckets:
SELECT f, ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
The output looks similar to the following:
+---+--------+
| f | bucket |
+---+--------+
| a | 0      |
+---+--------+
| b | 1      |
+---+--------+
| c | 1      |
+---+--------+
| d | 2      |
+---+--------+
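Because the function is intended for manual feature preprocessing, it would typically appear in the TRANSFORM clause of a CREATE MODEL statement. The following is a minimal sketch under stated assumptions: the dataset, model, and table names (mydataset.mymodel, mydataset.mytable), the columns city and label, the bucket count of 100, and the model type are all hypothetical. The cast to STRING is a design choice in this sketch so that the bucket ID is treated as a categorical feature rather than a numeric one.

```sql
-- Hypothetical names throughout: mydataset, mymodel, mytable, city, label.
CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    -- Hash the high-cardinality string column into 100 buckets, then cast
    -- the bucket ID to STRING so it is handled as a category.
    CAST(ML.HASH_BUCKETIZE(city, 100) AS STRING) AS city_bucket,
    label
  )
  OPTIONS(
    model_type = 'LOGISTIC_REG',
    input_label_cols = ['label']
  )
AS
SELECT city, label
FROM `mydataset.mytable`;
```

Hashing into a fixed number of buckets bounds the number of distinct feature values, at the cost of unrelated strings occasionally sharing a bucket, as b and c do in the example above.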
What's next
- For information about feature preprocessing, see Feature preprocessing overview.

