The ML.HASH_BUCKETIZE function

This document describes the ML.HASH_BUCKETIZE function, which converts a string expression to a deterministic hash and then assigns it to a bucket by taking that hash modulo the number of buckets.

You can use this function with models that support manual feature preprocessing.

Syntax

ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)

Arguments

ML.HASH_BUCKETIZE takes the following arguments:

string_expression: the STRING expression to bucketize.
hash_bucket_size: an INT64 value that specifies the number of buckets to create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without bucketizing the hashed value.
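
As a minimal sketch of the hash_bucket_size = 0 case described above, the following query returns the hash itself instead of a bucket index. The column alias hashed_value is only an illustrative name, and no output is shown because the hash values are not stated in this document:

```sql
-- With a bucket size of 0, no modulo step is applied,
-- so each row carries the hash of f rather than a bucket index.
SELECT
  f,
  ML.HASH_BUCKETIZE(f, 0) AS hashed_value
FROM
  UNNEST(['a', 'b', 'c', 'd']) AS f;
```

Because the hash is deterministic, running the query again should return the same value for each input string.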

Output

ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.
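
Because the function returns a bucket identifier, it can serve as a preprocessing step for a high-cardinality string column. The following is a sketch only, assuming a model type that supports the TRANSFORM clause. The dataset, model, table, and column names (mydataset.mymodel, mydataset.mytable, category, label) and the bucket count of 100 are hypothetical placeholders, not values taken from this document:

```sql
-- Hypothetical names throughout; replace them with your own resources.
CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM(
    -- Map each category string to one of 100 hash buckets.
    ML.HASH_BUCKETIZE(category, 100) AS category_bucket,
    -- Pass the label column through unchanged.
    label
  )
  OPTIONS(
    model_type = 'LOGISTIC_REG',
    input_label_cols = ['label']
  )
AS
SELECT
  category,
  label
FROM
  `mydataset.mytable`;
```

Choosing the number of buckets is a trade-off: fewer buckets make it more likely that distinct strings share a bucket, while more buckets increase the number of distinct feature values.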

Example

The following example bucketizes string expressions into three buckets:

SELECT
  f,
  ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM
  UNNEST(['a', 'b', 'c', 'd']) AS f;

The output looks similar to the following:

+---+--------+
| f | bucket |
+---+--------+
| a |   0    |
+---+--------+
| b |   1    |
+---+--------+
| c |   1    |
+---+--------+
| d |   2    |
+---+--------+

What's next

For information about feature preprocessing, see Feature preprocessing overview.