The ML.QUANTILE_BUCKETIZE function
This document describes the ML.QUANTILE_BUCKETIZE
function, which lets you
break a continuous numerical feature into buckets based on quantiles.
When used in the TRANSFORM
clause
,
the same quantiles are automatically used in prediction.
You can use this function with models that support manual feature preprocessing . For more information, see the following documents:
Syntax
ML.QUANTILE_BUCKETIZE(numerical_expression, num_buckets [, output_format]) OVER()
Arguments
ML.QUANTILE_BUCKETIZE
takes the following arguments:
-
numerical_expression: the numerical expression to bucketize. -
num_buckets: anINT64value that specifies the number of buckets to splitnumerical_expressioninto. -
output_format: aSTRINGvalue that specifies the output format of the bucket. Valid output formats are as follows:-
bucket_names: returns aSTRINGvalue in the formatbin_<bucket_index>. For example,bin_3. Thebucket_indexvalue starts at 1. This is the default bucket format. -
bucket_ranges: returns aSTRINGvalue in the format[lower_bound, upper_bound)in interval notation . For example,(-inf, 2.5),[2.5, 4.6),[4.6, +inf). -
bucket_ranges_json: returns a JSON-formattedSTRINGvalue in the format{"start": "lower_bound", "end": "upper_bound"}. For example,{"start": "-Infinity", "end": "2.5"},{"start": "2.5", "end": "4.6"},{"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as thebucket_rangesoption.
-
Output
ML.QUANTILE_BUCKETIZE
returns a STRING
value that contains the name of the bucket, in the format specified by the output_format
argument.
Example
The following example breaks a numerical expression of five elements into three buckets:
SELECT f , ML . QUANTILE_BUCKETIZE ( f , 3 ) OVER () AS bucket , ML . QUANTILE_BUCKETIZE ( f , 3 , "bucket_ranges" ) OVER () AS bucket_ranges , ML . QUANTILE_BUCKETIZE ( f , 3 , "bucket_ranges_json" ) OVER () AS bucket_ranges_json FROM UNNEST ([ 1 , 2 , 3 , 4 , 5 ]) AS f ORDER BY f ;
The output looks similar to the following:
+---+--------+---------------+------------------------------------+
| f | bucket | bucket_ranges | bucket_ranges_json |
|---|--------|---------------|------------------------------------|
| 1 | bin_1 | (-inf, 2) | {"start": "-Infinity", "end": "2"} |
| 2 | bin_2 | [2, 4) | {"start": "2", "end": "4"} |
| 3 | bin_2 | [2, 4) | {"start": "2", "end": "4"} |
| 4 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} |
| 5 | bin_3 | [4, +inf) | {"start": "4", "end": "Infinity"} |
+---+--------+---------------+------------------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview .

