The ML.BUCKETIZE function
This document describes the ML.BUCKETIZE
function, which lets you split
a numerical expression into buckets.
You can use this function with models that support manual feature preprocessing . For more information, see the following documents:
Syntax
ML.BUCKETIZE(numerical_expression, array_split_points [, exclude_boundaries] [, output_format])
Arguments
ML.BUCKETIZE
takes the following arguments:
-
numerical_expression: the numerical expression to bucketize. -
array_split_points: an array of numerical values that provide the points at which to split thenumerical_expressionvalue. The numerical values in the array must be finite, so not-inf,inf, orNaN. Provide the numerical values in order, lowest to highest. The range of possible buckets is determined by the upper and lower boundaries of the array. For example, if thearray_split_pointsvalue is[1, 2, 3, 4], then there are five potential buckets that thenumerical_expressionvalue can be bucketized into. -
exclude_boundaries: aBOOLvalue that determines whether the upper and lower boundaries fromarray_split_pointsare used. IfTRUE, then the boundary values aren't used to create buckets. For example, if thearray_split_pointsvalue is[1, 2, 3, 4]andexclude_boundariesisTRUE, then there are three potential buckets that thenumerical_expressionvalue can be bucketized into. The default value isFALSE. -
output_format: aSTRINGvalue that specifies the output format of the bucket. Valid output formats are as follows:-
bucket_names: returns aSTRINGvalue in the formatbin_<bucket_index>. For example,bin_3. Thebucket_indexvalue starts at 1. This is the default bucket format. -
bucket_ranges: returns aSTRINGvalue in the format[lower_bound, upper_bound)in interval notation . For example,(-inf, 2.5),[2.5, 4.6),[4.6, +inf). -
bucket_ranges_json: returns a JSON-formattedSTRINGvalue in the format{"start": "lower_bound", "end": "upper_bound"}. For example,{"start": "-Infinity", "end": "2.5"},{"start": "2.5", "end": "4.6"},{"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as thebucket_rangesoption.
-
Output
ML.BUCKETIZE
returns a STRING
value that contains the name of the bucket, in the format specified by the output_format
argument.
Example
The following example bucketizes a numerical expression both with and without boundaries:
SELECT ML . BUCKETIZE ( 2 . 5 , [ 1 , 2 , 3 ]) AS bucket , ML . BUCKETIZE ( 2 . 5 , [ 1 , 2 , 3 ], TRUE ) AS bucket_without_boundaries , ML . BUCKETIZE ( 2 . 5 , [ 1 , 2 , 3 ], FALSE , "bucket_ranges" ) AS bucket_ranges , ML . BUCKETIZE ( 2 . 5 , [ 1 , 2 , 3 ], FALSE , "bucket_ranges_json" ) AS bucket_ranges_json ;
The output looks similar to the following:
+--------+---------------------------+---------------+----------------------------+
| bucket | bucket_without_boundaries | bucket_ranges | bucket_ranges_json |
|--------|---------------------------|---------------|----------------------------|
| bin_3 | bin_2 | [2, 3) | {"start": "2", "end": "3"} |
+--------+---------------------------+---------------+----------------------------+
What's next
- For information about feature preprocessing, see Feature preprocessing overview .

