The ML.HASH_BUCKETIZE function
This document describes the ML.HASH_BUCKETIZE function, which lets you
convert a string expression to a deterministic hash and then bucketize it by the
modulo value of that hash.
You can use this function with models that support manual feature preprocessing. For more
information, see the documents listed under What's next.
Syntax
ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)
Arguments
ML.HASH_BUCKETIZE takes the following arguments:
- string_expression: the STRING expression to bucketize.
- hash_bucket_size: an INT64 value that specifies the number of buckets to
create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without
bucketizing the hashed value.
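To make the two modes of hash_bucket_size concrete, the following query is a minimal sketch that applies the function to the same inputs twice: once with 0 (hash only) and once with 3 (hash, then bucketize). The column aliases hash_only and bucket are illustrative names chosen here. Result values are not shown because they depend on the function's internal hash.

```sql
SELECT
  f,
  -- hash_bucket_size = 0: the string is hashed but not bucketized
  ML.HASH_BUCKETIZE(f, 0) AS hash_only,
  -- hash_bucket_size = 3: the hash is assigned to one of three buckets
  ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```

Because the hash is deterministic, rerunning the query on the same strings should return the same values in both columns.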
Output
ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket.
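Since the function is meant for manual feature preprocessing, one place it might appear is the TRANSFORM clause of a CREATE MODEL statement. The following is a sketch only: the dataset, model, and table names, the columns category and label, the bucket count of 100, and the choice of a logistic regression model are all hypothetical placeholders. Casting the bucket ID to STRING is a modeling assumption, made so that the ID is handled as a category rather than as a number.

```sql
-- Hypothetical names: mydataset.sample_model, mydataset.training_data,
-- and the columns category and label are placeholders.
CREATE OR REPLACE MODEL `mydataset.sample_model`
  TRANSFORM(
    -- Map each category string to one of 100 bucket IDs, then cast to STRING
    -- so the ID is treated as a category rather than a number (an assumption).
    CAST(ML.HASH_BUCKETIZE(category, 100) AS STRING) AS category_bucket,
    label
  )
  OPTIONS(
    model_type = 'LOGISTIC_REG',
    input_label_cols = ['label']
  )
AS
SELECT category, label
FROM `mydataset.training_data`;
```

A larger hash_bucket_size lowers the chance that distinct strings share a bucket, at the cost of more distinct feature values.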
Example
The following example bucketizes string expressions into three buckets:
```sql
SELECT
  f, ML.HASH_BUCKETIZE(f, 3) AS bucket
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```

The output looks similar to the following:

```
+---+--------+
| f | bucket |
+---+--------+
| a | 0      |
+---+--------+
| b | 1      |
+---+--------+
| c | 1      |
+---+--------+
| d | 2      |
+---+--------+
```

What's next
- For information about feature preprocessing, see [Feature preprocessing overview](/bigquery/docs/preprocess-overview).
- For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey).

Last updated 2025-09-04 UTC.