The ML.NGRAMS function
This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.
You can use this function with models that support manual feature preprocessing.
Syntax
ML.NGRAMS(array_input, range [, separator])
Arguments
ML.NGRAMS takes the following arguments:
- array_input: an ARRAY<STRING> value that represents the tokens to be merged.
- range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the INT64 elements provide the range of n-gram sizes to return. Provide the numerical values in order, lower to higher. If you specify a single INT64 value of x, the range of n-gram sizes to return is [x, x].
- separator: a STRING value that specifies the separator used to connect two adjacent tokens in the output. The default value is whitespace.
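As a minimal sketch of the single-value form of range described above, the following query passes 2 instead of an array and omits separator. The input tokens and the column alias bigrams are illustrative assumptions, not taken from a real dataset.

```sql
-- A single INT64 range of 2 is treated as the range [2, 2],
-- so only 2-token n-grams are requested.
-- The separator argument is omitted, so adjacent tokens are
-- joined with the default separator (whitespace).
SELECT ML.NGRAMS(['a', 'b', 'c'], 2) AS bigrams;
```

Under those rules, the result should contain only 2-token n-grams such as the pairing of a with b, joined by the default separator.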
Output
ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.
Example
The following example outputs all 2-token and 3-token n-grams (sequences of adjacent tokens) for an array of three input strings:
SELECT ML.NGRAMS(['a', 'b', 'c'], [2, 3], '#') AS output;
The output looks similar to the following:
+-----------------------+
| output                |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
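In practice the input tokens often come from splitting raw text. The following sketch assumes a simple space-delimited sentence and uses the standard SPLIT function to produce the ARRAY<STRING> input. The literal sentence, the '_' separator, and the alias ngrams are hypothetical choices for illustration only.

```sql
-- Hypothetical input: a literal sentence stands in for a real text column.
-- SPLIT turns the sentence into an ARRAY<STRING> of tokens.
-- The range [1, 2] requests both single tokens and 2-token n-grams,
-- with adjacent tokens joined by an underscore.
SELECT
  ML.NGRAMS(SPLIT('the quick brown fox', ' '), [1, 2], '_') AS ngrams;
```

Splitting on a single space is a deliberately naive tokenizer. Real text usually needs lowercasing and punctuation handling before the n-grams are useful as features.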
What's next
- For information about feature preprocessing, see Feature preprocessing overview.

