The ML.NGRAMS function

This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.

You can use this function with models that support manual feature preprocessing.

Syntax

ML.NGRAMS(array_input, range [, separator])

ML.NGRAMS takes the following arguments:

array_input: an ARRAY<STRING> value that represents the tokens to be merged.
range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the two INT64 elements give the range of n-gram sizes to return; provide them in ascending order. If you specify a single INT64 value x, the range of n-gram sizes to return is [x, x].
separator: a STRING value that specifies the separator used to connect two adjacent tokens in the output. The default value is whitespace.

ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.
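
As a minimal sketch of the alternative argument forms described above, the following query passes a single INT64 value for range and omits separator. The input tokens are illustrative. Going by the argument descriptions, the result should contain only 2-grams, joined by the default separator.

```sql
-- Sketch: a single INT64 range of 2 is treated as the range [2, 2],
-- and the omitted separator falls back to the default (whitespace).
SELECT ML.NGRAMS(['a', 'b', 'c'], 2) AS output;
```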

The following example outputs all possible 2-token and 3-token combinations for a set of three input strings:

SELECT ML.NGRAMS(['a', 'b', 'c'], [2, 3], '#') AS output;

The output looks similar to the following:

+-----------------------+
|        output         |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
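
The tokens can also come from splitting raw text. The following sketch assumes a hypothetical input sentence and uses the standard SPLIT function to build the ARRAY<STRING> input. It is an illustration of how the arguments fit together, not a recommended preprocessing pipeline.

```sql
-- Sketch: tokenize an illustrative sentence on spaces, then request
-- 1-grams and 2-grams joined with an underscore.
SELECT
  ML.NGRAMS(SPLIT('the quick brown fox', ' '), [1, 2], '_') AS output;
```

Changing the range or separator arguments changes which n-gram sizes are produced and how the tokens are joined.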

For information about feature preprocessing, see Feature preprocessing overview.