Accelerate pattern-matching expressions

Spanner search indexes can accelerate pattern-matching expressions such as LIKE, STARTS_WITH, ENDS_WITH, and the regular expression matching predicate REGEXP_CONTAINS. This page describes how to create and configure a search index using TOKENIZE_NGRAMS to accelerate pattern-matching predicates.

Configure an n-gram `TOKENLIST` for pattern-matching acceleration

To enable acceleration of pattern-matching expressions, tokenize a lowercased STRING column with TOKENIZE_NGRAMS and store the STRING column in the index using the STORING clause in GoogleSQL, or the INCLUDE clause in PostgreSQL.

GoogleSQL

  CREATE TABLE Albums (
    AlbumId INT64 NOT NULL,
    AlbumTitle STRING(MAX),
    AlbumTitle_Ngram_Tokens TOKENLIST AS (
      TOKENIZE_NGRAMS(LOWER(AlbumTitle), ngram_size_min=>3, ngram_size_max=>4)
    ) HIDDEN,
  ) PRIMARY KEY (AlbumId);

  CREATE SEARCH INDEX AlbumsIndex
  ON Albums(AlbumTitle_Ngram_Tokens)
  STORING (AlbumTitle);

PostgreSQL

  CREATE TABLE albums (
    albumid bigint NOT NULL,
    album_title varchar,
    album_title_ngrams_tokens spanner.tokenlist GENERATED ALWAYS AS (
      spanner.tokenize_ngrams(lower(album_title), ngram_size_min => 3, ngram_size_max => 4)
    ) VIRTUAL HIDDEN,
    PRIMARY KEY (albumid));

  CREATE SEARCH INDEX albumsidx
  ON albums(album_title_ngrams_tokens)
  INCLUDE (album_title);
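
To build intuition for what the ngram_size_min => 3 and ngram_size_max => 4 settings in the definitions above cover, the following Python sketch enumerates the lowercase character n-grams of a title. It is an illustration only: the helper name ngrams is hypothetical, and the tokens that TOKENIZE_NGRAMS actually stores are internal to Spanner and might differ (for example, in how whitespace or punctuation is handled).

```python
def ngrams(text, size_min=3, size_max=4):
    """Return every substring of the lowercased text with length in [size_min, size_max].

    Hypothetical helper for illustration; not Spanner's implementation.
    """
    lowered = text.lower()  # mirrors LOWER(AlbumTitle) in the column definition
    grams = set()
    for size in range(size_min, size_max + 1):
        # Slide a window of `size` characters across the text.
        for start in range(len(lowered) - size + 1):
            grams.add(lowered[start:start + size])
    return grams


print(sorted(ngrams("Apple")))  # ['app', 'appl', 'ple', 'ppl', 'pple']
print(sorted(ngrams("ab")))     # [] because the text is shorter than size_min
```

A title shorter than the minimum n-gram size yields no n-grams in this model, which is the same reason very short search patterns cannot be matched against n-gram tokens.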

Automatic acceleration of queries with pattern-matching predicates

The query optimizer might choose to accelerate the following queries using AlbumsIndex with AlbumTitle_Ngram_Tokens. Optionally, the query can provide @{FORCE_INDEX = AlbumsIndex} to force the optimizer to use AlbumsIndex.

GoogleSQL

In GoogleSQL, Spanner can accelerate LIKE, STARTS_WITH, ENDS_WITH, and REGEXP_CONTAINS.

LIKE predicate:

  SELECT AlbumId
  FROM Albums @{FORCE_INDEX = AlbumsIndex}
  WHERE AlbumTitle LIKE "%999%";

STARTS_WITH predicate:

  SELECT AlbumId
  FROM Albums @{FORCE_INDEX = AlbumsIndex}
  WHERE STARTS_WITH(AlbumTitle, "apple");

ENDS_WITH predicate:

  SELECT AlbumId
  FROM Albums @{FORCE_INDEX = AlbumsIndex}
  WHERE ENDS_WITH(AlbumTitle, "apple");

REGEXP_CONTAINS predicate:

  SELECT AlbumId
  FROM Albums @{FORCE_INDEX = AlbumsIndex}
  WHERE REGEXP_CONTAINS(AlbumTitle, r"(good|great)[ ]+morning");

PostgreSQL

In PostgreSQL, Spanner can accelerate LIKE and STARTS_WITH.

LIKE predicate:

  SELECT albumid
  FROM albums /*@ FORCE_INDEX = albumsidx */
  WHERE album_title LIKE '%999%';

STARTS_WITH predicate:

  SELECT albumid
  FROM albums /*@ FORCE_INDEX = albumsidx */
  WHERE starts_with(album_title, 'apple');

Prerequisites for acceleration

For Spanner to enable this acceleration, the following conditions must be met:

The index must store the STRING column using the STORING clause in GoogleSQL, or the INCLUDE clause in PostgreSQL. This prevents costly back-joins to the base table during post-filtering, which is critical for performance when the search over-retrieves documents.
The STRING column must be tokenized using TOKENIZE_NGRAMS.
The tokenization must apply to LOWER(column_name) rather than column_name.
The LIKE pattern, STARTS_WITH prefix, ENDS_WITH suffix, or REGEXP_CONTAINS regular expression must be specified as a constant literal. Query parameters aren't supported; this avoids attempting acceleration on patterns that are too short.
The LIKE pattern, STARTS_WITH prefix, ENDS_WITH suffix, or REGEXP_CONTAINS regular expression must contain enough text for at least one n-gram. For example, r".*" doesn't qualify because it contains no sequence of characters to match. Similarly, if the minimum n-gram size is set to 3, the LIKE predicate "%ab%" doesn't qualify because "ab" (length 2) is too short.
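
The last condition can be approximated with a small check on a LIKE pattern. The Python sketch below treats the text between the % and _ wildcards as literal runs and asks whether any run is at least as long as the minimum n-gram size. This is a simplified model under stated assumptions: the function name like_pattern_has_ngram is hypothetical, escape sequences are ignored, and Spanner's actual qualification logic might differ.

```python
import re


def like_pattern_has_ngram(pattern, ngram_size_min=3):
    """Return True if some literal run in a LIKE pattern can form one n-gram.

    Assumption: literal runs are the pieces between the % and _ wildcards.
    Escape sequences are not handled. Not Spanner's implementation.
    """
    literal_runs = re.split(r"[%_]", pattern)
    return any(len(run) >= ngram_size_min for run in literal_runs)


print(like_pattern_has_ngram("%999%"))  # True: "999" has length 3
print(like_pattern_has_ngram("%ab%"))   # False: "ab" has length 2
```

Under this model, a pattern such as "%ab%cd%" also fails the check, because neither literal run reaches the minimum size on its own.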

What's next

Learn about finding approximate matches with fuzzy search.
