Spanner search indexes
can accelerate pattern matching expressions such as LIKE
, STARTS_WITH
, ENDS_WITH
,
and regular expression matching predicate REGEXP_CONTAINS
.
This page describes how to create and configure a search index using TOKENIZE_NGRAMS
to accelerate pattern matching predicates.
Configure an n-gram TOKENLIST
for pattern-matching acceleration
To enable pattern-matching expressions acceleration, tokenize a lower-cased STRING
column with TOKENIZE_NGRAMS
and store the STRING
column using the STORING
clause in
GoogleSQL, or INCLUDE
clause in PostgreSQL.
GoogleSQL
CREATE
TABLE
Albums
(
AlbumId
INT64
NOT
NULL
,
AlbumTitle
STRING
(
MAX
),
AlbumTitle_Ngram_Tokens
TOKENLIST
AS
(
TOKENIZE_NGRAMS
(
LOWER
(
AlbumTitle
),
ngram_size_min
=
> 3
,
ngram_size_max
=
> 4
))
HIDDEN
,
)
PRIMARY
KEY
(
AlbumId
);
CREATE
SEARCH
INDEX
AlbumsIndex
ON
Albums
(
AlbumTitle_Ngram_Tokens
)
STORING
(
AlbumTitle
);
PostgreSQL
CREATE
TABLE
albums
(
albumid
bigint
NOT
NULL
,
album_title
varchar
,
album_title_ngrams_tokens
spanner
.
tokenlist
GENERATED
ALWAYS
AS
(
spanner
.
tokenize_ngrams
(
lower
(
album_title
),
ngram_size_min
=
>
3
,
ngram_size_max
=
>
4
)
)
VIRTUAL
HIDDEN
,
PRIMARY
KEY
(
albumid
));
CREATE
SEARCH
INDEX
albumsidx
ON
albums
(
album_title_ngrams_tokens
)
INCLUDE
(
album_title
);
Automatic acceleration of queries with pattern-matching predicates
The query optimizer might choose to accelerate the following queries using AlbumsIndex
with AlbumTitle_Ngram_Tokens
. Optionally, the query can provide @{force_index = AlbumsIndex}
to force the optimizer to use AlbumsIndex
.
GoogleSQL
In GoogleSQL, we accelerate LIKE
, STARTS_WITH
, ENDS_WITH
,
and REGEXP_CONTAINS
.
-
LIKEpredicate:SELECT AlbumId FROM Albums @ { FORCE_INDEX = AlbumsIndex } WHERE AlbumTitle LIKE "%999%" ; -
STARTS_WITHpredicate:SELECT AlbumId FROM Albums @ { FORCE_INDEX = AlbumsIndex } WHERE STARTS_WITH ( AlbumTitle , "apple" ) -
ENDS_WITHpredicate:SELECT AlbumId FROM Albums @ { FORCE_INDEX = AlbumsIndex } WHERE ENDS_WITH ( AlbumTitle , "apple" ) -
REGEXP_CONTAINSpredicate:SELECT AlbumId FROM Albums @ { FORCE_INDEX = AlbumsIndex } WHERE REGEXP_CONTAINS ( AlbumTitle , r "(good|great)[ ]+morning" )
PostgreSQL
In PostgreSQL, we accelerate LIKE
and STARTS_WITH
.
-
LIKEpredicate:SELECT albumid FROM albums /*@ FORCE_INDEX = albumsidx */ WHERE album_title like '%999%' ; -
STARTS_WITHpredicate:SELECT albumid FROM albums /*@ FORCE_INDEX = albumsidx */ WHERE starts_with ( album_title , 'apple' )
Prerequisites on acceleration
For Spanner to enable this acceleration, the following rules must be met:
- The index must store the
STRINGcolumn using theSTORINGclause in GoogleSQL, orINCLUDEclause in PostgreSQL. This prevents costly back-joins to the base table during post-filtering, which is critical for performance when the search over-retrieves documents. - The
STRINGcolumn must be tokenized usingTOKENIZE_NGRAMS. - The tokenization must apply to
LOWER(column_name)rather thancolumn_name. - The
LIKEpattern,STARTS_WITHprefix,ENDS_WITHsuffix, orREGEXP_CONTAINSregular expression must be specified as a constant literal. Query parameters are not supported to avoid acceleration on patterns that are too short. - The
LIKEpattern,STARTS_WITHprefix,ENDS_WITHsuffix, orREGEXP_CONTAINSregular expression must contain enough text for at least one n-gram. For exampler".*"doesn't qualify because there's no sequence of characters to match. Similarly, if the ngram minimum size is set to 3, theLIKEpredicate"%ab%"doesn't qualify because"ab"(size 2) is too short.
What's next
- Learn about finding approximate matches with fuzzy search .

