In addition to full token matching, Spanner search indexes support substring searches. This page describes how to perform a substring search as part of a full-text search in Spanner.
Substring searches have the following characteristics:
- Case insensitive, discards most punctuation, and normalizes whitespace.
- No Chinese, Japanese, Korean (CJK) segmentation, since partial CJK queries often segment incorrectly.
- For multiple search terms, the result must contain a substring from each
term. For example,
'happ momen'
matches"happy moment"
, because both substrings are found in the text. It doesn't match"happy day"
.
Examples
Stored text | Substring query | Match |
---|---|---|
Bridge over Troubled Water
|
ridg roub | Yes |
Bridge over Troubled Water
|
ridg , roub | Yes |
Bridge over Troubled Water
|
over brid | Yes |
Bridge over Troubled Water
|
ate bridge | Yes |
Bridge over Troubled Water
|
Bridge bridge bridge | Yes |
Bridge over Troubled Water
|
bri trou ter | Yes |
Bridge over Troubled Water
|
bri dge | Yes |
Bridge over Troubled Water
|
troubledwater | No |
Bridge over Troubled Water
|
trubled | No |
For a substring search, use the TOKENIZE_SUBSTRING
function in the TOKENLIST
column definition, as shown in the following DDL
example:
GoogleSQL
CREATE
TABLE
Albums
(
AlbumId
STRING
(
MAX
)
NOT
NULL
,
AlbumTitle
STRING
(
MAX
),
AlbumTitle_Tokens
TOKENLIST
AS
(
TOKENIZE_SUBSTRING
(
AlbumTitle
))
HIDDEN
)
PRIMARY
KEY
(
AlbumId
);
PostgreSQL
This example uses spanner.tokenize_substring
.
CREATE
TABLE
albums
(
albumid
character
varying
NOT
NULL
,
albumtitle
character
varying
,
albumtitle_tokens
spanner
.
tokenlist
GENERATED
ALWAYS
AS
(
spanner
.
tokenize_substring
(
albumtitle
))
VIRTUAL
HIDDEN
,
PRIMARY
KEY
(
albumid
));
In the SQL query, use the SEARCH_SUBSTRING
function in the WHERE
clause. For
example, the following query matches an album with title "happy" from the table
created in the previous example:
GoogleSQL
SELECT
Album
FROM
Albums
WHERE
SEARCH_SUBSTRING
(
AlbumTitle_Tokens
,
'happ'
);
PostgreSQL
This example uses spanner.search_substring
.
SELECT
album
FROM
albums
WHERE
spanner
.
search_substring
(
albumtitle_tokens
,
'happ'
);
TOKENIZE_SUBSTRING
generates n-grams
for each token and stores these n-grams in the search index. The minimum and
maximum length of n-grams to generate are configured through optional arguments.
Substring search indexes can use 10-30x more storage as full-text indexes over
the same data, because the tokenization produces a lot more tokens. This is
especially true if as the difference between ngram_size_min
and ngram_size_max
grows. Substring queries also use more resources to execute.
Like TOKENIZE_FULLTEXT
,
you can configure TOKENIZE_SUBSTRING
to use specific types of content.
Enable a relative substring search
In addition to the basic substring search, SEARCH_SUBSTRING
supports the relative search mode. A relative search refines substring search
results.
To enable the relative search mode, set the relative_search_types
parameter of TOKENIZE_SUBSTRING
to a non-empty array with elements of supported relative search types.
When relative search is enabled in tokenization, SEARCH_SUBSTRING
can perform
queries with the following relative search types:
-
phrase
: matches contiguous substringsExamples
Stored text Substring query. Match Bridge over Troubled Waterbridge over Yes Bridge over Troubled WaterBridge bridge bridge No Bridge over Troubled Waterbrid over No Bridge over Troubled Waterridge over trouble Yes Bridge over Troubled Waterbridge ove troubled No Bridge over Troubled Wateridge ove Yes Bridge over Troubled Wateridge , ove Yes Bridge over Troubled WaterRIDGE OVE Yes Bridge over Troubled Waterbridge water No -
value_prefix
: matches contiguous substrings and the match has to start at the beginning of the value. This is conceptually similar to theSTARTS_WITH
function for case and whitespace normalized strings.Examples
Stored text Substring query Match Bridge over Troubled Waterbridge over Yes Bridge over Troubled Waterbridge , over Yes Bridge over Troubled Waterridge over No Bridge over Troubled Watertroubled water No -
value_suffix
: matches contiguous substrings and the match has to match at the end of the value. This is conceptually similar to theENDS_WITH
function for case and whitespace normalized strings.Examples
Stored text Substring query. Match Bridge over Troubled Watertroubled water Yes Bridge over Troubled Watertroubled ; water Yes Bridge over Troubled Waterroubled water Yes Bridge over Troubled Watertroubled wate No Bridge over Troubled Watertrouble water No Bridge over Troubled Waterbridge over No -
word_prefix:
likevalue_prefix
, but the string has to match at a term boundary (rather than a value boundary).Examples
Stored text Substring query Match Bridge over Troubled Waterover trouble Yes Bridge over Troubled WaterOver , trouble Yes Bridge over Troubled Watertroub water No Bridge over Troubled Waterover water No Bridge over Troubled Waterove troubled No Bridge over Troubled Waterver troubled Yes -
word_suffix
: likevalue_suffix
, but the string has to match at the end of a term boundary.Examples
Stored text Substring query Match Bridge over Troubled Waterver troubled Yes Bridge over Troubled Waterover trouble No Bridge over Troubled Waterover water No Bridge over Troubled Waterove troubled No
What's next
- Learn about full-text search queries .
- Learn how to rank search results .
- Learn how to paginate search results .
- Learn how to mix full-text and non-text queries .
- Learn how to search multiple columns .