Perform a substring search

In addition to full token matching, Spanner search indexes support substring searches. This page describes how to perform a substring search as part of a full-text search in Spanner.

Substring searches have the following characteristics:

  • Case insensitive, discards most punctuation, and normalizes whitespace.
  • No Chinese, Japanese, Korean (CJK) segmentation, since partial CJK queries often segment incorrectly.
  • For multiple search terms, the result must contain a substring from each term. For example, 'happ momen' matches "happy moment" , because both substrings are found in the text. It doesn't match "happy day" .

Examples

Stored text Substring query Match
Bridge over Troubled Water
ridg roub Yes
Bridge over Troubled Water
ridg , roub Yes
Bridge over Troubled Water
over brid Yes
Bridge over Troubled Water
ate bridge Yes
Bridge over Troubled Water
Bridge bridge bridge Yes
Bridge over Troubled Water
bri trou ter Yes
Bridge over Troubled Water
bri dge Yes
Bridge over Troubled Water
troubledwater No
Bridge over Troubled Water
trubled No

For a substring search, use the TOKENIZE_SUBSTRING function in the TOKENLIST column definition, as shown in the following DDL example:

GoogleSQL

  CREATE 
  
 TABLE 
  
 Albums 
  
 ( 
 AlbumId 
  
 STRING 
 ( 
 MAX 
 ) 
  
 NOT 
  
 NULL 
 , 
 AlbumTitle 
  
 STRING 
 ( 
 MAX 
 ), 
 AlbumTitle_Tokens 
  
 TOKENLIST 
  
 AS 
  
 ( 
 TOKENIZE_SUBSTRING 
 ( 
 AlbumTitle 
 )) 
  
 HIDDEN 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 AlbumId 
 ); 
 

PostgreSQL

This example uses spanner.tokenize_substring .

  CREATE 
  
 TABLE 
  
 albums 
  
 ( 
 albumid 
  
 character 
  
 varying 
  
 NOT 
  
 NULL 
 , 
 albumtitle 
  
 character 
  
 varying 
 , 
 albumtitle_tokens 
  
 spanner 
 . 
 tokenlist 
  
 GENERATED 
  
 ALWAYS 
  
 AS 
  
 ( 
 spanner 
 . 
 tokenize_substring 
 ( 
 albumtitle 
 )) 
  
 VIRTUAL 
  
 HIDDEN 
 , 
 PRIMARY 
  
 KEY 
 ( 
 albumid 
 )); 
 

In the SQL query, use the SEARCH_SUBSTRING function in the WHERE clause. For example, the following query matches an album with title "happy" from the table created in the previous example:

GoogleSQL

  SELECT 
  
 Album 
 FROM 
  
 Albums 
 WHERE 
  
 SEARCH_SUBSTRING 
 ( 
 AlbumTitle_Tokens 
 , 
  
 'happ' 
 ); 
 

PostgreSQL

This example uses spanner.search_substring .

  SELECT 
  
 album 
 FROM 
  
 albums 
 WHERE 
  
 spanner 
 . 
 search_substring 
 ( 
 albumtitle_tokens 
 , 
  
 'happ' 
 ); 
 

TOKENIZE_SUBSTRING generates n-grams for each token and stores these n-grams in the search index. The minimum and maximum length of n-grams to generate are configured through optional arguments.

Substring search indexes can use 10-30x more storage as full-text indexes over the same data, because the tokenization produces a lot more tokens. This is especially true if as the difference between ngram_size_min and ngram_size_max grows. Substring queries also use more resources to execute.

Like TOKENIZE_FULLTEXT , you can configure TOKENIZE_SUBSTRING to use specific types of content.

In addition to the basic substring search, SEARCH_SUBSTRING supports the relative search mode. A relative search refines substring search results.

To enable the relative search mode, set the relative_search_types parameter of TOKENIZE_SUBSTRING to a non-empty array with elements of supported relative search types.

When relative search is enabled in tokenization, SEARCH_SUBSTRING can perform queries with the following relative search types:

  • phrase : matches contiguous substrings

    Examples

    Stored text Substring query. Match
    Bridge over Troubled Water
    bridge over Yes
    Bridge over Troubled Water
    Bridge bridge bridge No
    Bridge over Troubled Water
    brid over No
    Bridge over Troubled Water
    ridge over trouble Yes
    Bridge over Troubled Water
    bridge ove troubled No
    Bridge over Troubled Water
    idge ove Yes
    Bridge over Troubled Water
    idge , ove Yes
    Bridge over Troubled Water
    RIDGE OVE Yes
    Bridge over Troubled Water
    bridge water No
  • value_prefix : matches contiguous substrings and the match has to start at the beginning of the value. This is conceptually similar to the STARTS_WITH function for case and whitespace normalized strings.

    Examples

    Stored text Substring query Match
    Bridge over Troubled Water
    bridge over Yes
    Bridge over Troubled Water
    bridge , over Yes
    Bridge over Troubled Water
    ridge over No
    Bridge over Troubled Water
    troubled water No
  • value_suffix : matches contiguous substrings and the match has to match at the end of the value. This is conceptually similar to the ENDS_WITH function for case and whitespace normalized strings.

    Examples

    Stored text Substring query. Match
    Bridge over Troubled Water
    troubled water Yes
    Bridge over Troubled Water
    troubled ; water Yes
    Bridge over Troubled Water
    roubled water Yes
    Bridge over Troubled Water
    troubled wate No
    Bridge over Troubled Water
    trouble water No
    Bridge over Troubled Water
    bridge over No
  • word_prefix: like value_prefix , but the string has to match at a term boundary (rather than a value boundary).

    Examples

    Stored text Substring query Match
    Bridge over Troubled Water
    over trouble Yes
    Bridge over Troubled Water
    Over , trouble Yes
    Bridge over Troubled Water
    troub water No
    Bridge over Troubled Water
    over water No
    Bridge over Troubled Water
    ove troubled No
    Bridge over Troubled Water
    ver troubled Yes
  • word_suffix : like value_suffix , but the string has to match at the end of a term boundary.

    Examples

    Stored text Substring query Match
    Bridge over Troubled Water
    ver troubled Yes
    Bridge over Troubled Water
    over trouble No
    Bridge over Troubled Water
    over water No
    Bridge over Troubled Water
    ove troubled No

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: