Search query expansion

This page describes how to use query enhancement as part of a full-text search .

To increase the likelihood of finding relevant results, Spanner offers advanced capabilities that expand the search query to include related terms, synonyms and spelling corrections. Spanner provides the following options for query enhancement:

Enhanced query

To use enhanced query, set enhance_query=>true in the SEARCH function. Spanner then automatically expands search queries by including related terms and synonyms, applying stemming, and making spelling corrections. For example, when using enhanced query, the search query hotl cal matches the album Hotel California .

GoogleSQL

  SELECT 
  
 AlbumId 
 FROM 
  
 Albums 
 WHERE 
  
 SEARCH 
 ( 
 AlbumTitle_Tokens 
 , 
  
 'hotl cal' 
 , 
  
 enhance_query 
 = 
> true 
 ) 
 

PostgreSQL

  SELECT 
  
 albumid 
 FROM 
  
 albums 
 WHERE 
  
 spanner 
 . 
 search 
 ( 
 albumtitle_tokens 
 , 
  
 'hotl cal' 
 , 
  
 enhance_query 
 = 
> true 
 ) 
 

enhance_query is a query-time option that doesn't affect tokenization. You can use the same search index with or without enhance_query .

Google is continuously improving the query enhancement algorithms. As a result, a query with enhance_query => true might yield slightly different results over time.

When enhance_query is used, it might increase latency due to the overhead of expanding the search query and executing the resulting larger query.

Custom dictionaries

You can use custom dictionaries with Spanner full-text search to define synonyms for terms in your dataset. This is useful for capturing synonyms, acronyms, equivalent terms, and other word variations to improve the retrieval of search results.

Create a custom dictionary table

A custom dictionary is a user-created table containing key-value pairs of terms and their synonyms. To create one, include the fulltext_dictionary_table = true option in the CREATE TABLE statement. The table must have two columns:

  • Key : A non nullable string column for the term to be expanded with synonyms.
  • Value : A non nullable array of strings column for an array of synonyms for the key.

There cannot be any columns other than Key and Value in the table. If a search term matches a Key in the dictionary, the search is expanded to include all corresponding synonyms in Value along with the key term itself. The following example creates a custom dictionary table named MyCustomDictionary . You must also set table option fulltext_dictionary_table=true on this table during creation.

GoogleSQL

  CREATE 
  
 TABLE 
  
 MyCustomDictionary 
  
 ( 
  
 Key 
  
 STRING 
 ( 
 MAX 
 ) 
  
 NOT 
  
 NULL 
 , 
  
 Value 
  
 ARRAY<STRING 
 ( 
 MAX 
 ) 
>  
 NOT 
  
 NULL 
 , 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 Key 
 ), 
 OPTIONS 
  
 ( 
 fulltext_dictionary_table 
  
 = 
  
 true 
 ); 
 

PostgreSQL

  CREATE 
  
 TABLE 
  
 mycustomdictionary 
  
 ( 
  
 key 
  
 character 
  
 varying 
  
 NOT 
  
 NULL 
 , 
  
 value 
  
 character 
  
 varying 
  
 [] 
  
 NOT 
  
 NULL 
 , 
  
 PRIMARY 
  
 KEY 
 ( 
 key 
 ) 
 ) 
  
 WITH 
  
 ( 
  
 type 
  
 = 
  
 'fulltext_dictionary' 
 ) 
 

After creating the table, insert your synonyms:

GoogleSQL

  INSERT 
  
 INTO 
  
 MyCustomDictionary 
  
 ( 
 Key 
 , 
  
 Value 
 ) 
  
 VALUES 
 ( 
 'album' 
 , 
  
 [ 
 'vinyl' 
 , 
  
 'cassette' 
 ]), 
 ( 
 'edm' 
 , 
  
 [ 
 'electronic dance music' 
 ]); 
 

PostgreSQL

  INSERT 
  
 INTO 
  
 mycustomdictionary 
  
 ( 
 key 
 , 
  
 value 
 ) 
  
 VALUES 
 ( 
 'album' 
 , 
  
 ARRAY 
 [ 
 'vinyl' 
 , 
  
 'cassette' 
 ]), 
 ( 
 'edm' 
 , 
  
 ARRAY 
 [ 
 'electronic dance music' 
 ]); 
 

When populating the table, keep the following in mind:

  • Keys must be single words.
  • Values can be single words or multi-word phrases. If a value contains multiple words, it is treated as a phrase search during query expansion.
  • All keys and values should be in lowercase. This ensures they match search tokens, which are converted to lowercase by the default tokenizer.

Search expansion only occurs when a search term matches a Key . If a search term matches a synonym in Value but not a Key , the search is not expanded. If you require bidirectional mapping (for example, so that searching for vinyl or cassette also finds album ), you must also insert reverse mappings into the dictionary table. The following example shows how to insert bidirectional mappings for album , vinyl , and cassette :

GoogleSQL

  INSERT 
  
 INTO 
  
 MyCustomDictionary 
  
 ( 
 Key 
 , 
  
 Value 
 ) 
 -- 1. Insert album -> vinyl, cassette 
 SELECT 
  
 'album' 
 , 
  
 [ 
 'vinyl' 
 , 
  
 'cassette' 
 ] 
 UNION 
  
 ALL 
 -- 2. Insert vinyl -> album and cassette -> album 
 SELECT 
  
 syn 
 , 
  
 [ 
 'album' 
 ] 
 FROM 
  
 UNNEST 
 ([ 
 'vinyl' 
 , 
  
 'cassette' 
 ]) 
  
 AS 
  
 syn 
 ; 
 

PostgreSQL

  INSERT 
  
 INTO 
  
 mycustomdictionary 
  
 ( 
 key 
 , 
  
 value 
 ) 
 -- 1. Insert album -> vinyl, cassette 
 SELECT 
  
 'album' 
 , 
  
 ARRAY 
 [ 
 'vinyl' 
 , 
  
 'cassette' 
 ] 
 UNION 
  
 ALL 
 -- 2. Insert vinyl -> album and cassette -> album 
 SELECT 
  
 syn 
 , 
  
 ARRAY 
 [ 
 'album' 
 ] 
 FROM 
  
 unnest 
 ( 
 ARRAY 
 [ 
 'vinyl' 
 , 
  
 'cassette' 
 ]) 
  
 AS 
  
 syn 
 ; 
 

Use a custom dictionary in search queries

To use a custom dictionary, specify the dictionary table name in the dictionary argument of the SEARCH function.

  • If a search term is a key in the dictionary, SEARCH also looks for its values. The term album matches messages containing vinyl or cassette .

    GoogleSQL

      SELECT 
      
     MessageId 
     , 
      
     Body 
     FROM 
      
     Messages 
     WHERE 
      
     SEARCH 
     ( 
     Body_Tokens 
     , 
      
     'album' 
     , 
      
     dictionary 
     = 
    > 'MyCustomDictionary' 
     ); 
     
    

    PostgreSQL

      SELECT 
      
     messageid 
     , 
      
     body 
     FROM 
      
     messages 
     WHERE 
      
     spanner 
     . 
     search 
     ( 
     body_tokens 
     , 
      
     'album' 
     , 
      
     dictionary 
     = 
    > 'mycustomdictionary' 
     ); 
     
    
  • When a dictionary value contains multiple words such as electronic dance music for the key edm , it is treated as a phrase search. This means the terms must appear adjacently and in the exact order specified to be considered a match. For example, the following query returns results containing the exact phrase "electronic dance music", but does not match "dance electronic music". This is primarily useful for expanding acronyms and can be used in the following way:

    GoogleSQL

      SELECT 
      
     MessageId 
     , 
      
     Body 
     FROM 
      
     Messages 
     WHERE 
      
     SEARCH 
     ( 
     Body_Tokens 
     , 
      
     'edm' 
     , 
      
     dictionary 
     = 
    > 'MyCustomDictionary' 
     ); 
     
    

    PostgreSQL

      SELECT 
      
     messageid 
     , 
      
     body 
     FROM 
      
     messages 
     WHERE 
      
     spanner 
     . 
     search 
     ( 
     body_tokens 
     , 
      
     'edm' 
     , 
      
     dictionary 
     = 
    > 'mycustomdictionary' 
     ); 
     
    

Dictionary lookup staleness

By default, custom dictionary entries are read with a maximum staleness of 15 seconds. This reduces the overhead of the read operation because reading data with some staleness is more efficient than reading the most current data. Consequently, changes to dictionary entries might take up to 15 seconds to be reflected in search queries. You can override this behavior in the following ways:

  • Using the fulltext_dictionary_staleness table option.
  • Using the fulltext_dictionary_staleness query hint for more granular control.

The query hint overrides the table option if both are used.

fulltext_dictionary_staleness table option

You can set the fulltext_dictionary_staleness option for the dictionary table. All queries using this dictionary table use this staleness value, unless overridden by the query hint.

GoogleSQL

The following example shows how to CREATE a table with the fulltext_dictionary_staleness option:

  CREATE 
  
 TABLE 
  
 MyCustomDictionary 
  
 ( 
  
 Key 
  
 STRING 
 ( 
 MAX 
 ) 
  
 NOT 
  
 NULL 
 , 
  
 Value 
  
 ARRAY<STRING 
 ( 
 MAX 
 ) 
>  
 NOT 
  
 NULL 
 , 
 ) 
  
 PRIMARY 
  
 KEY 
 ( 
 Key 
 ), 
 OPTIONS 
  
 ( 
  
 fulltext_dictionary_table 
  
 = 
  
 true 
 , 
  
 fulltext_dictionary_staleness 
  
 = 
  
 '5s' 
 ); 
 

The following example shows how to ALTER table to change or set the fulltext_dictionary_staleness option:

  ALTER 
  
 TABLE 
  
 MyCustomDictionary 
  
 SET 
  
 OPTIONS 
  
 ( 
  
 fulltext_dictionary_staleness 
  
 = 
  
 '60s' 
 ); 
 

PostgreSQL

The following example shows how to CREATE a table with the fulltext_dictionary_staleness option:

  -- Create with 5s staleness 
 CREATE 
  
 TABLE 
  
 mycustomdictionary 
  
 ( 
  
 key 
  
 character 
  
 varying 
  
 NOT 
  
 NULL 
 , 
  
 value 
  
 character 
  
 varying 
 [] 
  
 NOT 
  
 NULL 
 , 
  
 PRIMARY 
  
 KEY 
 ( 
 key 
 ) 
 ) 
  
 WITH 
  
 ( 
  
 type 
  
 = 
  
 'fulltext_dictionary' 
 , 
  
 fulltext_dictionary_staleness 
  
 = 
  
 '5s' 
 ); 
 

The PostgreSQL interface doesn't support ALTER table to change or set the fulltext_dictionary_staleness option.

fulltext_dictionary_staleness query hint

For more granular control, you can use the fulltext_dictionary_staleness query hint to specify a different staleness for an individual query. This hint overrides the table-level setting.

The following example uses a hint to perform dictionary table lookups with zero staleness. This ensures that the most current dictionary entries are read. This approach can increase query latency because reading the most current data is less efficient than allowing for some staleness.

GoogleSQL

  @ 
 { 
 fulltext_dictionary_staleness 
 = 
 "0s" 
 } 
 SELECT 
  
 MessageId 
 , 
  
 Body 
 FROM 
  
 Messages 
 WHERE 
  
 SEARCH 
 ( 
 Body_Tokens 
 , 
  
 'Bill' 
 , 
  
 dictionary 
 = 
> 'MyCustomDictionary' 
 ); 
 

PostgreSQL

  /*@ fulltext_dictionary_staleness='0s' */ 
 SELECT 
  
 messageid 
 , 
  
 body 
 FROM 
  
 messages 
 WHERE 
  
 spanner 
 . 
 search 
 ( 
 body_tokens 
 , 
  
 'Bill' 
 , 
  
 dictionary 
 = 
> 'mycustomdictionary' 
 ); 
 

Known limitations with custom dictionary tables

  • There is limited support for using Spanner Import and Export with custom dictionary tables.
  • Custom dictionary tables must be created in the default schema, and cannot be created in named schemas .
  • Custom dictionary tables only support the default SEARCH query dialect .

Combining custom dictionaries with enhanced query

You can combine custom dictionary synonyms with enhanced query by setting both dictionary and enhance_query=>true in the SEARCH function. Query enhancement can expand queries with common synonyms or spelling corrections, while custom dictionaries allow you to define your own expansions. For example, if enhance_query expands album to include record , and MyCustomDictionary maps album to ['vinyl', 'cassette'] , the following query matches messages containing album , record , vinyl , or cassette :

GoogleSQL

  SELECT 
  
 MessageId 
 , 
  
 Body 
 FROM 
  
 Messages 
 WHERE 
  
 SEARCH 
 ( 
 Body_Tokens 
 , 
  
 'album' 
 , 
  
 enhance_query 
 = 
> true 
 , 
  
 dictionary 
 = 
> 'MyCustomDictionary' 
 ); 
 

PostgreSQL

  SELECT 
  
 messageid 
 , 
  
 body 
 FROM 
  
 messages 
 WHERE 
  
 spanner 
 . 
 search 
 ( 
 body_tokens 
 , 
  
 'album' 
 , 
  
 enhance_query 
 = 
> true 
 , 
  
 dictionary 
 = 
> 'mycustomdictionary' 
 ); 
 

When both enhancements are enabled, they operate independently on the original search terms, and terms generated by one enhancement aren't used as input for the other.

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: