Run a hybrid vector similarity search

This document describes how to perform hybrid searches in AlloyDB for PostgreSQL, which combines text search, keyword matching, vector search, and semantic similarity, using the vector extension, which is a standard pgvector PostgreSQL extension customized for AlloyDB. Hybrid search lets you retrieve highly relevant results by using both exact keyword matches and semantically similar content.

Hybrid search overcomes the limitations of using a single search method. Semantic search alone might miss relevant documents that contain the exact keywords but which have a slightly different semantic context. Conversely, a keyword-only search can't understand synonyms, misspellings, or conceptual relationships, which can lead to missed opportunities for relevant results. By combining both search methods, hybrid search provides more relevant and comprehensive results for a better user experience.

To perform a hybrid search in AlloyDB for PostgreSQL, you'll need to create both a vector index and a text search index on your table. Then, you'll combine the results from both searches and re-rank them to present the most relevant information.

Create a GIN index

A Generalized Inverted Index (GIN) index is a specialized index type optimized for searching within composite values, such as arrays, JSONB, and full-text search data.

To create a GIN index on your text data to perform a full text search, run the following:

  CREATE 
  
 INDEX 
  
  INDEX_NAME 
 
  
 ON 
  
  TABLE 
 
  
 USING 
  
 GIN 
  
 ( 
 to_tsvector 
 ( 
 'english' 
 , 
  
  COLUMN_NAME 
 
 )); 
 

Replace the following:

  • INDEX_NAME : the name of the index you want to create —for example, my-gin-index .

  • TABLE : the table to add the index to.

  • COLUMN_NAME : the column that stores the text data you want to search.

Create a ScaNN index

To apply a two-level tree index using the ScaNN algorithm to a column containing stored vector embeddings, run the following DDL query:

  CREATE 
  
 INDEX 
  
  INDEX_NAME 
 
  
 ON 
  
  TABLE 
 
  
 USING 
  
 scann 
  
 ( 
  EMBEDDING_COLUMN 
 
  
  DISTANCE_FUNCTION 
 
 ) 
  
 WITH 
  
 ( 
 num_leaves 
 = 
  NUM_LEAVES_VALUE 
 
 ); 
 

Replace the following:

  • INDEX_NAME : the name of the index you want to create—for example, my-scann-index . The index names are shared across your database. Ensure that each index name is unique to each table in your database.

  • TABLE : the table to add the index to.

  • EMBEDDING_COLUMN : a column that stores vector data.

  • DISTANCE_FUNCTION : the distance function to use with this index. Choose one of the following:

    • L2 distance: l2

    • Dot product: dot_product

    • Cosine distance: cosine

  • NUM_LEAVES_VALUE : the number of partitions to apply to this index. Set to any value between 1 to 1048576. For more information about how to decide this value, see Tune a ScaNN index .

To learn more about different ScaNN index configurations, see Create a ScaNN index . You can also create a HNSW index.

Perform a hybrid search using reciprocal rank fusion

Hybrid search involves performing separate vector and text searches, then combining and re-ranking results using Reciprocal Rank Fusion (RRF). RRF is a rank-based algorithm that combines multiple ranked lists of search results into a single ranked list by assigning a score to each document. This score is based on RRF's reciprocal rank across all contributing lists, with higher-ranked documents receiving a greater contribution. Use the following SQL query to combine full text search and hybrid search, and re-rank the results:

   
 WITH 
  
 vector_search 
  
 AS 
  
 ( 
  
 SELECT 
  
 id 
 , 
  
 RANK 
  
 () 
  
 OVER 
  
 ( 
 ORDER 
  
 BY 
  
 embedding 
  
< = 
>  
 google_ml 
 . 
 embedding 
 ( 
 ' MODEL_ID 
' 
 , 
  
 ' TEXT 
' 
 )) 
  
 AS 
  
 rank 
  
 FROM 
  
  TABLE 
 
  
 ORDER 
  
 BY 
  
 embedding 
  
< = 
>  
 google_ml 
 . 
 embedding 
 ( 
 ' MODEL_ID 
' 
 , 
  
 ' TEXT 
' 
 ) 
  
 LIMIT 
  
 10 
  
 ), 
  
 text_search 
  
 AS 
  
 ( 
  
 SELECT 
  
 id 
 , 
  
 RANK 
  
 () 
  
 OVER 
  
 ( 
 ORDER 
  
 BY 
  
 ts_rank 
 ( 
 to_tsvector 
 ( 
 'english' 
 , 
  
  COLUMN_NAME 
 
 ), 
  
 to_tsquery 
 ( 
  KEYWORD 
 
 )) 
  
 desc 
 ) 
  
 FROM 
  
  TABLE 
 
  
 WHERE 
  
 to_tsvector 
 ( 
 'english' 
 , 
  
  COLUMN_NAME 
 
 ) 
  
 @@ 
  
 to_tsquery 
 ( 
  KEYWORD 
 
 ) 
  
 ORDER 
  
 BY 
  
 ts_rank 
 ( 
 to_tsvector 
 ( 
 'english' 
 , 
  
  COLUMN_NAME 
 
 ), 
  
 to_tsquery 
 ( 
  KEYWORD 
 
 )) 
  
 desc 
  
 LIMIT 
  
 10 
  
 ) 
  
 SELECT 
  
 COALESCE 
 ( 
 vector_search 
 . 
 id 
 , 
  
 text_search 
 . 
 id 
 ) 
  
 AS 
  
 id 
 , 
  
 COALESCE 
 ( 
 1 
 . 
 0 
  
 / 
  
 ( 
 60 
  
 + 
  
 vector_search 
 . 
 rank 
 ), 
  
 0 
 . 
 0 
 ) 
  
 + 
  
 COALESCE 
 ( 
 1 
 . 
 0 
  
 / 
  
 ( 
 60 
  
 + 
  
 text_search 
 . 
 rank 
 ), 
  
 0 
 . 
 0 
 ) 
  
 AS 
  
 rrf_score 
  
 FROM 
  
 vector_search 
  
 FULL 
  
 OUTER 
  
 JOIN 
  
 text_search 
  
 ON 
  
 vector_search 
 . 
 id 
  
 = 
  
 text_search 
 . 
 id 
  
 ORDER 
  
 BY 
  
 rrf_score 
  
 DESC 
  
 LIMIT 
  
 5 
 ; 
 

Replace the following:

  • MODEL_ID : the ID of the model to query.

    If you are using the Vertex AI Model Garden, then specify text-embedding-005 as the model ID. These are the cloud-based models that AlloyDB can use for text embeddings. For more information, see Text embeddings .

  • TABLE : the table containing your data.

  • TEXT : the text to translate into a vector embedding.

  • KEYWORD : the keyword you want to search for.

  • COLUMN_NAME : a column that stores contains the text data you want to search.

Explanation of the Hybrid Search Query and related Common Table Expression (CTE):

  • vector_search CTE: Performs a standard vector similarity search, ordering results by cosine distance and assigning a rank. It retrieves the top 10 most semantically similar products.
  • text_search CTE: Executes a text search using to_tsvector and to_tsquery , calculating relevance with ts_rank and retrieving the top 10 most relevant text matches.
  • Final SELECT Statement CTE: Joins vector and text search results using a FULL OUTER JOIN , selects the product ID, calculates the RRF score, orders by score, and retrieves the top 5 results.

Perform a hybrid search using LangChain

Hybrid search with the AlloyDB vector store enhances search accuracy by combining two different lookup strategies: dense embedding vector search and keyword-based search. AlloyDBVectorStore is a LangChain vector store class that uses LangChain by acting as a specific implementation of LangChain's VectorStore class. Learn how to use AlloyDB to store vector embeddings with the AlloyDBVectorStore class .

You can enable and configure this hybrid search using the HybridSearchConfig class when you set up your AlloyDBVectorStore .

Hybrid search with the AlloyDB vector store simultaneously performs a semantic search to understand the meaning and context of a query, and a keyword search to find exact lexical matches. The results from both searches are then merged to provide a more comprehensive set of results.

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: