Create and manage RUM index

This document shows you how to create the RUM extension and create indexes to optimize full-text search in AlloyDB for PostgreSQL. It provides examples for common use cases, including ranking, phrase searching, and sorting by timestamp.

Before you begin

To create the RUM extension, you must have the alloydb_superuser role. For more information about granting roles, see Add an IAM user or service account to a cluster .

Create the RUM extension

You must create the RUM extension once per database.

  1. Connect to your AlloyDB database using psql or another client. For more information, see Connect to a cluster instance .
  2. Run the following SQL command to create the extension:

      CREATE 
      
     EXTENSION 
      
     IF 
      
     NOT 
      
     EXISTS 
      
     rum 
     ; 
     
    

Create a RUM index

To optimize full-text search queries, create a RUM index on your data. RUM offers several operator classes for different use cases.

Types of RUM operator classes

The following table summarizes the different RUM operator classes and their primary use cases.

Operator Class Main Use Case Limitations
Standard full-text search with ranking and phrase search. N/A
Smaller index and faster updates for full-text search. Does not support prefix searching.
Full-text search sorted by another column. N/A
Searching within array columns. N/A
Indexing scalar types for distance-based queries. N/A
Hash-based full-text search sorted by another column. Does not support prefix matching.
Indexing stored tsquery values for reverse search. N/A
Array search sorted by another column. N/A

Use the rum_tsvector_ops operator class for standard text search that requires fast ranking and phrase search capabilities. This operator class stores the position of each lexeme in the index. The following example creates a table named documents with a content column.

  1. Create a table named documents :

      CREATE 
      
     TABLE 
      
     documents 
      
     ( 
      
     id 
      
     SERIAL 
      
     PRIMARY 
      
     KEY 
     , 
      
     title 
      
     TEXT 
      
     NOT 
      
     NULL 
     , 
      
     content 
      
     TEXT 
      
     NOT 
      
     NULL 
     , 
      
     published_at 
      
     TIMESTAMP 
      
     WITH 
      
     TIME 
      
     ZONE 
      
     DEFAULT 
      
     NOW 
     () 
     ); 
     
    
  2. Populate the documents table with sample data:

      INSERT 
      
     INTO 
      
     documents 
      
     ( 
     title 
     , 
      
     content 
     ) 
      
     VALUES 
      
     ( 
     'Title' 
     , 
      
     'This search engine is working as intended' 
     ); 
     
    
  3. Add a generated tsvector column to your table. This column automatically stores the processed text and improves query performance:

      ALTER 
      
     TABLE 
      
     documents 
     ADD 
      
     COLUMN 
      
     search_vector 
      
     tsvector 
     GENERATED 
      
     ALWAYS 
      
     AS 
      
     ( 
     to_tsvector 
     ( 
     'english' 
     , 
      
     content 
     )) 
      
     STORED 
     ; 
     
    
  4. Create the RUM index on the new search_vector column:

      CREATE 
      
     INDEX 
      
     idx_docs_rum 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     search_vector 
      
     rum_tsvector_ops 
     ); 
     
    
  5. Query the table using the index. The <=> operator computes the relevance score, or distance, between the document and the query directly from the index, enabling fast sorting:

      SELECT 
      
     title 
     , 
      
     content 
     FROM 
      
     documents 
     WHERE 
      
     search_vector 
      
     @@ 
      
     to_tsquery 
     ( 
     'english' 
     , 
      
     'search <-> engine' 
     ) 
     ORDER 
      
     BY 
      
     search_vector 
      
    < = 
    >  
     to_tsquery 
     ( 
     'english' 
     , 
      
     'search <-> engine' 
     ); 
     
    
  6. Populate documents table with more data:

      INSERT 
      
     INTO 
      
     documents 
      
     ( 
     title 
     , 
      
     content 
     ) 
      
     VALUES 
      
     ( 
     'Title1' 
     , 
      
     'English is my primary language.' 
     ); 
     INSERT 
      
     INTO 
      
     documents 
      
     ( 
     title 
     , 
      
     content 
     ) 
      
     VALUES 
      
     ( 
     'Title2' 
     , 
      
     'Google has a great engineering culture' 
     ); 
     
    
  7. Run a prefix search query. This finds documents containing words that start with eng , such as engineer or english :

      SELECT 
      
     title 
     , 
      
     content 
     FROM 
      
     documents 
     WHERE 
      
     search_vector 
      
     @@ 
      
     to_tsquery 
     ( 
     'english' 
     , 
      
     'eng:*' 
     ); 
     
    

Use the rum_tsvector_hash_ops operator class to reduce index size and improve update speeds. This class stores a hash of each lexeme instead of the full lexeme. This approach results in a smaller index but does not support prefix searching. The following example assumes that you have a table named documents with a search_vector column.

  1. Create the RUM index using the hash operator class:

      CREATE 
      
     INDEX 
      
     idx_docs_rum_hash 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     search_vector 
      
     rum_tsvector_hash_ops 
     ); 
     
    
  2. Populate documents table with more data:

      INSERT 
      
     INTO 
      
     documents 
      
     ( 
     title 
     , 
      
     content 
     ) 
      
     VALUES 
      
     ( 
     'Title3' 
     , 
      
     'That person was driving incredibly fast, however the routing was not very efficient' 
     ); 
     
    
  3. Run a standard match query:

      SELECT 
      
     * 
      
     FROM 
      
     documents 
      
     WHERE 
      
     search_vector 
      
     @@ 
      
     to_tsquery 
     ( 
     'english' 
     , 
      
     'fast & efficient' 
     ); 
     
    

Index for search sorted by timestamp

Use the rum_tsvector_addon_ops operator class to optimize queries that filter by text and sort by another field, such as a timestamp. This pattern stores the additional field's value directly in the index, which avoids a slow sort operation after the search. The following example assumes that you have a table named documents with a search_vector column and a published_at column.

  1. Create an index that includes the published_at timestamp:

      CREATE 
      
     INDEX 
      
     idx_docs_rum_timestamp 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     search_vector 
      
     rum_tsvector_addon_ops 
     , 
      
     published_at 
     ) 
     WITH 
      
     ( 
     attach 
      
     = 
      
     'published_at' 
     , 
      
     to 
      
     = 
      
     'search_vector' 
     ); 
     
    
  2. Run a query that finds documents containing the word engine and sorts them by publication date. The index handles both the search and the sort efficiently:

      SELECT 
      
     title 
     , 
      
     published_at 
     FROM 
      
     documents 
     WHERE 
      
     search_vector 
      
     @@ 
      
     to_tsquery 
     ( 
     'english' 
     , 
      
     'engine' 
     ) 
     ORDER 
      
     BY 
      
     published_at 
      
     DESC 
     ; 
     
    

Use the rum_anyarray_ops operator class to index array columns, such as a list of tags. This lets you efficiently query for arrays that overlap ( && ), contain ( @> ), or are contained by ( <@ ) other arrays. The following example adds a tags column to the documents table.

  1. Add a tags column and populate it with data:

      ALTER 
      
     TABLE 
      
     documents 
      
     ADD 
      
     COLUMN 
      
     tags 
      
     TEXT 
     []; 
     INSERT 
      
     INTO 
      
     documents 
      
     ( 
     title 
     , 
      
     content 
     , 
      
     tags 
     ) 
      
     VALUES 
      
     ( 
      
     'Title4' 
     , 
      
     'Sample Text' 
     , 
      
     ARRAY 
     [ 
     'ai' 
     , 
      
     'ml' 
     ] 
      
     ); 
     
    
  2. Create the RUM index on a TEXT[] column named tags :

      CREATE 
      
     INDEX 
      
     idx_tags_rum 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     tags 
      
     rum_anyarray_ops 
     ); 
     
    
  3. Run a query to find documents that have either ai or ml in their tags:

      SELECT 
      
     * 
      
     FROM 
      
     documents 
      
     WHERE 
      
     tags 
     && 
     '{"ai", "ml"}' 
     ; 
     
    

Index for scalar types

Use the rum_<TYPE>_ops operator classes to index columns that contain continuous values, such as integers, timestamps, or floating-point numbers. These operator classes let you use the <=> operator to efficiently calculate the distance between values. The following example assumes that you have a table named documents .

  1. Add a generic integer column, such as rating , to the documents table:

      ALTER 
      
     TABLE 
      
     documents 
     ADD 
      
     COLUMN 
      
     rating 
      
     INT 
     ; 
     UPDATE 
      
     documents 
      
     SET 
      
     rating 
      
     = 
      
     floor 
     ( 
     random 
     () 
      
     * 
      
     5 
      
     + 
      
     1 
     ); 
     
    
  2. Create a RUM index on the rating column:

      CREATE 
      
     INDEX 
      
     idx_rating_rum 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     rating 
      
     rum_int4_ops 
     ); 
     
    
  3. Run a query to find documents with a rating closest to the value 5:

      SELECT 
      
     title 
     , 
      
     rating 
     FROM 
      
     documents 
     ORDER 
      
     BY 
      
     rating 
      
    < = 
    >  
     5 
     ; 
     
    

Index for optimized hash search sorted by timestamp

Use the rum_tsvector_hash_addon_ops operator class to combine the benefits of a hash index with the sorting capabilities of an addon index. This class stores a hash of each lexeme along with the value of an additional column. This configuration supports efficient sorting by the additional column but does not support prefix matching. The following example assumes that you have a table named documents with a search_vector column and a published_at timestamp column.

  1. Create a RUM index that uses the hash operator class and includes the published_at timestamp:

      CREATE 
      
     INDEX 
      
     idx_docs_rum_hash_timestamp 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     search_vector 
      
     rum_tsvector_hash_addon_ops 
     , 
      
     published_at 
     ) 
     WITH 
      
     ( 
     attach 
      
     = 
      
     'published_at' 
     , 
      
     to 
      
     = 
      
     'search_vector' 
     ); 
     
    
  2. Run a query that finds documents containing engine and sorts them by publication date:

      SELECT 
      
     title 
     , 
      
     published_at 
     FROM 
      
     documents 
     WHERE 
      
     search_vector 
      
     @@ 
      
     to_tsquery 
     ( 
     'english' 
     , 
      
     'engine' 
     ) 
     ORDER 
      
     BY 
      
     published_at 
      
     DESC 
     ; 
     
    

Index for stored queries

Use the rum_tsquery_ops operator class to index tsquery values. This lets you perform "reverse search," identifying which stored queries match a given input document. The following example creates a table named queries .

  1. Create a table to store queries:

      CREATE 
      
     TABLE 
      
     queries 
      
     ( 
     query_text 
      
     tsquery 
     ); 
     INSERT 
      
     INTO 
      
     queries 
      
     ( 
     query_text 
     ) 
      
     VALUES 
      
     ( 
     plainto_tsquery 
     ( 
     'AlloyDB is fast!' 
     )); 
     
    
  2. Create a RUM index on the query_text column:

      CREATE 
      
     INDEX 
      
     idx_queries_rum 
     ON 
      
     queries 
     USING 
      
     rum 
      
     ( 
     query_text 
      
     rum_tsquery_ops 
     ); 
     
    
  3. Run a query to find stored queries that match a document:

      SELECT 
      
     * 
     FROM 
      
     queries 
     WHERE 
      
     to_tsvector 
     ( 
     'english' 
     , 
      
     'AlloyDB is fast' 
     ) 
      
     @@ 
      
     query_text 
     ; 
     
    

Index for array search sorted by timestamp

Use the rum_anyarray_addon_ops operator class to index array columns along with an additional column for sorting. The following example assumes that you have a table named documents with a tags column and a published_at timestamp column.

  1. Create a RUM index on the tags column that includes the published_at timestamp:

      CREATE 
      
     INDEX 
      
     idx_tags_rum_timestamp 
     ON 
      
     documents 
     USING 
      
     rum 
      
     ( 
     tags 
      
     rum_anyarray_addon_ops 
     , 
      
     published_at 
     ) 
     WITH 
      
     ( 
     attach 
      
     = 
      
     'published_at' 
     , 
      
     to 
      
     = 
      
     'tags' 
     ); 
     
    
  2. Run a query to find documents with the ai tag, sorted by publication date:

      SELECT 
      
     title 
     , 
      
     published_at 
     FROM 
      
     documents 
     WHERE 
      
     tags 
      
     @ 
    >  
     '{"ai"}' 
     ORDER 
      
     BY 
      
     published_at 
      
     DESC 
     ; 
     
    

What's next

Design a Mobile Site
View Site in Mobile | Classic
Share by: