Asynchronous secondary index queries

This document provides examples of common query patterns for building asynchronous secondary indexes in Bigtable. The examples use IDs and values that are similar to those in Data for examples and you can create a test table to test the queries.

Before you read this page, familiarize yourself with Create an asynchronous secondary index and GoogleSQL for Bigtable .

Example queries

The following examples show how to create asynchronous secondary indexes using different GoogleSQL queries.

Create an inverted index based on a column qualifier and values

The following query creates an inverted index by extracting all key-value pairs from a column family that has a flexible schema. It uses MAP_ENTRIES to get an array of all data that the stats_summary column family stores, and then UNNEST to transform each key-value pair into a separate row. The resulting index allows for fast lookups based on stat_description and stat_value . If you add new types of statistics to the source table, then Bigtable includes them in the asynchronous index automatically.

  SELECT 
  
 stats 
 . 
 key 
  
 as 
  
 stat_description 
 , 
  
 stats 
 . 
 value 
  
 as 
  
 stat_value 
 , 
  
 _key 
  
 as 
  
 original_key 
 , 
  
 cell_plan 
  
 as 
  
 cell_plan 
 FROM 
  
 test_table 
 CROSS 
  
 JOIN 
  
 UNNEST 
 ( 
 MAP_ENTRIES 
 ( 
 stats_summary 
 )) 
  
 stats 
 ORDER 
  
 BY 
  
 stat_description 
 , 
  
 stat_value 
 , 
  
 original_key

Create a new row key based on existing data

The following example creates a new index key, total_plan_capacity , by categorizing data plans that the cell_plan column family stores. The query uses MAP_KEYS to get all keys in cell_plan and ARRAY_INCLUDES_ANY to filter for specific data plan keys. A CASE statement then defines the logic to assign a total_plan_capacity category that's based on the presence and values of these data plans. This allows for efficient lookups based on the combined data plan capacity.

  SELECT 
  
 CASE 
  
 WHEN 
  
 cell_plan 
 [ 
 'data_plan_01gb' 
 ] 
  
 = 
  
 "true" 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 = 
  
 "false" 
  
 OR 
  
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 IS 
  
 NULL 
 ) 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 = 
  
 "false" 
  
 OR 
  
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 IS 
  
 NULL 
 ) 
  
 THEN 
  
 'x-small' 
  
 WHEN 
  
 cell_plan 
 [ 
 'data_plan_01gb' 
 ] 
  
 = 
  
 "true" 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 = 
  
 "true" 
 ) 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 = 
  
 "false" 
  
 OR 
  
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 IS 
  
 NULL 
 ) 
  
 THEN 
  
 'small' 
  
 WHEN 
  
 cell_plan 
 [ 
 'data_plan_01gb' 
 ] 
  
 = 
  
 "true" 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 = 
  
 "false" 
  
 OR 
  
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 IS 
  
 NULL 
 ) 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 = 
  
 "true" 
 ) 
  
 THEN 
  
 'medium' 
  
 WHEN 
  
 ( 
 cell_plan 
 [ 
 'data_plan_01gb' 
 ] 
  
 = 
  
 "false" 
  
 OR 
  
 cell_plan 
 [ 
 'data_plan_01gb' 
 ] 
  
 IS 
  
 NULL 
 ) 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 = 
  
 "true" 
 ) 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 = 
  
 "true" 
 ) 
  
 THEN 
  
 'large' 
  
 WHEN 
  
 cell_plan 
 [ 
 'data_plan_01gb' 
 ] 
  
 = 
  
 "true" 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_05gb' 
 ] 
  
 = 
  
 "true" 
 ) 
  
 AND 
  
 ( 
 cell_plan 
 [ 
 'data_plan_10gb' 
 ] 
  
 = 
  
 "true" 
 ) 
  
 THEN 
  
 'x-large' 
  
 END 
  
 as 
  
 total_plan_capacity 
 , 
  
 _key 
  
 as 
  
 original_key 
 , 
  
 stats_summary 
 FROM 
  
 test_table 
 WHERE 
  
 ARRAY_INCLUDES_ANY 
 ( 
 MAP_KEYS 
 ( 
 cell_plan 
 ), 
  
 [ 
 "data_plan_01gb" 
 , 
  
 "data_plan_05gb" 
 , 
 "data_plan_10gb" 
 ] 
 ) 
 ORDER 
  
 BY 
  
 total_plan_capacity 
 , 
  
 original_key

Create a time-series-based index

Bigtable stores each cell value and its associated timestamp. By using the WITH_HISTORY=>TRUE flag in the FROM clause, you can retrieve all versions of data and their timestamps. The UNPACK table function then expands the query results so that each timestamped value appears on its own row with a separate _timestamp column. This lets you create an asynchronous secondary index where the new row key is based on these timestamp values, enabling quick lookups that are based on time ranges.

  SELECT 
  
 _timestamp 
  
 as 
  
 stats_timestamp 
 , 
  
 _key 
  
 as 
  
 original_key 
 , 
  
 stats_summary 
 FROM 
  
 UNPACK 
 (( 
  
 select 
  
 _key 
 , 
  
 stats_summary 
  
 FROM 
  
 test_table 
  
 ( 
 WITH_HISTORY 
 = 
> TRUE 
 ) 
 )) 
 ORDER 
  
 BY 
  
 stats_timestamp 
 , 
  
 original_key

Create an index entry for items and values in a JSON cell

Bigtable is often used as a key-value store for large-scale workloads, including data in formats such as JSON. The following example shows how to create an asynchronous secondary index for fast lookups of data that JSON stores. The query uses JSON_EXTRACT to extract the store_id from a JSON string within the transaction_information column family in a table named sales . This index allows for quick retrieval of sales transactions by specific store IDs.

  SELECT 
 JSON_EXTRACT 
 ( 
 transaction_information 
 [ 
 "json" 
 ] 
 , 
  
 "$.store_id" 
 ) 
  
 as 
  
 store_id 
 , 
 _key 
  
 as 
  
 original_key 
 , 
 transaction_information 
 from 
  
 sales 
 ORDER 
  
 BY 
  
 store_id 
 , 
  
 original_key 
 ;