GoogleSQL for Bigtable query examples

The examples on this page demonstrate SQL query patterns for common and advanced Bigtable queries. You can run GoogleSQL queries in the Bigtable Studio query editor. You can also run queries using the Bigtable client library for Java .

Before you read this page, read the GoogleSQL for Bigtable overview .

The examples on this page use IDs and values similar to those in Data for examples .

Common Bigtable SQL query patterns

The following are examples of common queries for Bigtable data. To see examples of similar queries that call the Bigtable Data API, see Read examples and Use filters . For examples of queries on structured row keys, see Structured row key queries .

Retrieve the latest version of all columns for a given row key.

   
 SELECT 
  
 * 
  
 FROM 
  
 myTable 
  
 WHERE 
  
 _key 
  
 = 
  
 'r1'

Retrieve all versions of all columns for a given row key.

   
 SELECT 
  
 * 
  
 FROM 
  
 myTable 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 _key 
  
 = 
  
 'r1'

Retrieve the latest version of a particular column from a particular column family for a given row key.

   
 SELECT 
  
 stats_summary 
 [ 
 'os_build' 
 ] 
  
 AS 
  
 os 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
 = 
  
 'phone#4c410523#20190501'

Retrieve the row keys and the latest version of multiple columns for a given row key range.

   
 SELECT 
  
 _key 
 , 
  
 stats_summary 
 [ 
 'os_build' 
 ] 
  
 AS 
  
 os 
 , 
  
 stats_summary 
 [ 
 'user_agent' 
 ] 
  
 AS 
  
 agent 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
> = 
  
 'phone#4c410523#20190501' 
  
 AND 
  
 _key 
 < 
 'phone#4c410523#201906201'

Retrieve all versions of all columns for multiple row key ranges, up to 10 rows.

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 ( 
 _key 
  
> = 
  
 'phone#4c410523#20190501' 
  
 AND 
  
 _key 
 < 
 'phone#4c410523#201906201' 
 ) 
  
 OR 
  
 ( 
 _key 
  
> = 
  
 'phone#5c10102#20190501' 
  
 AND 
  
 _key 
 < 
 'phone#5c10102#20190601' 
 ) 
  
 LIMIT 
  
 10

Retrieve all versions of all columns for multiple row keys.

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 _key 
  
 = 
  
 'phone#4c410523#20190501' 
  
 OR 
  
 _key 
  
 = 
  
 'phone#4c410523#20190502'

Retrieve all versions of all columns for multiple row keys using a different approach.

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 _key 
  
 IS 
  
 IN 
  
 ( 
 'phone#4c410523#20190501' 
 , 
  
 'phone#4c410523#20190502' 
 )

Retrieve the latest version of all columns within a column family for a row key prefix.

   
 SELECT 
  
 stats_summary 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
 LIKE 
  
 'phone#%'

Retrieve the row keys and three latest versions of all columns within a column family for all rows in the table. This query requires a full table scan, so it's not recommended for low-latency, high-throughput access patterns.

   
 SELECT 
  
 _key 
 , 
  
 cell_plan 
  
 FROM 
  
 analytics 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 , 
  
 latest_n 
  
 = 
>  
 3 
 )

Retrieve the latest version of all columns with row keys matching a specified regular expression. This query requires a full table scan, so it's not recommended for low-latency, high-throughput access patterns, unless you also provide a row key prefix or row key range predicate in the WHERE clause.

   
 SELECT 
  
 * 
  
 FROM 
  
 myTable 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 REGEXP_CONTAINS 
 ( 
 _key 
 , 
  
 '.*#20190501$' 
 )

Retrieve the latest version of all columns with the matching row key prefix and counter value more than 123 . You don't need to cast for this comparison, because Bigtable aggregates are numeric.

   
 SELECT 
  
 * 
  
 FROM 
  
 myTable 
  
 WHERE 
  
 _key 
  
 LIKE 
  
 'user12%' 
  
 AND 
  
 counterFamily 
 [ 
 'counter' 
 ] 
 > 
 123

Retrieve the latest version of all columns for a row key prefix if the referrer matches a specific value.

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
 LIKE 
  
 'com.mysite%' 
  
 AND 
  
 session 
 [ 
 'referrer' 
 ] 
  
 = 
  
 './home'

Categorize a given row based on the value of a given column. This query is similar to using a composing conditional filter in the Bigtable Data API.

   
 SELECT 
  
 * 
 , 
  
 CASE 
  
 cell_plan 
 [ 
 'data_plan' 
 ] 
  
 WHEN 
  
 '10gb' 
  
 THEN 
  
 'passed-filter' 
  
 ELSE 
  
 'filtered-out' 
  
 END 
  
 AS 
  
 label 
  
 FROM 
  
 analytics

Retrieve the row key and the column qualifiers in a specific column family for a specified row key range. In SQL, column families are represented by the map data type, where each column qualifier and value is mapped as a key-value pair. This SQL query is similar to using a strip value filter in the Bigtable Data API.

   
 SELECT 
  
 _key 
 , 
  
 MAP_KEYS 
 ( 
 cell_plan 
 ) 
  
 AS 
  
 keys 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
> = 
  
 'phone#4c410523#20190501' 
  
 AND 
  
 _key 
 < 
 'phone#4c410523#201906201'

The UNPACK function lets you transform Bigtable data into a tabular time series format, which is useful when performing time series analysis. Consider an example where you have a clicks column in an engagement column family. The following query uses UNPACK to see how certain campaigns are performing by aggregating the clicks over a minute from the last hour.

   
 SELECT 
  
 FORMAT_TIMESTAMP 
 ( 
 '%M' 
 , 
  
 _timestamp 
 ) 
  
 AS 
  
 minute 
 , 
  
 COUNT 
 ( 
 clicks 
 ) 
  
 AS 
  
 total_clicks 
  
 FROM 
  
 UNPACK 
 (( 
  
 SELECT 
  
 engagement 
 [ 
 'clicks' 
 ] 
  
 as 
  
 clicks 
  
 FROM 
  
 metrics 
 ( 
 with_history 
  
 = 
>  
 true 
 , 
  
 after 
  
 = 
>  
 TIMESTAMP_SUB 
 ( 
 CURRENT_TIMESTAMP 
 (), 
  
 INTERVAL 
  
 1 
  
 HOUR 
 )) 
  
 WHERE 
  
 _key 
  
 = 
  
 @campaign_id 
  
 )) 
  
 GROUP 
  
 BY 
  
 minute 
 ;

Advanced Bigtable SQL query patterns

The following samples demonstrate more advanced patterns.

With the following query, you can retrieve the row key and most recent value of the JSON attribute abc in the session column family. For more information, see JSON functions .

   
 SELECT 
  
 _key 
 , 
  
 JSON_VALUE 
 ( 
 session 
 [ 
 'payload' 
 ] 
 , 
  
 '$.abc' 
 ) 
  
 AS 
  
 abc 
  
 FROM 
  
 analytics

With the following query, you can retrieve the row key and calculate the average session length using the latest value of two Bigtable aggregate cells , which are numeric, for each row in the table.

   
 SELECT 
  
 _key 
  
 AS 
  
 userid 
 , 
  
 session 
 [ 
 'total_minutes' 
 ] 
  
 / 
  
 session 
 [ 
 'count' 
 ] 
  
 AS 
  
 avg_session_length 
  
 FROM 
  
 analytics

With the following query, you can retrieve the latest version of all columns for a given row key prefix if the session column family contains referrer , origin , or server as a column qualifier. Alternatively, this query can also be written as a series of individual comparisons such as session['referrer'] IS NOT NULL OR session['origin'] IS NOT NULL . However, for queries involving a large number of comparisons, the following approach is recommended.

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
 LIKE 
  
 'com.abc%' 
  
 AND 
  
 ARRAY_INCLUDES_ANY 
 ( 
 MAP_KEYS 
 ( 
 session 
 ), 
  
 [ 
 'referrer' 
 , 
  
 'origin' 
 , 
  
 'server' 
 ] 
 )

With the following query, you can retrieve the latest version of all columns for a given row key prefix if the session column family contains referrer , origin , and server as column qualifiers. Alternatively, this query can be written as a series of individual comparisons such as session['referrer'] IS NOT NULL AND session ['origin'] IS NOT NULL .

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
 LIKE 
  
 'com.abc%' 
  
 AND 
  
 ARRAY_INCLUDES_ALL 
 ( 
 MAP_KEYS 
 ( 
 session 
 ), 
  
 [ 
 'referrer' 
 , 
  
 'origin' 
 , 
  
 'server' 
 ] 
 )

With the following query, you can retrieve the latest version of all columns for a given row key prefix if the session column family contains com.google.search , com.google.maps , or com.google.shopping as values.

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 _key 
  
 LIKE 
  
 'com.abc%' 
  
 AND 
  
 ARRAY_INCLUDES_ANY 
 ( 
  
 MAP_VALUES 
 ( 
 session 
 ), 
  
 [ 
 'com.google.search' 
 , 
  
 'com.google.maps' 
 , 
  
 'com.google.shopping' 
 ] 
 )

With the following query, you can retrieve the latest version of all columns if key-value pairs in the cell_plan column family include both data_plan:unlimited and roaming:North America .

   
 SELECT 
  
 * 
  
 FROM 
  
 analytics 
  
 WHERE 
  
 ARRAY_INCLUDES_ALL 
 ( 
  
 CAST 
 ( 
  
 MAP_ENTRIES 
 ( 
 cell_plan 
 ) 
  
 AS 
  
 ARRAY<STRUCT<key 
  
 STRING 
 , 
  
 value 
  
 STRING 
>> ), 
  
 [ 
 ( 
 'data_plan' 
 , 
  
 'unlimited' 
 ), 
  
 ( 
 'roaming' 
 , 
  
 'North America' 
 ) 
 ] 
 )

With the following query, you can retrieve the row key and temperature readings for weather sensors for cases where the temperature exceeded 70 degrees during the last seven measurements.

   
 SELECT 
  
 _key 
  
 AS 
  
 sensorid 
 , 
  
 ARRAY_FILTER 
 ( 
  
 CAST 
 ( 
  
 sensor 
 [ 
 'temperature' 
 ] 
  
 AS 
  
 ARRAY<STRUCT<timestamp 
  
 TIMESTAMP 
 , 
  
 value 
  
 STRING 
>> ), 
  
 e 
  
 - 
>  
 CAST 
 ( 
 e 
 . 
 value 
  
 AS 
  
 FLOAT32 
 ) 
 > 
 70 
 ) 
  
 AS 
  
 high_temperature 
  
 FROM 
  
 weather 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 , 
  
 latest_n 
  
 = 
>  
 7 
 )

In temporal filtering order, latest_n comes last, so a query like after => X, before => y, latest_n => 3 returns the latest three values that satisfy the after and before conditions. If your use case requires latest_n to take precedence, you can provide latest_n as the only temporal filter, and then apply the rest of the temporal filters using query operators in your SELECT statement, as shown in the example. For more information, see Temporal filters .

   
 SELECT 
  
 ARRAY_FILTER 
 ( 
  
 CAST 
 ( 
  
 address 
 [ 
 'street' 
 ] 
  
 AS 
  
 ARRAY<STRUCT<timestamp 
  
 TIMESTAMP 
 , 
  
 value 
  
 STRING 
>> ), 
  
 e 
  
 - 
>  
 e 
 . 
 timestamp 
 > 
 TIMESTAMP 
 ( 
 '2021-01-04T23:51:00.000Z' 
 )) 
  
 AS 
  
 street_address 
  
 FROM 
  
 locations 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 , 
  
 latest_n 
  
 = 
>  
 3 
 )

Similar to the previous example, you can apply a different temporal filter to each column family in your query. For example, the following query returns the three most recent versions of the street column and the two least recent versions of the state column.

   
 SELECT 
  
 ARRAY_FILTER 
 ( 
  
 CAST 
 ( 
  
 address 
 [ 
 'street' 
 ] 
  
 AS 
  
 ARRAY<STRUCT<timestamp 
  
 TIMESTAMP 
 , 
  
 value 
  
 STRING 
>> ), 
  
 ( 
 e 
 , 
  
 i 
 ) 
  
 - 
>  
 i 
  
< = 
  
 2 
 ) 
  
 AS 
  
 street_address 
 , 
  
 ARRAY_FILTER 
 ( 
  
 ARRAY_REVERSE 
 ( 
  
 CAST 
 ( 
  
 address 
 [ 
 'state' 
 ] 
  
 AS 
  
 ARRAY<STRUCT<timestamp 
  
 TIMESTAMP 
 , 
  
 value 
  
 STRING 
>> )), 
  
 ( 
 e 
 , 
  
 i 
 ) 
  
 - 
>  
 i 
  
< = 
  
 1 
 ) 
  
 AS 
  
 state 
  
 FROM 
  
 locations 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 )

With the following query, you can retrieve all versions of all columns if key-value pairs in the address column family include both city:Savannah or city:Nashville at any point in time.

   
 SELECT 
  
 * 
  
 FROM 
  
 locations 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 ARRAY_LENGTH 
 ( 
  
 ARRAY_FILTER 
 ( 
  
 CAST 
 ( 
  
 MAP_ENTRIES 
 ( 
 address 
 ) 
  
 AS 
  
 ARRAY 
<  
 STRUCT 
<  
 key 
  
 STRING 
 , 
  
 value 
  
 ARRAY<STRUCT<timestamp 
  
 TIMESTAMP 
 , 
  
 value 
  
 STRING 
>>>> ), 
  
 e 
  
 - 
>  
 e 
 . 
 key 
  
 = 
  
 'city' 
  
 AND 
  
 ARRAY_INCLUDES_ANY 
 ( 
  
 ARRAY_TRANSFORM 
 ( 
 e 
 . 
 value 
 , 
  
 k 
  
 - 
>  
 k 
 . 
 value 
 ), 
  
 [ 
 'Savannah' 
 , 
  
 'Nashville' 
 ] 
 ))) 
 > 
 0

In this particular example, casting is not required, so this can also be written in the following shorter form.

   
 SELECT 
  
 * 
  
 FROM 
  
 locations 
 ( 
 with_history 
  
 = 
>  
 TRUE 
 ) 
  
 WHERE 
  
 ARRAY_LENGTH 
 ( 
  
 ARRAY_FILTER 
 ( 
  
 MAP_ENTRIES 
 ( 
 address 
 ), 
  
 e 
  
 - 
>  
 e 
 . 
 key 
  
 = 
  
 'city' 
  
 AND 
  
 ARRAY_INCLUDES_ANY 
 ( 
  
 ARRAY_TRANSFORM 
 ( 
 e 
 . 
 value 
 , 
  
 k 
  
 - 
>  
 k 
 . 
 value 
 ), 
  
 [ 
 'Savannah' 
 , 
  
 'Nashville' 
 ] 
 ))) 
 > 
 0

GoogleSQL for Bigtable query examples

Common Bigtable SQL query patterns

Advanced Bigtable SQL query patterns

What's next