GoogleSQL for Bigtable query examples
The examples on this page demonstrate SQL query patterns for common and advanced Bigtable queries. You can run GoogleSQL queries in the Bigtable Studio query editor. You can also run queries using the Bigtable client library for Java .
Before you read this page, read the GoogleSQL for Bigtable overview .
The examples on this page use IDs and values similar to those in Data for examples .
Common Bigtable SQL query patterns
The following are examples of common queries for Bigtable data. To see examples of similar queries that call the Bigtable Data API, see Read examples and Use filters . For examples of queries on structured row keys, see Structured row key queries .
Retrieve the latest version of all columns for a given row key.
SELECT
*
FROM
myTable
WHERE
_key
=
'r1'
Retrieve all versions of all columns for a given row key.
SELECT
*
FROM
myTable
(
with_history
=
>
TRUE
)
WHERE
_key
=
'r1'
Retrieve the latest version of a particular column from a particular column family for a given row key.
SELECT
stats_summary
[
'os_build'
]
AS
os
FROM
analytics
WHERE
_key
=
'phone#4c410523#20190501'
Retrieve the row keys and the latest version of multiple columns for a given row key range.
SELECT
_key
,
stats_summary
[
'os_build'
]
AS
os
,
stats_summary
[
'user_agent'
]
AS
agent
FROM
analytics
WHERE
_key
> =
'phone#4c410523#20190501'
AND
_key
<
'phone#4c410523#201906201'
Retrieve all versions of all columns for multiple row key ranges, up to 10 rows.
SELECT
*
FROM
analytics
(
with_history
=
>
TRUE
)
WHERE
(
_key
> =
'phone#4c410523#20190501'
AND
_key
<
'phone#4c410523#201906201'
)
OR
(
_key
> =
'phone#5c10102#20190501'
AND
_key
<
'phone#5c10102#20190601'
)
LIMIT
10
Retrieve all versions of all columns for multiple row keys.
SELECT
*
FROM
analytics
(
with_history
=
>
TRUE
)
WHERE
_key
=
'phone#4c410523#20190501'
OR
_key
=
'phone#4c410523#20190502'
Retrieve all versions of all columns for multiple row keys using a different approach.
SELECT
*
FROM
analytics
(
with_history
=
>
TRUE
)
WHERE
_key
IS
IN
(
'phone#4c410523#20190501'
,
'phone#4c410523#20190502'
)
Retrieve the latest version of all columns within a column family for a row key prefix.
SELECT
stats_summary
FROM
analytics
WHERE
_key
LIKE
'phone#%'
Retrieve the row keys and three latest versions of all columns within a column family for all rows in the table. This query requires a full table scan, so it's not recommended for low-latency, high-throughput access patterns.
SELECT
_key
,
cell_plan
FROM
analytics
(
with_history
=
>
TRUE
,
latest_n
=
>
3
)
Retrieve the latest version of all columns with row keys matching a specified
regular expression. This query requires a full table scan, so it's not
recommended for low-latency, high-throughput access patterns, unless you also
provide a row key prefix or row key range predicate in the WHERE
clause.
SELECT
*
FROM
myTable
(
with_history
=
>
TRUE
)
WHERE
REGEXP_CONTAINS
(
_key
,
'.*#20190501$'
)
Retrieve the latest version of all columns with the matching row key prefix and
counter value more than 123
. You don't need to cast for this comparison,
because Bigtable
aggregates
are numeric.
SELECT
*
FROM
myTable
WHERE
_key
LIKE
'user12%'
AND
counterFamily
[
'counter'
]
>
123
Retrieve the latest version of all columns for a row key prefix if the referrer matches a specific value.
SELECT
*
FROM
analytics
WHERE
_key
LIKE
'com.mysite%'
AND
session
[
'referrer'
]
=
'./home'
Categorize a given row based on the value of a given column. This query is similar to using a composing conditional filter in the Bigtable Data API.
SELECT
*
,
CASE
cell_plan
[
'data_plan'
]
WHEN
'10gb'
THEN
'passed-filter'
ELSE
'filtered-out'
END
AS
label
FROM
analytics
Retrieve the row key and the column qualifiers in a specific column family for a specified row key range. In SQL, column families are represented by the map data type, where each column qualifier and value is mapped as a key-value pair. This SQL query is similar to using a strip value filter in the Bigtable Data API.
SELECT
_key
,
MAP_KEYS
(
cell_plan
)
AS
keys
FROM
analytics
WHERE
_key
> =
'phone#4c410523#20190501'
AND
_key
<
'phone#4c410523#201906201'
The UNPACK
function lets you transform Bigtable data into a
tabular time series format, which is useful when performing time series
analysis. Consider an example where you have a clicks
column in an engagement
column family. The following query uses UNPACK
to see how certain
campaigns are performing by aggregating the clicks over a minute from the last
hour.
SELECT
FORMAT_TIMESTAMP
(
'%M'
,
_timestamp
)
AS
minute
,
COUNT
(
clicks
)
AS
total_clicks
FROM
UNPACK
((
SELECT
engagement
[
'clicks'
]
as
clicks
FROM
metrics
(
with_history
=
>
true
,
after
=
>
TIMESTAMP_SUB
(
CURRENT_TIMESTAMP
(),
INTERVAL
1
HOUR
))
WHERE
_key
=
@campaign_id
))
GROUP
BY
minute
;
Advanced Bigtable SQL query patterns
The following samples demonstrate more advanced patterns.
With the following query, you can retrieve the row key and most recent value of
the JSON attribute abc
in the session
column family. For more information,
see JSON
functions
.
SELECT
_key
,
JSON_VALUE
(
session
[
'payload'
]
,
'$.abc'
)
AS
abc
FROM
analytics
With the following query, you can retrieve the row key and calculate the average session length using the latest value of two Bigtable aggregate cells , which are numeric, for each row in the table.
SELECT
_key
AS
userid
,
session
[
'total_minutes'
]
/
session
[
'count'
]
AS
avg_session_length
FROM
analytics
With the following query, you can retrieve the latest version of all columns for
a given row key prefix if the session
column family contains referrer
, origin
, or server
as a column qualifier. Alternatively, this query can also
be written as a series of individual comparisons such as session['referrer']
IS NOT NULL OR session['origin'] IS NOT NULL
. However, for queries involving
a large number of comparisons, the following approach is recommended.
SELECT
*
FROM
analytics
WHERE
_key
LIKE
'com.abc%'
AND
ARRAY_INCLUDES_ANY
(
MAP_KEYS
(
session
),
[
'referrer'
,
'origin'
,
'server'
]
)
With the following query, you can retrieve the latest version of all columns for
a given row key prefix if the session
column family contains referrer
, origin
, and server
as column qualifiers. Alternatively, this query can be
written as a series of individual comparisons such as session['referrer'] IS
NOT NULL AND session ['origin'] IS NOT NULL
.
SELECT
*
FROM
analytics
WHERE
_key
LIKE
'com.abc%'
AND
ARRAY_INCLUDES_ALL
(
MAP_KEYS
(
session
),
[
'referrer'
,
'origin'
,
'server'
]
)
With the following query, you can retrieve the latest version of all columns for
a given row key prefix if the session
column family contains com.google.search
, com.google.maps
, or com.google.shopping
as values.
SELECT
*
FROM
analytics
WHERE
_key
LIKE
'com.abc%'
AND
ARRAY_INCLUDES_ANY
(
MAP_VALUES
(
session
),
[
'com.google.search'
,
'com.google.maps'
,
'com.google.shopping'
]
)
With the following query, you can retrieve the latest version of all columns if
key-value pairs in the cell_plan
column family include both data_plan:unlimited
and roaming:North America
.
SELECT
*
FROM
analytics
WHERE
ARRAY_INCLUDES_ALL
(
CAST
(
MAP_ENTRIES
(
cell_plan
)
AS
ARRAY<STRUCT<key
STRING
,
value
STRING
>> ),
[
(
'data_plan'
,
'unlimited'
),
(
'roaming'
,
'North America'
)
]
)
With the following query, you can retrieve the row key
and temperature
readings for weather sensors for cases where the temperature exceeded 70 degrees
during the last seven measurements.
SELECT
_key
AS
sensorid
,
ARRAY_FILTER
(
CAST
(
sensor
[
'temperature'
]
AS
ARRAY<STRUCT<timestamp
TIMESTAMP
,
value
STRING
>> ),
e
-
>
CAST
(
e
.
value
AS
FLOAT32
)
>
70
)
AS
high_temperature
FROM
weather
(
with_history
=
>
TRUE
,
latest_n
=
>
7
)
In temporal filtering order, latest_n
comes last, so a query like after => X,
before => y, latest_n => 3
returns the latest three values that satisfy the
after and before conditions. If your use case requires latest_n
to take
precedence, you can provide latest_n
as the only temporal filter,
and then apply the rest of the temporal filters using query operators in your SELECT
statement, as shown in the example. For more information, see Temporal
filters
.
SELECT
ARRAY_FILTER
(
CAST
(
address
[
'street'
]
AS
ARRAY<STRUCT<timestamp
TIMESTAMP
,
value
STRING
>> ),
e
-
>
e
.
timestamp
>
TIMESTAMP
(
'2021-01-04T23:51:00.000Z'
))
AS
street_address
FROM
locations
(
with_history
=
>
TRUE
,
latest_n
=
>
3
)
Similar to the previous example, you can apply a different temporal filter to
each column family in your query. For example, the following query returns the
three most recent versions of the street
column
and the two least recent versions of the state
column.
SELECT
ARRAY_FILTER
(
CAST
(
address
[
'street'
]
AS
ARRAY<STRUCT<timestamp
TIMESTAMP
,
value
STRING
>> ),
(
e
,
i
)
-
>
i
< =
2
)
AS
street_address
,
ARRAY_FILTER
(
ARRAY_REVERSE
(
CAST
(
address
[
'state'
]
AS
ARRAY<STRUCT<timestamp
TIMESTAMP
,
value
STRING
>> )),
(
e
,
i
)
-
>
i
< =
1
)
AS
state
FROM
locations
(
with_history
=
>
TRUE
)
With the following query, you can retrieve all versions of all columns if
key-value pairs in the address column family include both city:Savannah
or city:Nashville
at any point in time.
SELECT
*
FROM
locations
(
with_history
=
>
TRUE
)
WHERE
ARRAY_LENGTH
(
ARRAY_FILTER
(
CAST
(
MAP_ENTRIES
(
address
)
AS
ARRAY
<
STRUCT
<
key
STRING
,
value
ARRAY<STRUCT<timestamp
TIMESTAMP
,
value
STRING
>>>> ),
e
-
>
e
.
key
=
'city'
AND
ARRAY_INCLUDES_ANY
(
ARRAY_TRANSFORM
(
e
.
value
,
k
-
>
k
.
value
),
[
'Savannah'
,
'Nashville'
]
)))
>
0
In this particular example, casting is not required, so this can also be written in the following shorter form.
SELECT
*
FROM
locations
(
with_history
=
>
TRUE
)
WHERE
ARRAY_LENGTH
(
ARRAY_FILTER
(
MAP_ENTRIES
(
address
),
e
-
>
e
.
key
=
'city'
AND
ARRAY_INCLUDES_ANY
(
ARRAY_TRANSFORM
(
e
.
value
,
k
-
>
k
.
value
),
[
'Savannah'
,
'Nashville'
]
)))
>
0