Query a Bigtable change log in BigQuery
This page provides guidance and example queries to help you process a Bigtable change log in BigQuery.
This page is intended for users who have completed the following:
- Set up a Bigtable change stream-enabled table.
- Run the Dataflow template that writes a change log to BigQuery. You can follow the quickstart to learn how to set this up.
This guide assumes some knowledge of BigQuery. To learn more, you can follow the quickstart that shows how to load and query data.
Open the change log table
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project.
- Expand your dataset.
- Click the table with the suffix _changelog.
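If you prefer SQL over the console, you can also list the change log tables in a dataset by querying the INFORMATION_SCHEMA.TABLES view. The following is a minimal sketch that assumes your dataset is named your_dataset.
-- List tables in the dataset whose names end with the _changelog suffix.
-- your_dataset is a placeholder; replace it with your dataset name.
SELECT table_name
FROM your_dataset.INFORMATION_SCHEMA.TABLES
WHERE ENDS_WITH(table_name, '_changelog')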
Table format
The output schema contains several columns. This guide focuses on matching rows to their columns and values, and on parsing the values into formats that you can analyze.
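Before you write queries, it can help to list the columns that the Dataflow template created. The following is a minimal sketch that uses the standard INFORMATION_SCHEMA.COLUMNS view and assumes your change log table is named your_table in the dataset your_dataset.
-- Show the name and type of each column in the change log table.
-- your_dataset and your_table are placeholders for your own names.
SELECT column_name, data_type
FROM your_dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'your_table'
ORDER BY ordinal_position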
Basic queries
The examples in this section use a Bigtable table for tracking credit card sales. The table has one column family (cf) and the following columns:
- Row key with the format credit card number#transaction timestamp
- Merchant
- Amount
- Category
- Transaction date
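Because the row key combines the card number and the transaction timestamp with a # separator, you can split it into its parts directly in a query. This is a minimal sketch that assumes the row key is stored as a string in your_dataset.your_table; the alias names are illustrative.
-- Split the composite row key into its two parts.
-- Assumes row_key is a STRING of the form "card_number#transaction_timestamp".
SELECT
  SPLIT(row_key, '#')[OFFSET(0)] AS card_number,
  SPLIT(row_key, '#')[OFFSET(1)] AS transaction_timestamp
FROM your_dataset.your_table
LIMIT 1000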
Query one column
Filter the results to just one column family and one column using a WHERE clause.
SELECT
  row_key,
  column_family,
  column,
  value,
  timestamp
FROM your_dataset.your_table
WHERE mod_type = "SET_CELL"
  AND column_family = "cf"
  AND column = "merchant"
LIMIT 1000
Parse values
All values are stored as strings or byte strings. You can cast a value to its intended type with conversion functions.
SELECT
  row_key,
  column_family,
  column,
  value,
  CAST(value AS NUMERIC) AS amount
FROM your_dataset.your_table
WHERE mod_type = "SET_CELL"
  AND column_family = "cf"
  AND column = "amount"
LIMIT 1000
Perform aggregations
You can perform further operations, such as aggregations, on the numeric values.
SELECT
  SUM(CAST(value AS NUMERIC)) AS total_amount
FROM your_dataset.your_table
WHERE mod_type = "SET_CELL"
  AND column_family = "cf"
  AND column = "amount"
Pivot the data
To perform queries that involve multiple Bigtable columns, you need to pivot the table. Each new BigQuery row includes one data change record returned by the change stream from its corresponding row in your Bigtable table. Depending on your schema, you can use a combination of the row key and timestamp to group the data.
SELECT *
FROM (
  SELECT row_key, timestamp, column, value
  FROM your_dataset.your_table
)
PIVOT (
  MAX(value)
  FOR column IN ("merchant", "amount", "category", "transaction_date")
)
Pivoting with a dynamic column set
If your table has a dynamic set of columns, you can first retrieve the full list of column names and then build the PIVOT query programmatically.
DECLARE cols STRING;

SET cols = (
  SELECT CONCAT('("', STRING_AGG(DISTINCT column, '", "'), '")')
  FROM your_dataset.your_table
);

EXECUTE IMMEDIATE FORMAT("""
  SELECT * FROM (
    SELECT row_key, timestamp, column, value
    FROM your_dataset.your_table
  )
  PIVOT (
    MAX(value)
    FOR column IN %s
  )
""", cols);
JSON data
If you write all values as JSON, you need to parse them and extract values based on their keys. After you derive a value from the JSON object, you can use parsing functions on it. These examples use the credit card sales data introduced earlier, but instead of writing the data to multiple columns, it is written to a single column as a JSON object.
SELECT
  row_key,
  JSON_VALUE(value, "$.category") AS category,
  CAST(JSON_VALUE(value, "$.amount") AS NUMERIC) AS amount
FROM your_dataset.your_table
LIMIT 1000
Aggregation queries with JSON
You can perform aggregation queries with JSON values.
SELECT
  JSON_VALUE(value, "$.category") AS category,
  SUM(CAST(JSON_VALUE(value, "$.amount") AS NUMERIC)) AS total_amount
FROM your_dataset.your_table
GROUP BY category
What's next
- Learn more about Bigtable change streams.
- Refer to the Bigtable change stream to BigQuery template reference page.
- Learn how to set up the Bigtable change stream to BigQuery template.