Query protobuf data

This document provides examples of common query patterns for reading and querying protocol buffer (protobuf) data stored in Bigtable.

Before you read this page, familiarize yourself with the following:

Example data

The following examples use a Music table that stores information about albums and artists. The data is stored in a column family named album_details , with a column qualifier album . The column qualifier contains protobuf messages.

The protobuf schema is defined in the following proto files:

  • artist.proto :

      syntax 
      
     = 
      
     "proto3" 
     ; 
     package 
      
     package_name 
     ; 
     message 
      
     Artist 
      
     { 
      
     string 
      
     name 
      
     = 
      
     1 
     ; 
     } 
     
    
  • album.proto :

      syntax 
      
     = 
      
     "proto3" 
     ; 
     package 
      
     package_name 
     ; 
     import 
      
     "artist.proto" 
     ; 
     message 
      
     Album 
      
     { 
      
     string 
      
     title 
      
     = 
      
     1 
     ; 
      
     Artist 
      
     artist 
      
     = 
      
     2 
     ; 
      
     int32 
      
     release_year 
      
     = 
      
     3 
     ; 
     } 
     
    

As a result, Bigtable creates a schema bundle for this table that contains the following descriptor set for these protobuf definitions:

  file 
  
 { 
  
 name 
 : 
  
 "artist.proto" 
  
 package 
 : 
  
 "package_name" 
  
 message_type 
  
 { 
  
 name 
 : 
  
 "Artist" 
  
 field 
  
 { 
  
 name 
 : 
  
 "name" 
  
 number 
 : 
  
 1 
  
 label 
 : 
  
 LABEL_OPTIONAL 
  
 type 
 : 
  
 TYPE_STRING 
  
 json_name 
 : 
  
 "name" 
  
 } 
  
 } 
  
 syntax 
 : 
  
 "proto3" 
 } 
 file 
  
 { 
  
 name 
 : 
  
 "album.proto" 
  
 package 
 : 
  
 "package_name" 
  
 dependency 
 : 
  
 "artist.proto" 
  
 message_type 
  
 { 
  
 name 
 : 
  
 "Album" 
  
 field 
  
 { 
  
 name 
 : 
  
 "title" 
  
 number 
 : 
  
 1 
  
 label 
 : 
  
 LABEL_OPTIONAL 
  
 type 
 : 
  
 TYPE_STRING 
  
 json_name 
 : 
  
 "title" 
  
 } 
  
 field 
  
 { 
  
 name 
 : 
  
 "artist" 
  
 number 
 : 
  
 2 
  
 label 
 : 
  
 LABEL_OPTIONAL 
  
 type 
 : 
  
 TYPE_MESSAGE 
  
 type_name 
 : 
  
 ".package_name.Artist" 
  
 json_name 
 : 
  
 "artist" 
  
 } 
  
 field 
  
 { 
  
 name 
 : 
  
 "release_year" 
  
 number 
 : 
  
 3 
  
 label 
 : 
  
 LABEL_OPTIONAL 
  
 type 
 : 
  
 TYPE_INT32 
  
 json_name 
 : 
  
 "releaseYear" 
  
 } 
  
 } 
  
 syntax 
 : 
  
 "proto3" 
 } 
 

Example queries

The following examples show how to query protobuf data using GoogleSQL for Bigtable and BigQuery external tables.

Cast a column to a protobuf message

You can use the CAST operator to interpret a BYTES value as a protobuf message. To do this, you must provide the full name of the protobuf message in the following format: SCHEMA_BUNDLE_ID . FULLY_QUALIFIED_MESSAGE_NAME .

Replace the following:

  • SCHEMA_BUNDLE_ID : The unique ID you assigned to your schema bundle when you created it.
  • FULLY_QUALIFIED_MESSAGE_NAME : The full name of the message, which must include the package name defined in your proto file–for example, package_name.message_name .

The following sample query casts the album column to the Album protobuf message. This message is defined in the package_name package and is part of a schema bundle named bundle_name :

  SELECT 
  
 CAST 
 ( 
 album_details 
 [ 
 'album' 
 ] 
  
 AS 
  
 bundle_name 
 . 
 package_name 
 . 
 Album 
 ). 
 title 
 FROM 
  
 Music 
 ; 
 

Access nested fields

You can access nested fields within a protobuf message using dot notation.

The following query retrieves the name of the artist from the nested Artist message within the Album message:

  SELECT 
  
 CAST 
 ( 
 album_details 
 [ 
 'album' 
 ] 
  
 AS 
  
 bundle_name 
 . 
 package_name 
 . 
 Album 
 ). 
 artist 
 . 
 name 
 FROM 
  
 Music 
 ; 
 

Filter based on protobuf fields

You can use the WHERE clause to filter rows based on the values of fields within a protobuf message.

The following query selects all albums by the artist Dana A.:

  SELECT 
  
 * 
 FROM 
  
 Music 
 WHERE 
  
 CAST 
 ( 
 album_details 
 [ 
 'album' 
 ] 
  
 AS 
  
 bundle_name 
 . 
 package_name 
 . 
 Album 
 ). 
 artist 
 . 
 name 
  
 = 
  
 'Dana A.' 
 ; 
 

Aggregate protobuf fields

You can use aggregate functions like SUM , AVG , MIN , MAX , and COUNT on numeric fields within your protobuf messages.

The following query calculates the average release year of all albums in the table:

  SELECT 
  
 AVG 
 ( 
 CAST 
 ( 
 album_details 
 [ 
 'album' 
 ] 
  
 AS 
  
 bundle_name 
 . 
 package_name 
 . 
 Album 
 ). 
 release_year 
 ) 
 FROM 
  
 Music 
 ; 
 

Order by protobuf fields

You can use the ORDER BY clause to sort the result set based on a field in your protobuf message.

The following query retrieves all albums and sorts them by their release year in descending order:

  SELECT 
  
 * 
 FROM 
  
 Music 
 ORDER 
  
 BY 
  
 CAST 
 ( 
 album_details 
 [ 
 'album' 
 ] 
  
 AS 
  
 bundle_name 
 . 
 package_name 
 . 
 Album 
 ). 
 release_year 
  
 DESC 
 ; 
 

Use with BigQuery external tables

You can query protobuf data stored in Bigtable from BigQuery by creating an external table. When creating the external table, you specify the column type as JSON , its encoding as PROTO_BINARY , and you associate it with a schema bundle. This process converts the protobuf message bytes into the equivalent JSON data, allowing you to query its fields directly.

The following is an example of a table definition file for creating a BigQuery external table over the Music table:

  { 
  
 "sourceFormat" 
 : 
  
 "BIGTABLE" 
 , 
  
 "sourceUris" 
 : 
  
 [ 
  
 "https://googleapis.com/bigtable/projects/PROJECT_ID/instances/INSTANCE_ID/tables/Music" 
  
 ], 
  
 "bigtableOptions" 
 : 
  
 { 
  
 "columnFamilies" 
  
 : 
  
 [ 
  
 { 
  
 "familyId" 
 : 
  
 "album_details" 
 , 
  
 "columns" 
 : 
  
 [ 
  
 { 
  
 "qualifierString" 
 : 
  
 "album" 
 , 
  
 "type" 
 : 
  
 "JSON" 
 , 
  
 "encoding" 
 : 
  
 "PROTO_BINARY" 
 , 
  
 "protoConfig" 
 : 
  
 { 
  
 "schemaBundleId" 
 : 
  
 "bundle_name" 
 , 
  
 "protoMessageName" 
 : 
  
 "package_name.Album" 
  
 } 
  
 } 
  
 ] 
  
 } 
  
 ] 
  
 } 
 } 
 

Once the external table is created, you can query the protobuf data as a JSON column in BigQuery.

The following query retrieves the title of all albums released in 2022 from the BigQuery external table:

  SELECT 
  
 JSON_VALUE 
 ( 
 value 
 , 
  
 '$.title' 
 ) 
  
 AS 
  
 title 
 FROM 
 ` 
 PROJECT_ID 
 . 
 DATASET 
 . 
 TABLE_NAME 
 ` 
  
 AS 
  
 t 
 , 
 UNNEST 
 ( 
 t 
 . 
 album_details 
 . 
 album 
 . 
 cell 
 ) 
 WHERE 
  
 INT64 
 ( 
 JSON_EXTRACT 
 ( 
 value 
 , 
  
 '$.releaseYear' 
 )) 
  
 = 
  
 2022 
 ; 
 

What's next

Create a Mobile Website
View Site in Mobile | Classic
Share by: