Protocol buffers are a flexible mechanism for serializing structured
data. They are small in size and efficient to send over RPCs.
Protocol buffers are represented in Spanner using the PROTO
data type. A column can contain PROTO
values the same way it can
contain INT32
or STRING
values.
A protocol buffer contains zero or more fields inside it. Each field inside a
protocol buffer has its own type. All SQL data types except STRUCT
can be
contained inside a PROTO
. Repeated fields in a protocol buffer are represented
as ARRAY
s. To learn more about the PROTO
data type, see Protocol buffer
type
. For related
functions, see Protocol buffer functions
.
To query protocol buffers, you need to understand how they are represented, what features they support, and what data they can contain. To learn more about protocol buffers, see the Protocol Buffers Developer Guide .
Construct a protocol buffer
You can construct a protocol buffer with the NEW
operator or
the SELECT AS typename
statement. To learn more about
constructing protocol buffers, see Protocol buffer
type
.
Cast to or from a protocol buffer
You can use the CAST AS PROTO
function to cast PROTO
to or
from BYTES
, STRING
, or PROTO
.
SELECT
CAST
(
'first_name: "Alana", last_name: "Yah", customer_no: 1234'
AS
example
.
CustomerInfo
);
Casting to or from BYTES
produces or parses proto2 wire format bytes
.
If there is a failure during the serialization or deserialization process,
Spanner throws an error. This can happen, for example, if no value is
specified for a required field.
Casting to or from STRING
produces or parses the proto2 text format
.
When casting from STRING
, unknown field names aren't parseable. This means you
need to be cautious, because round-tripping from PROTO
to STRING
back to PROTO
might result in unexpected results.
STRING
literals used where a PROTO
value is expected will be implicitly cast
to PROTO
. If the literal value can't be parsed using the expected PROTO
type, an error is raised. To return NULL
instead of raising an error, use SAFE_CAST
.
Create a protocol buffer
To create a protocol buffer, import the protocol messages from a protocol buffer
with the CREATE PROTO BUNDLE
statement.
Only the protocol message type names in a CREATE PROTO BUNDLE
statement are
stored in your database schema. For example:
First, create a protocol buffer bundle with the examples.shipping.Order
message:
syntax
=
"proto2"
;
package
examples
.
shipping
;
message
Order
{
optional
string
order_number
=
1
;
optional
int64
date
=
2
;
message
Address
{
optional
string
street
=
1
;
optional
string
city
=
2
;
optional
string
state
=
3
;
optional
string
country
=
4
;
}
optional
Address
shipping_address
=
3
;
message
Item
{
optional
string
product_name
=
1
;
optional
int32
quantity
=
2
;
}
repeated
Item
line_item
=
4
;
}
message
OrderHistory
{
optional
string
order_number
=
1
;
optional
int64
date
=
2
;
}
Next, compile the protocol buffer. Use the protoc
compiler
to generate the proto descriptor file along with the corresponding
language-specific classes from your .proto
file. To generate the order_descriptors.pb
proto descriptor:
protoc
--
include_imports
--
descriptor_set_out
=
order_descriptors
.
pb
order_protos
.
proto
Use the CREATE PROTO BUNDLE
DDL statement to load types available from proto
files into your database schema.
For example:
CREATE
PROTO
BUNDLE
(
examples
.
shipping
.
Order
,
examples
.
shipping
.
Order
.
Address
,
examples
.
shipping
.
Order
.
Item
)
Then, upload the proto descriptor using the gcloud spanner databases ddl update
command:
gcloud
spanner
databases
ddl
update
DATABASE_NAME
--
instance
=
INSTANCE_NAME
--
ddl
=
'CREATE PROTO BUNDLE (`examples.shipping.Order`, `examples.shipping.Order.Address`, `examples.shipping.Order.Item`);'
--
proto
-
descriptors
-
file
=
order_descriptors
.
pb
If you include nested message types within a protocol message in your DML
statement, the nested message types must be included in the CREATE PROTO
BUNDLE
DDL statement.
Finally, create an Orders
table with the Order
message:
CREATE
TABLE
Orders
(
Id
INT64
,
OrderInfo
`examples.shipping.Order`
,
)
PRIMARY
KEY
(
Id
);
Add a message to a protocol buffer
You can add messages to your protocol buffer with the ALTER PROTO BUNDLE
statement. For example:
ALTER
PROTO
BUNDLE
INSERT
(
examples
.
shipping
.
OrderHistory
)
Update the proto bundle with the added message using the gcloud spanner databases ddl update
command:
gcloud
spanner
databases
ddl
update
DATABASE_NAME
--
instance
=
INSTANCE_NAME
--
ddl
=
'ALTER PROTO BUNDLE INSERT (`examples.shipping.OrderHistory`);'
--
proto
-
descriptors
-
file
=
order_descriptors
.
pb
Use a protocol buffer message in a column definition
You can include a protocol buffer message when creating new tables by declaring a column's type as a protocol message. For example:
CREATE
TABLE
OrderHistory
(
Id
INT64
,
OrderHistoryInfo
examples
.
shipping
.
OrderHistory
,
)
PRIMARY
KEY
(
Id
);
Update a message for a protocol buffer
You can update and replace existing proto messages with the ALTER PROTO BUNDLE
statement. For example:
ALTER
PROTO
BUNDLE
UPDATE
(
examples
.
shipping
.
Order
,
)
Update the proto bundle using the gcloud spanner databases ddl update
command:
gcloud
spanner
databases
ddl
update
DATABASE_NAME
--
instance
=
INSTANCE_NAME
--
ddl
=
'ALTER PROTO BUNDLE UPDATE(`examples.shipping.Order`);'
--
proto
-
descriptors
-
file
=
order_descriptors
.
pb
Delete a message from a protocol buffer
If you're no longer using a message, you can delete it from the protocol buffer
with the ALTER PROTO BUNDLE
statement. For example:
ALTER
PROTO
BUNDLE
DELETE
(
examples
.
shipping
.
OrderHistory
,
)
Update the proto bundle using the gcloud spanner databases ddl update
command:
gcloud
spanner
databases
ddl
update
DATABASE_NAME
--
instance
=
INSTANCE_NAME
--
ddl
=
'ALTER PROTO BUNDLE DELETE(`examples.shipping.OrderHistory`);'
--
proto
-
descriptors
-
file
=
order_descriptors
.
pb
Query a protocol buffer
Use the dot operator to access the fields contained within a protocol buffer.
You can't use this operator to get values of fields that use the Oneof
, Any
, or Unknown Fields
type.
Example protocol buffer message
The following example queries for a table called Customers
. This table
contains a column Orders
of type PROTO
.
CREATE
TABLE
Customers
(
Id
INT64
,
Orders
examples
.
shipping
.
Order
,
)
PRIMARY
KEY
(
Id
);
The proto stored in Orders
contains fields such as the items ordered and the shipping address. The .proto
file that defines this protocol buffer might look like this:
syntax
=
"proto2"
;
package
examples
.
shipping
;
message
Order
{
optional
string
order_number
=
1
;
optional
int64
date
=
2
;
message
Address
{
optional
string
street
=
1
;
optional
string
city
=
2
;
optional
string
state
=
3
;
optional
string
country
=
4
;
}
optional
Address
shipping_address
=
3
;
message
Item
{
optional
string
product_name
=
1
;
optional
int32
quantity
=
2
;
}
repeated
Item
line_item
=
4
;
}
An instance of this message might be:
{
order_number
:
1234567
date
:
16242
shipping_address
:
{
street
:
"1234 Main St"
city
:
"AnyCity"
state
:
"AnyState"
country
:
"United States"
}
line_item
:
[
{
product_name
:
"Foo"
quantity
:
10
}
,
{
product_name
:
"Bar"
quantity
:
5
}
]
}
Query top-level fields
You can write a query to return an entire protocol buffer message, or to return a top-level or nested field of the message.
Using our example protocol buffer message, the following query returns all
protocol buffer values from the Orders
column:
SELECT
c
.
Orders
FROM
Customers
AS
c
;
This query returns the top-level field order_number
from all protocol buffer
messages in the Orders
column using the dot operator
:
SELECT
c
.
Orders
.
order_number
FROM
Customers
AS
c
;
Query nested paths
You can write a query to return a nested field of a protocol buffer message.
In the previous example, the Order
protocol buffer contains another protocol
buffer message, Address
, in the shipping_address
field. You can create a
query that returns all orders that have a shipping address in the United States:
SELECT
c
.
Orders
.
order_number
,
c
.
Orders
.
shipping_address
FROM
Customers
AS
c
WHERE
c
.
Orders
.
shipping_address
.
country
=
"United States"
;
Return repeated fields
A protocol buffer message can contain repeated fields. When referenced in a
SQL statement, the repeated fields return as ARRAY
values. For example, our
protocol buffer message contains a repeated field, line_item
.
The following query returns a set of ARRAY
s containing the line items,
each holding all the line items for one order:
SELECT
c
.
Orders
.
line_item
FROM
Customers
AS
c
;
For more information about arrays, see Work with arrays .
For more information about default field values, see Default values and NULL
s
.
Return the number of elements in an array
You can return the number of values in a repeated fields in a protocol buffer
using the ARRAY_LENGTH
function.
SELECT
c
.
Orders
.
order_number
,
ARRAY_LENGTH
(
c
.
Orders
.
line_item
)
FROM
Customers
AS
c
;
Type mapping
The following table gives examples of the mapping between various protocol buffer field types and the resulting GoogleSQL types.
Protocol buffer field type | GoogleSQL type |
---|---|
bool
|
BOOL
|
bytes
|
BYTES
|
double
|
FLOAT64
|
int64
|
INT64
|
sint64
|
INT64
|
sfixed64
|
INT64
|
string
|
STRING
|
message
|
PROTO
|
enum
|
ENUM
|
repeated
|
ARRAY
|
The following protocol buffer field types can appear in the protocol buffer definition and are written to the protocol buffer column data, but they can't be used in DML statements or queries with the field access operator. Otherwise, an unknown type error appears. They can only be accessed by reading the entire protocol buffer file:
-
float
-
int32
-
sint32
-
sfixed32
-
uint32
-
fixed32
-
uint64
-
fixed64
Default values and NULL
s
Protocol buffer messages themselves don't have a default value; only the
non-repeated fields contained inside a protocol buffer have defaults. When
the result of a query returns a full protocol buffer value, it's returned
as a blob
(a data structure that's used to store large binary data).
The result preserves all fields in the protocol buffer as they were stored,
including unset fields. This means that you can run a query that returns a
protocol buffer, and then extract fields or check field presence in your client
code with normal protocol buffer default behavior.
NULL
values aren't typically returned when accessing non-repeated leaf fields
contained in a PROTO
from within a SQL statement. However, NULL
values can
result in some cases, for example if a containing parent message is NULL
.
This behavior is consistent with the standards for handling protocol buffers
messages. If the field value isn't explicitly set on a non-repeated leaf field,
the PROTO
default value for the field is returned. A change to the default
value for a PROTO
field affects all future reads of that field using the new PROTO
definition for records where the value is unset.
For example, consider the following protocol buffer:
message
SimpleMessage
{
optional
SubMessage
message_field_x
=
1
;
optional
SubMessage
message_field_y
=
2
;
}
message
SubMessage
{
optional
string
field_a
=
1
;
optional
string
field_b
=
2
;
optional
string
field_c
=
3
;
repeated
string
field_d
=
4
;
repeated
string
field_e
=
5
;
}
Assume the following field values:
-
message_field_x
isn't set. -
message_field_y.field_a
is set to"a"
. -
message_field_y.field_b
isn't set. -
message_field_y.field_c
isn't set. -
message_field_y.field_d
is set to["d"]
. -
message_field_y.field_e
isn't set.
For this example, the following table summarizes the values produced from various accessed fields:
Accessed field | Value produced | Reason |
---|---|---|
message_field_x
|
NULL
|
Message isn't set. |
message_field_x.field_a
through message_field_x.field_e
|
NULL
|
Parent message isn't set. |
message_field_y
|
PROTO<SubMessage>{ field_a: "a" field_d: ["d"]}
|
Parent message and child fields are set. |
message_field_y.field_a
|
"a"
|
Field is set. |
message_field_y.field_b
|
""
(empty string) |
Field isn't set but parent message is set, so default value (empty string) is produced. |
message_field_y.field_c
|
NULL
|
Field isn't set and annotation indicates to not use defaults. |
message_field_y.field_d
|
["d"]
|
Field is set. |
message_field_y.field_e
|
[ ]
(empty array) |
Repeated field isn't set. |
Check if a non-repeated field has a value
You can detect whether optional
fields are set using a virtual field, has_X
,
where X
is the name of the field being checked. The type of the has_X
field
is BOOL
. The has_
field is available for any non-repeated field of a PROTO
value. This field equals true if the value X
is explicitly set in the message.
This field is useful for determining if a protocol buffer field has an explicit value, or if reads will return a default value.
Consider the following protocol buffer example, which has a field country
. You
can construct a query to determine if a Customer
protocol buffer message has a
value for the country field by using the virtual field has_country
:
message
ShippingAddress
{
optional
string
name
=
1
;
optional
string
address
=
2
;
optional
string
country
=
3
;
}
SELECT
c
.
Orders
.
shipping_address
.
has_country
FROM
Customer
AS
c
;
If has_country
is TRUE
, that means the value for the country
field has
been explicitly set. If has_country
is FALSE
, that means the parent message c.Orders.shipping_address
isn't NULL
, but the country
field hasn't been
explicitly set. If has_country
is NULL
, that means the parent message c.Orders.shipping_address
is NULL
, and therefore country
can't be
considered either set or unset.
For more information about default field values, see Default values and NULL
s
.
Check for a repeated value
You can use an EXISTS
subquery to scan inside a repeated field and check if
any value exists with some desired property. For example, the following query
returns the name of every customer who has placed an order for the product
"Foo".
SELECT
C
.
Id
FROM
Customers
AS
C
WHERE
EXISTS
(
SELECT
*
FROM
C
.
Orders
.
line_item
AS
item
WHERE
item
.
product_name
=
'Foo'
);
Nullness and nested fields
A PROTO
value may contain fields which are themselves PROTO
s. When this
happens, it's possible for the nested PROTO
to be NULL
. In such a
case, the fields contained within that nested field are also NULL
.
Consider this example proto:
syntax
=
"proto2"
;
package
examples
.
package
;
message
NestedMessage
{
optional
int64
value
=
1
;
}
message
OuterMessage
{
optional
NestedMessage
nested
=
1
;
}
Running the following query returns a 5
for value
because it is
explicitly defined.
SELECT
proto_field
.
nested
.
value
FROM
(
SELECT
CAST
(
"nested { value: 5 }"
AS
examples
.
package
.
OuterMessage
)
AS
proto_field
);
If value
isn't explicitly defined but nested
is, you get a 0
because
the annotation on the protocol buffer definition says to use default values.
SELECT
proto_field
.
nested
.
value
FROM
(
SELECT
CAST
(
"nested { }"
AS
examples
.
package
.
OuterMessage
)
AS
proto_field
);
However, if nested
isn't explicitly defined, you get a NULL
even
though the annotation says to use default values for the value
field. This is
because the containing message is NULL
. This behavior applies to both
repeated and non-repeated fields within a nested message.
SELECT
proto_field
.
nested
.
value
FROM
(
SELECT
CAST
(
""
AS
examples
.
package
.
OuterMessage
)
AS
proto_field
);
Protocol buffer fields as primary key
Using generated columns , you can define a primary key on a field in the protocol message.
If you define a primary key on a protocol message field, you can't modify or remove that field from the proto schema. For more information, see Update schemas that contain a primary key or index on a proto field .
The following is an example of a Singers
table with a SingerInfo
proto
message column. You can define a primary key on the id
field of the PROTO
column by creating a stored generated column:
CREATE
PROTO
BUNDLE
(
example
.
music
.
SingerInfo
,
example
.
music
.
SingerInfo
.
Residence
);
CREATE
TABLE
Singers
(
SingerId
INT64
NOT
NULL
AS
(
SingerInfo
.
id
)
STORED
,
SingerInfo
example
.
music
.
SingerInfo
,
...
)
PRIMARY
KEY
(
SingerId
);
It has the following definition of the SingerInfo
proto type:
package
example
.
music
;
message
SingerInfo
{
optional
int64
id
=
1
;
optional
string
name
=
2
;
optional
string
nationality
=
3
;
repeated
Residence
residence
=
4
;
message
Residence
{
required
int64
start_year
=
1
;
optional
int64
end_year
=
2
;
optional
string
city
=
3
;
optional
string
country
=
4
;
}
}
The following SQL query reads data using the previous example:
SELECT
s
.
SingerId
,
s
.
SingerInfo
.
name
FROM
Singers
AS
s
WHERE
s
.
SingerId
=
34
;
Index protocol buffer fields
Using generated columns
, you can index the fields stored in
a PROTO
column, as long as the fields being indexed are of a primitive or ENUM
data type. For more information, see Index proto fields
.
Update schemas that contain a primary key or index on a protocol buffer field
If you define a primary key or an index on a protocol message field, you can't modify or remove that field from the proto schema. This is because after you make this definition, type checking is performed every time the schema is updated. The database captures the type information for all fields in the path that are used in the primary key or index definition.
Coercion
Protocol buffers can be coerced into other data types. For more information, see Conversion rules .