Work with ObjectRef values
This document describes ObjectRef
values and how to create and use them in
BigQuery.
An ObjectRef
value is a STRUCT
type
with a predefined schema that references Cloud Storage objects for multimodal analysis
. It can be
processed by OBJ
functions
, AI functions
, or Python user-defined functions
.
Schema
An ObjectRef
value has the following fields:
| Name | Type | Mode | Description | Example |
|---|---|---|---|---|
uri
|
STRING
|
REQUIRED
|
The Cloud Storage object's URI. | "gs://cloud-samples-data/vision/demo-img.jpg"
|
version
|
STRING
|
NULLABLE
|
The object generation. | "1560286006357632"
|
authorizer
|
STRING
|
NULLABLE
|
A BigQuery connection ID for delegated
access
or NULL
for direct access
.
The ID can have the following formats:"region.connection"
or "project.region.connection"
|
"myproject.us.myconnection"
|
details
|
JSON
|
NULLABLE
|
The object metadata or errors from processing the object. It can include the
fields content_type
, md5_hash
, size
, and updated
for the object
. |
{"gcs_metadata":{"content_type":"image/png","md5_hash":"dfbbb5cf034af026d89f2dc16930be15","size":915052,"updated":1560286006000000}}
|
The content_type
field in the gcs_metadata
field from the details
column is fetched from Cloud Storage. You can set an object's content type
in
Cloud Storage. If you omit it in Cloud Storage, then
BigQuery infers the content type from the suffix of the URI.
Create ObjectRef
values
You can create ObjectRef
values by using object tables
, the OBJ.MAKE_REF
function
,
or Cloud Storage Insights datasets
.
Use object tables
Use an object table if you don't have URIs stored in a table and want to list
all the objects from a Cloud Storage prefix. An object table stores
the reference to an object in each row, and has a ref
column that contains ObjectRef
values. The following query uses the CREATE EXTERNAL TABLE
statement
to create an object table:
CREATE
EXTERNAL
TABLE
mydataset
.
images
WITH
CONNECTION
`us.myconnection`
OPTIONS
(
uris
=[
"gs://mybucket/images/*"
]
,
object_metadata
=
"SIMPLE"
);
SELECT
ref
AS
image_ref
FROM
mydataset
.
images
;
ObjectRef
values from an object table must have an authorizer for delegated access
. The authorizer connection is the same
connection that you use to create the object table.
Use the OBJ.MAKE_REF
function
Use the OBJ.MAKE_REF
function if you already have URIs stored in a table and want to create ObjectRef
values from those URIs. The following queries show how
to create ObjectRef
values in the image_ref
column from the uri
column
that contains Cloud Storage URIs:
-- Specify only the URI
SELECT
*
,
OBJ
.
MAKE_REF
(
uri
)
AS
image_ref
FROM
mydataset
.
images
;
-- Specify the URI and the connection
SELECT
*
,
OBJ
.
MAKE_REF
(
uri
,
"us.myconnection"
)
AS
image_ref
FROM
mydataset
.
images
;
To modify the authorizers of an existing ObjectRef
value, you can
use the OBJ.MAKE_REF
function:
-- Remove the authorizer
SELECT
*
,
OBJ
.
MAKE_REF
(
ref
,
authorizer
=
> NULL
)
AS
image_ref
FROM
mydataset
.
images
;
-- Change the authorizer
SELECT
*
,
OBJ
.
MAKE_REF
(
ref
,
authorizer
=
> "us.myconnection2"
)
AS
image_ref
FROM
mydataset
.
images
;
The OBJ.MAKE_REF
function accepts a nullable authorizer to support direct access
and delegated access
.
Use Cloud Storage Insights datasets
If you have a Storage Insights dataset configured
,
then the dataset already includes a ref
column
that contains ObjectRef
values. Any ObjectRef
values created in Storage
Insights datasets don't have an authorizer. To query these objects, you must
either have direct access
to the object or add an authorizer
to the ObjectRef
to use delegated access
.
Authorizer and permissions
When you pass an ObjectRef
value to ObjectRef functions, AI functions, or
Python UDFs, those functions need to access the object stored in
Cloud Storage. You can authorize this access based
on the value of the authorizer
field in two ways: direct access and delegated
access.
Direct access
With direct access
, the user who runs the query accesses the object directly
by using their own credentials. Direct access is used when the ObjectRef
value has no authorizer.
Direct access has the following restrictions:
- The user must have permission to access the objects.
- A query job using the
AI.GENERATE,AI.IF,AI.SCORE, orAI.CLASSIFYfunctions without a connection requires the user to have additional permissions . The query can only access Cloud Storage buckets and objects from the same project in which the job is executed.
For example, if you call the AI.GENERATE
function on an ObjectRef
value
that doesn't have an authorizer, then the function reads the object as you. If you
don't have permission to read the object, the function writes a "permission denied"
error to the status
column in the result.
The following example shows a query that uses direct access:
-- Requires that the end user can read the object "gs://cloud-samples-data/vision/demo-img.jpg" and use the Vertex AI model.
SELECT
AI
.
GENERATE
(
(
"Describe this image:"
,
OBJ
.
GET_ACCESS_URL
(
OBJ
.
MAKE_REF
(
"gs://cloud-samples-data/vision/demo-img.jpg"
),
'r'
)));
Delegated access
With delegated access
, the user who runs the query delegates object access
to a BigQuery Cloud resource connection
,
which is specified in the authorizer
field of the ObjectRef
value.
Delegated access can enable cross-project data access.
To use delegated access, your data administrator must follow these steps to set up the connection and permissions:
- One-time setup. The data administrator must set up a Cloud resource connection
to manage the Cloud Storage bucket:
- Create a new BigQuery Cloud resource connection or reuse an existing one in the project.
- Look up the service account in the connection's metadata.
- Grant the service account the
storage.objects.getpermission for reads, or thestorage.objects.createpermission for writes, in either the project or the Cloud Storage buckets. You can grant these permissions with the Storage Object Viewer or Storage Object User roles.
- Per-user setup. The data administrator must grant users the
bigquery.objectRefs.readpermission for reads, or thebigquery.objectRefs.writepermission for writes, to the BigQuery connection. You can grant these permissions with the BigQuery ObjectRef Reader or BigQuery ObjectRef Admin roles.
For example, if a user passes ObjectRef
values that have an authorizer to an AI.GENERATE
function, then the function verifies that the user has the bigquery.objectRefs.read
permission, and then reads the objects by using the
connection's service account. If the user or the service account has
insufficient permissions, then the function writes a "permission denied"
error
to the status
column in the result.
The following example shows a query that uses delegated access. It requires the following:
- The user has the
bigquery.objectRefs.readpermission onconnection1. - The service account for
connection1has thestorage.objects.getpermission on the object. - The service account for
connection2has the Vertex AI User role.
SELECT
AI
.
GENERATE
(
(
"Describe this image:"
,
OBJ
.
GET_ACCESS_RUL
(
OBJ
.
MAKE_REF
(
"gs://cloud-samples-data/vision/demo-img.jpg"
,
"us.connection1"
),
'r'
)),
connection_id
=
>
"us.connection2"
);
Best practices
Consider the following best practices when you decide whether to use direct or delegated access:
- Use direct access
for a small team operating in a single project for
both data storage and analysis. The data administrator uses Identity and Access Management to
grant users access to BigQuery data and
Cloud Storage data. Users can create
ObjectRefvalues on demand without an authorizer to analyze objects by using their own credentials. - Use delegated access
for a large team operating across multiple
projects, especially when data storage and analysis are decoupled. The data
administrator can set up connections and create
ObjectRefvalues for analysis ahead of time with a connection as their authorizer. This approach works with object tables or by usingOBJ.MAKE_REFon a list of URIs. Then, the data administrator can share the table storing theObjectRefvalues with analysts. The analysts don't need to access the original bucket to analyze the objects.
Errors
Functions that consume ObjectRef
values report errors in two ways:
- Query failure: the query might fail with an error message and no result.
- Returned error values: the query succeeds, but the function might write errors as a part of the return value. For information about the format of the return value, see the reference page for the function you are using.
When a function returns an ObjectRef
value, the details
field of that
value might contain an errors
field. If it does, the value of that field is
an array of errors. Each error has the following schema:
| Name | Type | Mode | Description | Example |
|---|---|---|---|---|
code
|
INT64
|
REQUIRED
|
Standard HTTP error code. | 400
|
message
|
STRING
|
REQUIRED
|
A descriptive, user-friendly error message. | "Connection credential for myproject.us.nonexistent_connection cannot be used. Either the connection does not exist, or the user does not have sufficient permissions (bigquery.objectRefs.read)"
|
source
|
STRING
|
REQUIRED
|
The name of the function that triggered the error. | "OBJ.MAKE_REF"
|
These are two common types of errors:
- Object error: the object URI or version provided doesn't exist.
- Authorizer error: the connection doesn't exist or the user has no permission to use it for delegated access .
The following query shows how to select ObjectRef
values that contain errors
from an Objectref
column
:
SELECT
ref
FROM
mydataset
.
images
WHERE
ref
.
details
.
errors
IS
NOT
NULL
;
What's next
- Specify
ObjectRefcolumns in table schemas . - Analyze multimodal data .
- Learn more about ObjectRef functions .

