Create and manage resources in the Lakehouse runtime catalog

The Lakehouse runtime catalog is a single, shared catalog that enables data sharing across data processing engines, eliminating the need to maintain separate catalogs for your open source workloads.

This document explains how to create, view, modify, and delete resources in the Lakehouse runtime catalog.

Before you begin

  1. Verify that billing is enabled for your Google Cloud project.

  2. Enable the BigQuery, BigQuery Storage, and Managed Service for Apache Spark APIs.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the APIs

Required roles

To get the permissions that you need to manage Apache Iceberg resources in the Lakehouse runtime catalog, ask your administrator to grant you the following IAM roles on your project:

For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create catalog resources

The following sections describe how to create resources in the Lakehouse runtime catalog.

Create namespaces

Select one of the following options:

API

Use the datasets.insert method, and specify the ExternalCatalogDatasetOptions field in the dataset resource that you pass in.

{
  "datasetReference": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID"
  },
  "externalCatalogDatasetOptions": {
    "defaultStorageLocationUri": "URI",
    "parameters": {
      ...
    }
  },
  "location": "LOCATION"
}

Replace the following:

  • PROJECT_ID : the ID of the project that contains your target dataset.
  • DATASET_ID : the ID of your target dataset.
  • URI : the Cloud Storage URI for all tables in the dataset.
  • LOCATION : the BigQuery location that you want to create the dataset in.
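The API call above is a plain HTTP POST to the BigQuery v2 REST endpoint. The following Python sketch builds the URL and request body; the helper name is illustrative rather than an official client API, and actually sending the request would also require an OAuth 2.0 bearer token (for example, from the google-auth library).

```python
import json

BQ_V2 = "https://bigquery.googleapis.com/bigquery/v2"

def build_create_namespace_request(project_id, dataset_id, storage_uri, location):
    """Build the URL and JSON body for a datasets.insert call that creates
    a namespace (a BigQuery dataset with external catalog options)."""
    url = f"{BQ_V2}/projects/{project_id}/datasets"
    body = {
        "datasetReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
        },
        "externalCatalogDatasetOptions": {
            "defaultStorageLocationUri": storage_uri,
            "parameters": {},  # engine-specific key-value options go here
        },
        "location": location,
    }
    return url, body

url, body = build_create_namespace_request(
    "my-project", "my_namespace", "gs://my-bucket/warehouse", "US")
print("POST", url)
print(json.dumps(body, indent=2))
```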

Apache Spark SQL

CREATE NAMESPACE SPARK_CATALOG.NAMESPACE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your new namespace.

Terraform

provider "google" {
  project = "PROJECT_ID"
}

resource "google_bigquery_dataset" "default" {
  dataset_id = "DATASET_ID"
  location   = "LOCATION"

  external_catalog_dataset_options {
    default_storage_location_uri = "URI"
    parameters = {
      ...
    }
  }
}

Replace the following:

  • PROJECT_ID : the ID of the project that contains your target dataset.
  • DATASET_ID : the ID of your target dataset.
  • LOCATION : the BigQuery location that you want to create the dataset in.
  • URI : the Cloud Storage URI for all tables in the dataset.

Create Apache Iceberg tables

Select one of the following options:

API

Use the tables.insert method, and specify the ExternalCatalogTableOptions field in the table resource that you pass in.

{
  "tableReference": {
    "projectId": "PROJECT_ID",
    "datasetId": "DATASET_ID",
    "tableId": "TABLE_ID"
  },
  "externalCatalogTableOptions": {
    "parameters": {
      "table_type": "iceberg",
      "metadata_location": "METADATA_URI"
    },
    "connection_id": "CONNECTION_ID"
  }
}

Replace the following:

  • PROJECT_ID : the ID of the project that contains your target table.
  • DATASET_ID : the ID of the dataset that contains your target table.
  • TABLE_ID : the ID of your target table.
  • METADATA_URI : the Cloud Storage URI for the latest Apache Iceberg metadata file. For example, gs://mybucket/mytable/metadata/1234.metadata.json.
  • CONNECTION_ID : the ID of your connection to Cloud Storage.
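As with namespaces, this call is an HTTP POST to the BigQuery v2 REST endpoint. The sketch below (the helper name is hypothetical; the field names follow the JSON shown above) builds the request that registers an existing Iceberg table by its latest metadata file; sending it would additionally need an authorized HTTP client.

```python
BQ_V2 = "https://bigquery.googleapis.com/bigquery/v2"

def build_create_iceberg_table_request(project_id, dataset_id, table_id,
                                       metadata_uri, connection_id):
    """Build the URL and JSON body for a tables.insert call that registers
    an Apache Iceberg table through its metadata file in Cloud Storage."""
    url = f"{BQ_V2}/projects/{project_id}/datasets/{dataset_id}/tables"
    body = {
        "tableReference": {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        },
        "externalCatalogTableOptions": {
            "parameters": {
                "table_type": "iceberg",
                "metadata_location": metadata_uri,
            },
            "connection_id": connection_id,
        },
    }
    return url, body

url, body = build_create_iceberg_table_request(
    "my-project", "my_namespace", "my_table",
    "gs://mybucket/mytable/metadata/1234.metadata.json", "my-connection")
print("POST", url)
```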

Apache Spark SQL

CREATE TABLE SPARK_CATALOG.NAMESPACE.TABLE (id bigint, data string) USING iceberg;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.
  • TABLE : the name of your new table.

Terraform

resource "google_bigquery_table" "default" {
  deletion_protection = false
  dataset_id          = google_bigquery_dataset.default.dataset_id
  table_id            = "TABLE"

  external_catalog_table_options {
    storage_descriptor {
      location_uri  = "STORAGE_URI"
      input_format  = "org.apache.hadoop.mapred.FileInputFormat"
      output_format = "org.apache.hadoop.mapred.FileOutputFormat"
    }
    parameters = {
      "table_type"                      = "iceberg"
      "metadata_location"               = "METADATA_URI"
      "write.parquet.compression-codec" = "zstd"
      "EXTERNAL"                        = "TRUE"
    }
  }
}

Replace the following:

  • TABLE : the name of the target table.
  • STORAGE_URI : the Cloud Storage URI where the table data is stored, starting with gs:// .
  • METADATA_URI : the Cloud Storage URI for the latest Apache Iceberg metadata file. For example, gs://mybucket/mytable/metadata/1234.metadata.json.

View catalog resources

The following sections describe how to view resources in the Lakehouse runtime catalog.

View namespaces

Select one of the following options:

API

Use the datasets.list method to view all namespaces, or use the datasets.get method to view information about a defined namespace.
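Both methods are plain HTTP GET requests. The small sketch below (helper names are illustrative) shows the URLs these methods resolve to; an actual request also needs an Authorization: Bearer header with an OAuth 2.0 token.

```python
BQ_V2 = "https://bigquery.googleapis.com/bigquery/v2"

def datasets_list_url(project_id):
    """URL for datasets.list: GET returns all datasets (namespaces) in the project."""
    return f"{BQ_V2}/projects/{project_id}/datasets"

def datasets_get_url(project_id, dataset_id):
    """URL for datasets.get: GET returns one dataset's metadata,
    including its externalCatalogDatasetOptions."""
    return f"{datasets_list_url(project_id)}/{dataset_id}"

print(datasets_list_url("my-project"))
print(datasets_get_url("my-project", "my_namespace"))
```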

Apache Spark SQL

To view all namespaces in a catalog, use the following statement:

SHOW { DATABASES | NAMESPACES } IN SPARK_CATALOG;

Replace SPARK_CATALOG with the name of your Apache Spark catalog.

To view information about a defined namespace, use the following statement:

DESCRIBE { DATABASE | NAMESPACE } [EXTENDED] SPARK_CATALOG.NAMESPACE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.

View tables

Select one of the following options:

API

Use the tables.list method to view all tables in a namespace, or use the tables.get method to view information about a defined table.

Apache Spark SQL

To view all tables in a namespace, use the following statement:

SHOW TABLES IN SPARK_CATALOG.NAMESPACE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.

To view information about a defined table, use the following statement:

DESCRIBE TABLE [EXTENDED] SPARK_CATALOG.NAMESPACE.TABLE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.
  • TABLE : the name of your table.

Modify catalog resources

The following sections describe how to modify resources in the Lakehouse runtime catalog.

Update namespaces

Select one of the following options:

API

Use the datasets.patch method, and update the ExternalCatalogDatasetOptions field in the dataset resource. The datasets.update method is not recommended because it replaces the entire dataset resource.

Apache Spark SQL

Use the ALTER DATABASE statement.

Update Apache Iceberg tables

Select one of the following options:

API

Use the tables.patch method, and update the ExternalCatalogTableOptions field in the table resource. The tables.update method is not recommended because it replaces the entire table resource.

To update the schema or metadata file, use the tables.patch method and set the autodetect_schema query parameter to true:

PATCH https://bigquery.googleapis.com/bigquery/v2/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID?autodetect_schema=true

Replace the following:

  • PROJECT_ID : the ID of the project that contains the table that you want to update.
  • DATASET_ID : the ID of the dataset that contains the table that you want to update.
  • TABLE_ID : the ID of the table that you want to update.

In the body of the request, specify the updated value for each field. For example, to update the Apache Iceberg table's metadata location, specify the updated value for the metadata_location field:

{
  "externalCatalogTableOptions": {
    "parameters": {"metadata_location": "METADATA_URI"}
  },
  "schema": null
}

Replace METADATA_URI with the Cloud Storage URI for the latest Apache Iceberg metadata file. For example, gs://mybucket/mytable/metadata/1234.metadata.json.
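Putting the URL and body together, the following Python sketch (the helper name is hypothetical) builds the complete PATCH request. Note that Python's None serializes to the JSON null that clears the cached schema, as shown above.

```python
import json
from urllib.parse import urlencode

BQ_V2 = "https://bigquery.googleapis.com/bigquery/v2"

def build_update_metadata_request(project_id, dataset_id, table_id, metadata_uri):
    """Build the tables.patch URL (with autodetect_schema=true) and body that
    point the table at a new Iceberg metadata file and re-detect the schema."""
    base = f"{BQ_V2}/projects/{project_id}/datasets/{dataset_id}/tables/{table_id}"
    url = base + "?" + urlencode({"autodetect_schema": "true"})
    body = {
        "externalCatalogTableOptions": {
            "parameters": {"metadata_location": metadata_uri},
        },
        "schema": None,  # serialized by json.dumps as "schema": null
    }
    return url, body

url, body = build_update_metadata_request(
    "my-project", "my_namespace", "my_table",
    "gs://mybucket/mytable/metadata/1234.metadata.json")
print("PATCH", url)
print(json.dumps(body, indent=2))
```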

Apache Spark SQL

Use the ALTER TABLE statement.

Delete catalog resources

The following sections describe how to delete resources in the Lakehouse runtime catalog.

Delete namespaces

Select one of the following options:

API

Use the datasets.delete method. Set the deleteContents parameter to true to delete the tables in your namespace.
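The method maps to an HTTP DELETE request; deleteContents is passed as a query parameter. A minimal sketch (the helper name is illustrative):

```python
BQ_V2 = "https://bigquery.googleapis.com/bigquery/v2"

def build_delete_namespace_url(project_id, dataset_id, delete_contents=False):
    """URL for datasets.delete; with deleteContents=true the request also
    drops every table inside the namespace."""
    url = f"{BQ_V2}/projects/{project_id}/datasets/{dataset_id}"
    if delete_contents:
        url += "?deleteContents=true"
    return url

print("DELETE", build_delete_namespace_url("my-project", "my_namespace", True))
```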

Apache Spark SQL

DROP NAMESPACE SPARK_CATALOG.NAMESPACE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.

Delete tables

Select one of the following options:

API

Use the tables.delete method and specify the name of the table. This method doesn't delete the associated files in Cloud Storage.

Apache Spark SQL

To drop only the table, use the following statement:

DROP TABLE SPARK_CATALOG.NAMESPACE.TABLE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.
  • TABLE : the name of the table to drop.

To drop the table and delete the associated files in Cloud Storage, use the following statement:

DROP TABLE SPARK_CATALOG.NAMESPACE.TABLE PURGE;

Replace the following:

  • SPARK_CATALOG : the name of your Apache Spark catalog.
  • NAMESPACE : the name of your namespace.
  • TABLE : the name of the table to delete.

What's next
