Google Cloud Lakehouse metastore is a fully managed, serverless service that provides a single source of truth for your data lakehouse. It enables multiple engines, including Apache Spark, Apache Flink, and BigQuery, to share tables and metadata without copying files.
Google Cloud Lakehouse metastore supports storage access delegation (credential vending), which improves security by removing the need for direct Cloud Storage bucket access. It also integrates with Knowledge Catalog for unified governance, lineage, and data quality.
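To show what credential vending looks like from the client side, here is a minimal sketch. It is based on the Apache Iceberg REST catalog specification, not on Lakehouse metastore internals. It assumes that the client sent the `X-Iceberg-Access-Delegation: vended-credentials` request header, that the `loadTable` response carries the spec's `storage-credentials` list, and that GCS tokens use Iceberg's `gcs.oauth2.token` property. The helper name and the bucket paths in the test are hypothetical.

```python
from typing import Any, Mapping, Optional

# Request header defined by the Apache Iceberg REST catalog spec to ask the
# catalog for vended (short-lived, scoped) storage credentials.
ACCESS_DELEGATION_HEADER = {"X-Iceberg-Access-Delegation": "vended-credentials"}

# Iceberg's property name for a GCS OAuth2 access token.
GCS_TOKEN_KEY = "gcs.oauth2.token"


def select_vended_credentials(
    load_table_result: Mapping[str, Any], location: str
) -> Optional[dict]:
    """Hypothetical helper: pick the credential config that covers `location`.

    Checks the `storage-credentials` list first and keeps the entry with the
    longest prefix that matches `location`. Only if none matches does it fall
    back to the table-level `config` map, and only when that map carries a
    GCS token. Returns None when no vended credential applies.
    """
    best_config: Optional[dict] = None
    best_len = -1
    for cred in load_table_result.get("storage-credentials", []):
        prefix = cred["prefix"]
        if location.startswith(prefix) and len(prefix) > best_len:
            best_len = len(prefix)
            best_config = dict(cred["config"])
    if best_config is not None:
        return best_config

    table_config = load_table_result.get("config", {})
    if GCS_TOKEN_KEY in table_config:
        return dict(table_config)
    return None
```

In a real deployment, the engine's Iceberg client performs this step itself. The sketch only makes the mechanism visible: the engine receives a token scoped to the table's location instead of holding standing permissions on the bucket.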
Key capabilities
As a component of Google Cloud Lakehouse, Google Cloud Lakehouse metastore provides several advantages for data management and analysis, including a serverless architecture, engine interoperability with open APIs, a unified user experience, and high-performance analytics, streaming, and AI when used with BigQuery. For more information on these benefits, see What is Lakehouse?
Supported engines
Google Cloud Lakehouse metastore is compatible with several query engines, including Apache Spark, Apache Flink, and Trino. The following table links to the documentation for each engine:
| Engine | Documentation |
|---|---|
| Apache Spark | Quickstart: Use with Spark |
| Apache Flink | Use with Apache Flink |
| Trino | Use with Trino |
Configuration options
Google Cloud Lakehouse metastore can be configured in one of two ways: with the Iceberg REST catalog or with the custom Iceberg catalog for BigQuery. The best option depends on your use case, as shown in the following table:
| Use case | Recommendation |
|---|---|
| New Google Cloud Lakehouse metastore users that want their open source engine to access data in Cloud Storage and need interoperability with other engines, including BigQuery and AlloyDB for PostgreSQL. | Use the Iceberg REST catalog. |
| Existing Google Cloud Lakehouse metastore users that have current tables with the custom Iceberg catalog for BigQuery. | Continue using the custom Iceberg catalog for BigQuery, but use the Iceberg REST catalog for new workflows. Tables created with the custom Iceberg catalog for BigQuery are visible with the Iceberg REST catalog through BigQuery catalog federation. |
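With the Iceberg REST catalog option, an open source engine such as Spark connects to the catalog through Apache Iceberg's standard Spark catalog properties. The following sketch builds those properties as a Python dict. The property keys (`type=rest`, `uri`, `warehouse`, and the `header.` prefix for extra HTTP headers) come from Apache Iceberg's Spark and REST client configuration. The function name, the catalog name, the endpoint URI, the warehouse path, and the optional `x-goog-user-project` quota-project header are assumptions. Authentication settings are deliberately omitted; take the actual endpoint and authentication properties from the Iceberg REST catalog documentation.

```python
from typing import Dict, Optional


def iceberg_rest_spark_conf(
    catalog_name: str,
    rest_uri: str,
    warehouse: str,
    quota_project: Optional[str] = None,
    vend_credentials: bool = True,
) -> Dict[str, str]:
    """Hypothetical helper: Spark properties for an Iceberg REST catalog.

    Authentication properties are intentionally left out; add the ones your
    catalog documentation specifies.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    conf = {
        # Enables Iceberg-specific Spark SQL syntax and procedures.
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        # Registers `catalog_name` as an Iceberg catalog backed by a REST service.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": rest_uri,
        f"{prefix}.warehouse": warehouse,
    }
    # Properties under `header.` are sent as HTTP headers on catalog requests.
    if quota_project is not None:
        # Assumption: standard Google API header naming the billing/quota project.
        conf[f"{prefix}.header.x-goog-user-project"] = quota_project
    if vend_credentials:
        # Asks the catalog to vend scoped storage credentials (credential vending).
        conf[f"{prefix}.header.X-Iceberg-Access-Delegation"] = "vended-credentials"
    return conf
```

One way to apply the result is to call `SparkSession.builder.config(key, value)` for each entry before `getOrCreate()`, with the Iceberg Spark runtime and Iceberg GCP bundle JARs on the classpath. Tables are then addressed as `catalog_name.namespace.table` in Spark SQL.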
Differences with Google Cloud Lakehouse metastore (classic)
Google Cloud Lakehouse metastore is the recommended metastore on Google Cloud, while Google Cloud Lakehouse metastore (classic) is considered a legacy feature.
The core differences between Google Cloud Lakehouse metastore and Google Cloud Lakehouse metastore (classic) include the following:
- Google Cloud Lakehouse metastore supports a direct integration with open source engines like Spark, which helps reduce redundancy when you store metadata and run jobs. Tables in Google Cloud Lakehouse metastore are directly accessible from multiple open source engines and BigQuery.
- Google Cloud Lakehouse metastore supports the Iceberg REST catalog, while Google Cloud Lakehouse metastore (classic) does not.
Google Cloud Lakehouse metastore limitations
The following limitations apply to tables in Google Cloud Lakehouse metastore:
Table management
- You can't create or modify Google Cloud Lakehouse Iceberg tables with BigQuery data definition language (DDL) or data manipulation language (DML) statements. You can modify Google Cloud Lakehouse Iceberg tables using the BigQuery API (with the bq command-line tool or client libraries), but doing so risks making changes that are incompatible with the external engine.
- Google Cloud Lakehouse metastore tables don't support renaming operations or the `ALTER TABLE ... RENAME TO` Spark SQL statement.
- Google Cloud Lakehouse metastore tables don't support clustering.
- Google Cloud Lakehouse metastore tables don't support flexible column names.
- Google Cloud Lakehouse metastore doesn't support Iceberg views.
Querying
- Queries on Google Cloud Lakehouse metastore tables from the BigQuery engine might be slower than queries on standard BigQuery tables. In general, query speed should be roughly equivalent to reading the data directly from Cloud Storage.
- A BigQuery dry run of a query that uses a Google Cloud Lakehouse metastore table might report a lower bound of 0 bytes of data, even if rows are returned. This result occurs because the amount of data that is processed from the table can't be determined until the full query is run. Running the query incurs a cost for processing this data.
- You can't reference a Google Cloud Lakehouse metastore table in a wildcard table query.
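A dry run can report only a lower bound for Lakehouse metastore tables, so a cost check that treats the dry-run figure as an estimate can be misleading. The following hypothetical helper makes that distinction explicit by labeling the byte count as exact only when the query touches no Lakehouse metastore tables. The `total_bytes_processed` input corresponds to the value that a dry-run job reports (for example, `QueryJob.total_bytes_processed` in the `google-cloud-bigquery` Python client). Determining whether the query references Lakehouse metastore tables is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DryRunBytes:
    lower_bound: int  # bytes the dry run reported (0 if it reported nothing)
    exact: bool       # False when the real amount processed may be higher


def interpret_dry_run(
    total_bytes_processed: Optional[int], references_lakehouse_tables: bool
) -> DryRunBytes:
    """Hypothetical helper: label a dry-run byte count as exact or a lower bound."""
    reported = total_bytes_processed or 0
    # Data read from Lakehouse metastore tables isn't known until the query runs,
    # so the reported value (including 0) is only a floor on bytes processed.
    exact = total_bytes_processed is not None and not references_lakehouse_tables
    return DryRunBytes(lower_bound=reported, exact=exact)
```

A budget guard built on this might, for example, require manual approval whenever `exact` is `False`, instead of assuming that a 0-byte dry run means a free query.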
API and metadata
- You can't use the `tabledata.list` method to retrieve data from Google Cloud Lakehouse metastore tables. Instead, you can save query results to a BigQuery table, and then use the `tabledata.list` method on that table.
- The display of table storage statistics for Google Cloud Lakehouse metastore tables isn't supported.
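The `tabledata.list` workaround can be sketched in Python. This hypothetical helper expects objects shaped like the `google-cloud-bigquery` library's `Client` and `QueryJobConfig`. It runs the query with a destination table set, waits for the query to finish, and then reads the destination with `Client.list_rows`, which uses the `tabledata.list` method. Passing the client in keeps the sketch free of credentials; the table names in the usage note are placeholders.

```python
from typing import Any, List


def read_via_destination_table(client: Any, sql: str, job_config: Any) -> List[Any]:
    """Hypothetical helper: materialize query results, then list them.

    `job_config.destination` must name a standard BigQuery table; the query
    itself may read Google Cloud Lakehouse metastore tables.
    """
    if job_config.destination is None:
        raise ValueError("job_config.destination must be set to a BigQuery table")
    # Run the query and block until the results are written to the destination.
    client.query(sql, job_config=job_config).result()
    # list_rows reads the destination table through tabledata.list, which is
    # supported there because it's a standard BigQuery table.
    return list(client.list_rows(job_config.destination))
```

With the real library, `client` would be `bigquery.Client()` and `job_config` would be `bigquery.QueryJobConfig(destination="my-project.my_dataset.results", write_disposition="WRITE_TRUNCATE")`, where the project, dataset, and table names are placeholders. For large results, iterating over the `list_rows` result lazily instead of calling `list()` avoids loading every row into memory.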
Quotas and limits
- Google Cloud Lakehouse metastore tables in BigQuery are subject to the same quotas and limits as standard tables.
What's next
- Explore the Iceberg REST catalog.
- Explore the custom Iceberg catalog for BigQuery.

