Import data from external Iceberg catalogs to Google Cloud Lakehouse using Dataflow

Your use case might require you to import an external Iceberg REST Catalog (IRC) table into an existing Google Cloud Lakehouse table. Dataflow's job builder UI lets you build a pipeline that migrates external open source Iceberg catalog tables into Lakehouse in a low-code or no-code way. This process lets you consolidate data into a unified Lakehouse-managed Iceberg format for cross-engine analytics.

Use the following connection details to import data from external Iceberg catalogs.

Before you begin

To import data, you need the following:

  1. Connection information for the external Iceberg REST Catalog. For example: catalog name, namespace, table name, account URI, and role to access the catalog.
  2. A Lakehouse Iceberg catalog, namespace, and table to import the data into.
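
Before you open the job builder, it can help to gather these connection details in one place. The following Python sketch is illustrative only: every name and value is hypothetical, and you should replace them with the details from your catalog provider. It collects the fields the job builder asks for and fails early if any are missing:

```python
# Hypothetical connection details for an external Iceberg REST Catalog (IRC).
# Replace every value with the details from your catalog provider.
connection = {
    "catalog_name": "external_catalog",   # catalog name
    "namespace": "analytics",             # namespace containing the table
    "table": "events",                    # table to import
    "uri": "http://localhost:8181",       # account / catalog URI
    "role": "iceberg_reader",             # role used to access the catalog
}

# Fail early if any detail the job builder needs is missing.
missing = [key for key, value in connection.items() if not value]
if missing:
    raise ValueError(f"Missing connection details: {missing}")

# Fully qualified identifier to enter in the Iceberg table field.
table_identifier = f"{connection['namespace']}.{connection['table']}"
```

Checking the details up front avoids discovering a missing value only after the Dataflow job has been launched.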

Support and limitations

Importing data from external Iceberg catalogs to Google Cloud Lakehouse using Dataflow has the following support and limitations:

  • This feature supports reading into Lakehouse from external Iceberg providers that support the Iceberg REST Catalog (IRC) specification. Other Iceberg catalog types aren't supported.
  • This feature supports batch and streaming pipelines.

Import an external Iceberg catalog table

To import an external Iceberg catalog table into Google Cloud Lakehouse, complete the following steps:

  1. In the Google Cloud console, go to the Google Cloud Lakehouse Metastore page.

    Go to Google Cloud Lakehouse Metastore

  2. Select the catalog, namespace, and table you want to import data into.

  3. On the Table details page, click Import table.

  4. In the Import configuration dialog, select Import a table from an Apache Iceberg REST Catalog into Lakehouse (Batch).

    The Dataflow Job builder page opens.

  5. In the Sources section:

    1. To expand the Iceberg table source panel, click the expander arrow.

    2. In the Iceberg table field, enter the identifier of the Apache Iceberg table.

    3. In the Catalog name field, enter the name of the catalog.

    4. In the Filter field, enter the Iceberg filter to use. For example, id > 5.

    5. Optional: To specify source table column changes, use the Keep columns or Drop columns sections.

    6. In the Catalog type list of the Catalog properties section, select the type of catalog.

    7. In the Catalog URI field, enter the URI of the catalog. For example, http://localhost:8181.

    8. In the Warehouse name field, enter the catalog name.

      For some external Iceberg REST Catalog providers, the warehouse is abstracted, and the catalog name is provided as the warehouse name.

    9. In the Authentication type list, select the authentication type. For example, OAUTH2.

  6. Optional: In the Transforms section, add any transforms to the source data.

  7. In the Sink section:

    1. Optional: Review the Lakehouse table sink panel. The information in this panel, such as the Lakehouse table, catalog name, and warehouse location, is typically prepopulated.
  8. In the Dataflow options section, click Run job.
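
The job builder configuration above can also be expressed as a Beam YAML pipeline, which is the representation the job builder works with. The following is a minimal sketch, not a definitive template: every table name, catalog name, URI, and property value is illustrative, authentication properties are omitted, and the exact configuration keys accepted may vary with your SDK version.

```yaml
pipeline:
  type: chain
  transforms:
    # Source: the external Iceberg REST Catalog table (step 5).
    - type: ReadFromIceberg
      config:
        table: "analytics.events"           # Iceberg table field
        catalog_name: "external_catalog"    # Catalog name field
        catalog_properties:
          type: "rest"                      # Catalog type
          uri: "http://localhost:8181"      # Catalog URI
          warehouse: "external_catalog"     # Warehouse name (often the catalog name)
    # Optional filter, matching the Filter field (step 5).
    - type: Filter
      config:
        language: python
        keep: "id > 5"
    # Sink: the Lakehouse-managed Iceberg table (step 7).
    - type: WriteToIceberg
      config:
        table: "lakehouse_namespace.events"
        catalog_name: "lakehouse_catalog"
```

Saving the job builder configuration as YAML like this lets you version the import pipeline and rerun it without repeating the console steps.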

What's next
