The following SDKs support managed I/O for Apache Iceberg:
- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
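
As a quick way to check the minimums listed above, the following sketch compares a version string against them. The helper names `parse_version` and `supports_managed_iceberg` are hypothetical and not part of any SDK, and the parser assumes a plain `major.minor.patch` string with no pre-release suffix.

```python
# Minimum Apache Beam SDK versions listed above, keyed by SDK language.
MIN_BEAM_VERSION = {"java": (2, 58, 0), "python": (2, 61, 0)}


def parse_version(version: str) -> tuple:
    """Turn a plain 'major.minor.patch' string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def supports_managed_iceberg(sdk: str, version: str) -> bool:
    """Return True if the given SDK version meets the listed minimum."""
    return parse_version(version) >= MIN_BEAM_VERSION[sdk]
```

For the Python SDK, the installed version string is available as `apache_beam.__version__`, which could be passed as the second argument.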
Configuration
Managed I/O for Apache Iceberg supports the following configuration
parameters:
`ICEBERG` Read

| Configuration | Type | Description |
|---|---|---|
| table | str | Identifier of the Iceberg table. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str,str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str,str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A subset of column names to exclude from reading. If null or empty, all columns will be read. |
| filter | str | SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | list[str] | A subset of column names to read exclusively. If null or empty, all columns will be read. |
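
To make the read parameters concrete, here is a minimal sketch that builds a read configuration as a plain dictionary and checks it against the parameter names and types listed for `ICEBERG` Read. The `validate_read_config` helper is hypothetical, and every value in the example (table, catalog name, warehouse path) is made up for illustration. In the Python SDK, a dictionary like this is what would be passed as the `config` argument of `managed.Read(managed.ICEBERG, config=...)` from `apache_beam.transforms.managed`.

```python
# Parameter names and Python types for the ICEBERG Read parameters above.
READ_PARAMS = {
    "table": str,
    "catalog_name": str,
    "catalog_properties": dict,
    "config_properties": dict,
    "drop": list,
    "filter": str,
    "keep": list,
}


def validate_read_config(config: dict) -> dict:
    """Reject unknown keys and values whose type does not match the table."""
    for key, value in config.items():
        if key not in READ_PARAMS:
            raise ValueError(f"unknown read parameter: {key}")
        if not isinstance(value, READ_PARAMS[key]):
            raise TypeError(f"{key} must be of type {READ_PARAMS[key].__name__}")
    return config


# Hypothetical example values; replace them with your own catalog settings.
read_config = validate_read_config({
    "table": "db.orders",
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",
        "warehouse": "gs://example-bucket/warehouse",
    },
    "keep": ["id", "status"],                  # read only these columns
    "filter": "id > 5 AND status = 'ACTIVE'",  # predicate applied at scan time
})
```

The check is deliberately shallow: it does not validate the filter expression or the element types inside lists and maps.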
`ICEBERG` Write

| Configuration | Type | Description |
|---|---|---|
| table | str | A fully-qualified table identifier. You may also provide a template to write to multiple dynamic destinations, for example: `dataset.my_{col1}_{col2.nested}_table`. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str,str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str,str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A list of field names to drop from the input record before writing. Mutually exclusive with 'keep' and 'only'. |
| keep | list[str] | A list of field names to keep in the input record. All other fields are dropped before writing. Mutually exclusive with 'drop' and 'only'. |
| only | str | The name of a single record field that should be written. Mutually exclusive with 'keep' and 'drop'. |
| partition_fields | list[str] | Fields used to create a partition spec that is applied when tables are created. For a field 'foo', the available partition transforms are: |
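
Two rules in the write parameters lend themselves to a short illustration: `drop`, `keep`, and `only` are mutually exclusive, and `table` may be a template with `{field}` placeholders. The sketch below is an assumption-laden illustration, not the connector's implementation. Both helper names are hypothetical, and the placeholder resolution (dotted names walk into nested records) is a guess at the intended behavior based on the `dataset.my_{col1}_{col2.nested}_table` example.

```python
import re


def validate_write_config(config: dict) -> dict:
    """Enforce that at most one of 'drop', 'keep', and 'only' is set."""
    used = [name for name in ("drop", "keep", "only") if config.get(name)]
    if len(used) > 1:
        raise ValueError(f"mutually exclusive parameters set together: {used}")
    return config


def resolve_table_template(template: str, record: dict) -> str:
    """Fill '{field}' and '{outer.inner}' placeholders from a record."""
    def lookup(match):
        value = record
        # Walk nested dictionaries for dotted names such as 'col2.nested'.
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return re.sub(r"\{([^{}]+)\}", lookup, template)


# Hypothetical record used only to illustrate template resolution.
record = {"col1": "a", "col2": {"nested": "b"}}
destination = resolve_table_template(
    "dataset.my_{col1}_{col2.nested}_table", record
)
```

With this record, `destination` resolves to `dataset.my_a_b_table`, so records with different field values would be routed to different tables.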
What's next
For more information and code examples, see the following topics:
- [Read from Apache Iceberg](/dataflow/docs/guides/read-from-iceberg)
- [Write to Apache Iceberg](/dataflow/docs/guides/write-to-iceberg)

Last updated 2025-09-04 UTC.