Managed I/O supports the following capabilities for Apache Iceberg:
- Catalogs:
  - Hadoop
  - Hive
  - REST-based catalogs
  - BigQuery metastore (requires Apache Beam SDK 2.62.0 or later if not using Runner v2)
- Batch write
- Streaming write
- Dynamic destinations
- Dynamic table creation
For BigQuery tables for Apache Iceberg, use the BigQueryIO connector with the BigQuery Storage API. The table must already exist; dynamic table creation is not supported.
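As a minimal sketch of that setup (the project, dataset, and table names are hypothetical), the key BigQueryIO write settings can be collected as plain values; the comment shows where they would plug into `beam.io.WriteToBigQuery`:

```python
# Sketch: writing to an existing BigQuery table for Apache Iceberg with
# BigQueryIO over the Storage Write API. In a real pipeline these values
# would feed beam.io.WriteToBigQuery, for example:
#
#   rows | beam.io.WriteToBigQuery(
#       table=write_options["table"],
#       method=beam.io.WriteToBigQuery.Method.STORAGE_WRITE_API,
#       create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
#   )
write_options = {
    "table": "my-project:my_dataset.iceberg_table",  # hypothetical identifier
    "method": "STORAGE_WRITE_API",
    # The table must already exist, so never ask BigQueryIO to create it.
    "create_disposition": "CREATE_NEVER",
}
```

Setting the create disposition to `CREATE_NEVER` makes the pipeline fail fast if the table is missing, rather than attempting dynamic table creation, which this path does not support.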
Requirements
The following SDKs support managed I/O for Apache Iceberg:
- Apache Beam SDK for Java version 2.58.0 or later
- Apache Beam SDK for Python version 2.61.0 or later
Configuration
Managed I/O for Apache Iceberg supports the following configuration parameters:
ICEBERG Read

| Configuration | Type | Description |
|---|---|---|
| table | str | Identifier of the Iceberg table. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A subset of column names to exclude from reading. If null or empty, all columns will be read. |
| filter | str | SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| keep | list[str] | A subset of column names to read exclusively. If null or empty, all columns will be read. |
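To sketch how these read parameters fit together (the table identifier, catalog name, and warehouse path below are hypothetical, and a Hadoop catalog is assumed), the configuration is an ordinary dictionary passed to `beam.managed.Read`:

```python
# Sketch: configuration dict for a Managed I/O read from an Iceberg table.
# All names and paths are hypothetical. In a pipeline this would be used as:
#
#   p | beam.managed.Read(beam.managed.ICEBERG, config=read_config)
read_config = {
    "table": "db.users",                          # hypothetical table identifier
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "hadoop",
        "warehouse": "gs://my-bucket/warehouse",  # hypothetical GCS path
    },
    # Read only these columns...
    "keep": ["id", "name", "status"],
    # ...and only rows matching this Calcite-syntax predicate,
    # applied at scan time.
    "filter": "id > 5 AND status = 'ACTIVE'",
}
```

Pushing the `keep` and `filter` parameters into the read configuration lets Iceberg prune columns and rows at scan time, instead of filtering after the data has already been read into the pipeline.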
ICEBERG Write

| Configuration | Type | Description |
|---|---|---|
| table | str | Identifier of the Iceberg table. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| triggering_frequency_seconds | int32 | For streaming write pipelines, the frequency at which snapshots are produced, in seconds. |
| drop | list[str] | A list of field names to drop from the input record before writing. |
| keep | list[str] | A list of field names to keep in the input record; all other fields are dropped before writing. |
| only | str | The name of a single record field that should be written. |
| partition_fields | list[str] | Fields used to create a partition spec when tables are created. Supports either an identity transform or a transform applied to the column: foo, truncate(foo, N), bucket(foo, N), hour(foo), day(foo), month(foo), year(foo), void(foo). For more information on partition transforms, please visit https://iceberg.apache.org/spec/#partition-transforms . |
| table_properties | map[str, str] | Iceberg table properties, applied when tables are created. |
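As a sketch of a streaming write that uses the partition transforms above (the table, catalog, REST endpoint, and field names are hypothetical, and the `partition_fields` and `triggering_frequency_seconds` parameter names are assumed from the Beam managed Iceberg write configuration), the config dict passed to `beam.managed.Write` might look like:

```python
# Sketch: configuration dict for a Managed I/O streaming write to an
# Iceberg table. All names and endpoints are hypothetical. In a pipeline:
#
#   rows | beam.managed.Write(beam.managed.ICEBERG, config=write_config)
write_config = {
    "table": "db.events",                        # hypothetical table identifier
    "catalog_name": "my_catalog",
    "catalog_properties": {
        "type": "rest",
        "uri": "https://catalog.example.com",    # hypothetical REST catalog
    },
    # If the sink creates the table, partition it by day of event_ts and
    # by 16 hash buckets of user_id, using Iceberg partition transforms.
    "partition_fields": ["day(event_ts)", "bucket(user_id, 16)"],
    # For a streaming pipeline, produce a snapshot every 60 seconds.
    "triggering_frequency_seconds": 60,
}
```

The triggering frequency trades latency against file count: shorter intervals make rows visible to readers sooner, but commit more, smaller data files per snapshot.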
What's next
For more information and code examples, see the following topics:

