Dataflow managed I/O for databases

Dataflow supports reading from and writing to databases using Java Database Connectivity (JDBC). Dataflow uses the Apache Beam JdbcIO connector to connect to JDBC data sources and sinks. Managed I/O for Databases supports standard JDBC drivers for various databases, including PostgreSQL, MySQL, and Microsoft SQL Server.

Requirements

The following SDKs support managed I/O for databases:

Apache Beam SDK for Java version 2.69.0 or later
Apache Beam SDK for Python version 2.69.0 or later

Configuration

Managed I/O for Databases supports the following configuration parameters:

`POSTGRES` Read

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC source.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
fetch_size	`int32`	This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location	`str`	Name of the table to read from.
num_partitions	`int32`	The number of partitions
output_parallelization	`boolean`	Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column	`str`	Name of a column of numeric type that will be used for partitioning.
password	`str`	Password for the JDBC source.
read_query	`str`	SQL query used to query the JDBC source.
username	`str`	Username for the JDBC source.

`POSTGRES` Write

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC sink.
autosharding	`boolean`	If true, enables using a dynamically determined number of shards to write.
batch_size	`int64`	n/a
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location	`str`	Name of the table to write to.
password	`str`	Password for the JDBC source.
username	`str`	Username for the JDBC source.
write_statement	`str`	SQL query used to insert records into the JDBC sink.

`MYSQL` Read

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC source.
connection_init_sql	`list[ str ]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit	`boolean`	Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
fetch_size	`int32`	This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location	`str`	Name of the table to read from.
num_partitions	`int32`	The number of partitions
output_parallelization	`boolean`	Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column	`str`	Name of a column of numeric type that will be used for partitioning.
password	`str`	Password for the JDBC source.
read_query	`str`	SQL query used to query the JDBC source.
username	`str`	Username for the JDBC source.

`MYSQL` Write

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC sink.
autosharding	`boolean`	If true, enables using a dynamically determined number of shards to write.
batch_size	`int64`	n/a
connection_init_sql	`list[ str ]`	Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location	`str`	Name of the table to write to.
password	`str`	Password for the JDBC source.
username	`str`	Username for the JDBC source.
write_statement	`str`	SQL query used to insert records into the JDBC sink.

`SQLSERVER` Read

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC source.
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit	`boolean`	Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
fetch_size	`int32`	This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location	`str`	Name of the table to read from.
num_partitions	`int32`	The number of partitions
output_parallelization	`boolean`	Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column	`str`	Name of a column of numeric type that will be used for partitioning.
password	`str`	Password for the JDBC source.
read_query	`str`	SQL query used to query the JDBC source.
username	`str`	Username for the JDBC source.

`SQLSERVER` Write

Configuration	Type	Description
jdbc_url	`str`	Connection URL for the JDBC sink.
autosharding	`boolean`	If true, enables using a dynamically determined number of shards to write.
batch_size	`int64`	n/a
connection_properties	`str`	Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location	`str`	Name of the table to write to.
password	`str`	Password for the JDBC source.
username	`str`	Username for the JDBC source.
write_statement	`str`	SQL query used to insert records into the JDBC sink.

What's next

For more information and code examples, see the following documents:

Dataflow managed I/O for databases Stay organized with collections Save and categorize content based on your preferences.

Requirements

Configuration

POSTGRES Read

POSTGRES Write

MYSQL Read

MYSQL Write

SQLSERVER Read

SQLSERVER Write