Dataflow managed I/O for databases

Dataflow supports reading from and writing to databases using Java Database Connectivity (JDBC). Dataflow uses the Apache Beam JdbcIO connector to connect to JDBC data sources and sinks. Managed I/O for Databases supports standard JDBC drivers for various databases, including PostgreSQL, MySQL, and Microsoft SQL Server.

Requirements

The following SDKs support managed I/O for databases:

  • Apache Beam SDK for Java version 2.69.0 or later
  • Apache Beam SDK for Python version 2.69.0 or later

Configuration

Managed I/O for Databases supports the following configuration parameters:

POSTGRES Read

Configuration Type Description
jdbc_url
str Connection URL for the JDBC source.
connection_properties
str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
fetch_size
int32 This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location
str Name of the table to read from.
num_partitions
int32 The number of partitions
output_parallelization
boolean Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column
str Name of a column of numeric type that will be used for partitioning.
password
str Password for the JDBC source.
read_query
str SQL query used to query the JDBC source.
username
str Username for the JDBC source.

POSTGRES Write

Configuration Type Description
jdbc_url
str Connection URL for the JDBC sink.
autosharding
boolean If true, enables using a dynamically determined number of shards to write.
batch_size
int64 n/a
connection_properties
str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location
str Name of the table to write to.
password
str Password for the JDBC source.
username
str Username for the JDBC source.
write_statement
str SQL query used to insert records into the JDBC sink.

MYSQL Read

Configuration Type Description
jdbc_url
str Connection URL for the JDBC source.
connection_init_sql
list[ str ] Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties
str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit
boolean Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
fetch_size
int32 This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location
str Name of the table to read from.
num_partitions
int32 The number of partitions
output_parallelization
boolean Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column
str Name of a column of numeric type that will be used for partitioning.
password
str Password for the JDBC source.
read_query
str SQL query used to query the JDBC source.
username
str Username for the JDBC source.

MYSQL Write

Configuration Type Description
jdbc_url
str Connection URL for the JDBC sink.
autosharding
boolean If true, enables using a dynamically determined number of shards to write.
batch_size
int64 n/a
connection_init_sql
list[ str ] Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this.
connection_properties
str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location
str Name of the table to write to.
password
str Password for the JDBC source.
username
str Username for the JDBC source.
write_statement
str SQL query used to insert records into the JDBC sink.

SQLSERVER Read

Configuration Type Description
jdbc_url
str Connection URL for the JDBC source.
connection_properties
str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
disable_auto_commit
boolean Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true.
fetch_size
int32 This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors.
location
str Name of the table to read from.
num_partitions
int32 The number of partitions
output_parallelization
boolean Whether to reshuffle the resulting PCollection so results are distributed to all workers.
partition_column
str Name of a column of numeric type that will be used for partitioning.
password
str Password for the JDBC source.
read_query
str SQL query used to query the JDBC source.
username
str Username for the JDBC source.

SQLSERVER Write

Configuration Type Description
jdbc_url
str Connection URL for the JDBC sink.
autosharding
boolean If true, enables using a dynamically determined number of shards to write.
batch_size
int64 n/a
connection_properties
str Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;".
location
str Name of the table to write to.
password
str Password for the JDBC source.
username
str Username for the JDBC source.
write_statement
str SQL query used to insert records into the JDBC sink.

What's next

For more information and code examples, see the following documents:

Create a Mobile Website
View Site in Mobile | Classic
Share by: