Dataflow supports reading from and writing to databases using
Java Database Connectivity (JDBC). Dataflow uses the Apache Beam JdbcIO
connector to connect to JDBC data sources and
sinks. Managed I/O for Databases supports standard JDBC drivers
for various databases, including PostgreSQL, MySQL, and Microsoft SQL Server.
Requirements
The following SDKs support managed I/O for databases:
- Apache Beam SDK for Java version 2.69.0 or later
- Apache Beam SDK for Python version 2.69.0 or later
Configuration
Managed I/O for Databases supports the following configuration parameters:
POSTGRES
Read
| Configuration | Type | Description |
|---|---|---|
|
jdbc_url
|
str
|
Connection URL for the JDBC source. |
|
connection_properties
|
str
|
Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;". |
|
fetch_size
|
int32
|
This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors. |
|
location
|
str
|
Name of the table to read from. |
|
num_partitions
|
int32
|
The number of partitions |
|
output_parallelization
|
boolean
|
Whether to reshuffle the resulting PCollection so results are distributed to all workers. |
|
partition_column
|
str
|
Name of a column of numeric type that will be used for partitioning. |
|
password
|
str
|
Password for the JDBC source. |
|
read_query
|
str
|
SQL query used to query the JDBC source. |
|
username
|
str
|
Username for the JDBC source. |
POSTGRES
Write
| Configuration | Type | Description |
|---|---|---|
|
jdbc_url
|
str
|
Connection URL for the JDBC sink. |
|
autosharding
|
boolean
|
If true, enables using a dynamically determined number of shards to write. |
|
batch_size
|
int64
|
n/a |
|
connection_properties
|
str
|
Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;". |
|
location
|
str
|
Name of the table to write to. |
|
password
|
str
|
Password for the JDBC source. |
|
username
|
str
|
Username for the JDBC source. |
|
write_statement
|
str
|
SQL query used to insert records into the JDBC sink. |
MYSQL
Read
| Configuration | Type | Description |
|---|---|---|
|
jdbc_url
|
str
|
Connection URL for the JDBC source. |
|
connection_init_sql
|
list[ str
]
|
Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this. |
|
connection_properties
|
str
|
Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;". |
|
disable_auto_commit
|
boolean
|
Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true. |
|
fetch_size
|
int32
|
This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors. |
|
location
|
str
|
Name of the table to read from. |
|
num_partitions
|
int32
|
The number of partitions |
|
output_parallelization
|
boolean
|
Whether to reshuffle the resulting PCollection so results are distributed to all workers. |
|
partition_column
|
str
|
Name of a column of numeric type that will be used for partitioning. |
|
password
|
str
|
Password for the JDBC source. |
|
read_query
|
str
|
SQL query used to query the JDBC source. |
|
username
|
str
|
Username for the JDBC source. |
MYSQL
Write
| Configuration | Type | Description |
|---|---|---|
|
jdbc_url
|
str
|
Connection URL for the JDBC sink. |
|
autosharding
|
boolean
|
If true, enables using a dynamically determined number of shards to write. |
|
batch_size
|
int64
|
n/a |
|
connection_init_sql
|
list[ str
]
|
Sets the connection init sql statements used by the Driver. Only MySQL and MariaDB support this. |
|
connection_properties
|
str
|
Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;". |
|
location
|
str
|
Name of the table to write to. |
|
password
|
str
|
Password for the JDBC source. |
|
username
|
str
|
Username for the JDBC source. |
|
write_statement
|
str
|
SQL query used to insert records into the JDBC sink. |
SQLSERVER
Read
| Configuration | Type | Description |
|---|---|---|
|
jdbc_url
|
str
|
Connection URL for the JDBC source. |
|
connection_properties
|
str
|
Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;". |
|
disable_auto_commit
|
boolean
|
Whether to disable auto commit on read. Defaults to true if not provided. The need for this config varies depending on the database platform. Informix requires this to be set to false while Postgres requires this to be set to true. |
|
fetch_size
|
int32
|
This method is used to override the size of the data that is going to be fetched and loaded in memory per every database call. It should ONLY be used if the default value throws memory errors. |
|
location
|
str
|
Name of the table to read from. |
|
num_partitions
|
int32
|
The number of partitions |
|
output_parallelization
|
boolean
|
Whether to reshuffle the resulting PCollection so results are distributed to all workers. |
|
partition_column
|
str
|
Name of a column of numeric type that will be used for partitioning. |
|
password
|
str
|
Password for the JDBC source. |
|
read_query
|
str
|
SQL query used to query the JDBC source. |
|
username
|
str
|
Username for the JDBC source. |
SQLSERVER
Write
| Configuration | Type | Description |
|---|---|---|
|
jdbc_url
|
str
|
Connection URL for the JDBC sink. |
|
autosharding
|
boolean
|
If true, enables using a dynamically determined number of shards to write. |
|
batch_size
|
int64
|
n/a |
|
connection_properties
|
str
|
Used to set connection properties passed to the JDBC driver not already defined as standalone parameter (e.g. username and password can be set using parameters above accordingly). Format of the string must be "key1=value1;key2=value2;". |
|
location
|
str
|
Name of the table to write to. |
|
password
|
str
|
Password for the JDBC source. |
|
username
|
str
|
Username for the JDBC source. |
|
write_statement
|
str
|
SQL query used to insert records into the JDBC sink. |
What's next
For more information and code examples, see the following documents:

