JDBC to JDBC template

Use the Serverless for Apache Spark JDBC to JDBC template to extract data from one JDBC database and write it to another JDBC database.

This template supports the following databases:

  • MySQL
  • PostgreSQL
  • Microsoft SQL Server
  • Oracle

Use the template

Run the template using the gcloud CLI or Dataproc API.

gcloud

Before using any of the command data below, make the following replacements:

  • PROJECT_ID: Required. Your Google Cloud project ID listed in the IAM Settings.
  • REGION: Required. Compute Engine region.
  • SUBNET: Optional. If a subnet is not specified, the subnet in the specified REGION in the default network is selected.

    Example: projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME

  • TEMPLATE_VERSION: Required. Specify latest for the latest template version, or the date of a specific version, for example, 2023-03-17_v0.1.0-beta (visit gs://dataproc-templates-binaries or run gcloud storage ls gs://dataproc-templates-binaries to list available template versions).
  • INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH and OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH: Required. The full Cloud Storage path, including the filename, where the input and output JDBC connector jars are stored.

    Note: If the input and output jars are the same, setting only INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH is sufficient.

    You can use the following commands to download JDBC connectors for uploading to Cloud Storage:

    • MySQL:
      wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.30.tar.gz
    • PostgreSQL:
      wget https://jdbc.postgresql.org/download/postgresql-42.2.6.jar
    • Microsoft SQL Server:
      wget https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar
    • Oracle:
      wget https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.7.0.0/ojdbc8-21.7.0.0.jar
  • The following variables are used to construct the required input JDBC URL:
    • INPUT_JDBC_HOST
    • INPUT_JDBC_PORT
    • INPUT_JDBC_DATABASE, or, for Oracle, INPUT_JDBC_SERVICE
    • INPUT_JDBC_USERNAME
    • INPUT_JDBC_PASSWORD

    Construct the INPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://INPUT_JDBC_HOST:INPUT_JDBC_PORT;databaseName=INPUT_JDBC_DATABASE;user=INPUT_JDBC_USERNAME;password=INPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_SERVICE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
  • The following variables are used to construct the required output JDBC URL:
    • OUTPUT_JDBC_HOST
    • OUTPUT_JDBC_PORT
    • OUTPUT_JDBC_DATABASE, or, for Oracle, OUTPUT_JDBC_SERVICE
    • OUTPUT_JDBC_USERNAME
    • OUTPUT_JDBC_PASSWORD

    Construct the OUTPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT;databaseName=OUTPUT_JDBC_DATABASE;user=OUTPUT_JDBC_USERNAME;password=OUTPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_SERVICE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
  • INPUT_JDBC_TABLE: Required. Input JDBC table name or a SQL query on the JDBC input table.

    Example (the SQL query should be enclosed in parentheses): (select * from TABLE_NAME) as ALIAS_TABLE_NAME

  • OUTPUT_JDBC_TABLE: Required. JDBC table where the output will be stored.
  • INPUT_DRIVER and OUTPUT_DRIVER: Required. The JDBC input and output driver classes used for the connection:
    • MySQL: com.mysql.cj.jdbc.Driver
    • PostgreSQL: org.postgresql.Driver
    • Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
    • Oracle: oracle.jdbc.driver.OracleDriver
  • INPUT_PARTITION_COLUMN, LOWERBOUND, UPPERBOUND, NUM_PARTITIONS: Optional. If used, all of the following parameters must be specified:
    • INPUT_PARTITION_COLUMN: JDBC input table partition column name.
    • LOWERBOUND: JDBC input table partition column lower bound, used to determine the partition stride.
    • UPPERBOUND: JDBC input table partition column upper bound, used to determine the partition stride.
    • NUM_PARTITIONS: The maximum number of partitions that can be used for parallelism of table reads and writes. If specified, this value is used for both the JDBC input and output connections.
  • FETCHSIZE: Optional. The number of rows to fetch per round trip.
  • BATCH_SIZE: Optional. The number of records to insert per round trip. Default: 1000.
  • MODE: Optional. Write mode for the JDBC output. Options: Append, Overwrite, Ignore, or ErrorIfExists.
  • TABLE_PROPERTIES: Optional. Allows setting database-specific table and partition options when creating the output table.
  • PRIMARY_KEY: Optional. Primary key column for the output table. The specified column must not contain duplicate values; otherwise, an error is thrown.
  • JDBC_SESSION_INIT: Optional. Session initialization statement to read Java templates.
  • LOG_LEVEL: Optional. Logging level. Can be one of ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, or WARN. Default: INFO.
  • TEMP_VIEW and TEMP_QUERY: Optional. Use these two parameters to apply a Spark SQL transformation while loading data into the output table. TEMP_VIEW is the temporary view name, which must match the table name used in the query, and TEMP_QUERY is the query statement.
  • SERVICE_ACCOUNT: Optional. If not provided, the default Compute Engine service account is used.
  • PROPERTY and PROPERTY_VALUE: Optional. Comma-separated list of Spark property=value pairs.
  • LABEL and LABEL_VALUE: Optional. Comma-separated list of label=value pairs.
  • KMS_KEY: Optional. The Cloud Key Management Service key to use for encryption. If a key is not specified, data is encrypted at rest using a Google-owned and Google-managed encryption key.

    Example: projects/PROJECT_ID/regions/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
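As a sketch of how the connection URL pieces fit together, the input URL can be assembled in the shell before submitting the batch. The host, port, database, and credential values below are hypothetical placeholders for illustration; this uses the MySQL format from the list above:

```shell
# Hypothetical connection values -- substitute your own.
INPUT_JDBC_HOST="10.0.0.5"
INPUT_JDBC_PORT="3306"
INPUT_JDBC_DATABASE="inventory"
INPUT_JDBC_USERNAME="etl_user"
INPUT_JDBC_PASSWORD="etl_password"

# Assemble the MySQL-format JDBC URL from the variables above.
INPUT_JDBC_CONNECTION_URL="jdbc:mysql://${INPUT_JDBC_HOST}:${INPUT_JDBC_PORT}/${INPUT_JDBC_DATABASE}?user=${INPUT_JDBC_USERNAME}&password=${INPUT_JDBC_PASSWORD}"

echo "$INPUT_JDBC_CONNECTION_URL"
```

The same pattern applies to the output URL with the OUTPUT_JDBC_* variables.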

Execute the following command:

Linux, macOS, or Cloud Shell

gcloud dataproc batches submit spark \
    --class=com.google.cloud.dataproc.templates.main.DataProcTemplate \
    --project="PROJECT_ID" \
    --region="REGION" \
    --version="1.2" \
    --jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar,INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH,OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH" \
    --subnet="SUBNET" \
    --kms-key="KMS_KEY" \
    --service-account="SERVICE_ACCOUNT" \
    --properties="PROPERTY=PROPERTY_VALUE" \
    --labels="LABEL=LABEL_VALUE" \
    -- --template JDBCTOJDBC \
    --templateProperty project.id="PROJECT_ID" \
    --templateProperty log.level="LOG_LEVEL" \
    --templateProperty jdbctojdbc.input.url="INPUT_JDBC_CONNECTION_URL" \
    --templateProperty jdbctojdbc.input.driver="INPUT_DRIVER" \
    --templateProperty jdbctojdbc.input.table="INPUT_JDBC_TABLE" \
    --templateProperty jdbctojdbc.output.url="OUTPUT_JDBC_CONNECTION_URL" \
    --templateProperty jdbctojdbc.output.driver="OUTPUT_DRIVER" \
    --templateProperty jdbctojdbc.output.table="OUTPUT_JDBC_TABLE" \
    --templateProperty jdbctojdbc.input.fetchsize="FETCHSIZE" \
    --templateProperty jdbctojdbc.input.partitioncolumn="INPUT_PARTITION_COLUMN" \
    --templateProperty jdbctojdbc.input.lowerbound="LOWERBOUND" \
    --templateProperty jdbctojdbc.input.upperbound="UPPERBOUND" \
    --templateProperty jdbctojdbc.numpartitions="NUM_PARTITIONS" \
    --templateProperty jdbctojdbc.output.mode="MODE" \
    --templateProperty jdbctojdbc.output.batch.size="BATCH_SIZE" \
    --templateProperty jdbctojdbc.output.primary.key="PRIMARY_KEY" \
    --templateProperty jdbctojdbc.output.create.table.option="TABLE_PROPERTIES" \
    --templateProperty jdbctojdbc.sessioninitstatement="JDBC_SESSION_INIT" \
    --templateProperty jdbctojdbc.temp.view.name="TEMP_VIEW" \
    --templateProperty jdbctojdbc.sql.query="TEMP_QUERY"

Windows (PowerShell)

gcloud dataproc batches submit spark `
    --class=com.google.cloud.dataproc.templates.main.DataProcTemplate `
    --project="PROJECT_ID" `
    --region="REGION" `
    --version="1.2" `
    --jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar,INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH,OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH" `
    --subnet="SUBNET" `
    --kms-key="KMS_KEY" `
    --service-account="SERVICE_ACCOUNT" `
    --properties="PROPERTY=PROPERTY_VALUE" `
    --labels="LABEL=LABEL_VALUE" `
    -- --template JDBCTOJDBC `
    --templateProperty project.id="PROJECT_ID" `
    --templateProperty log.level="LOG_LEVEL" `
    --templateProperty jdbctojdbc.input.url="INPUT_JDBC_CONNECTION_URL" `
    --templateProperty jdbctojdbc.input.driver="INPUT_DRIVER" `
    --templateProperty jdbctojdbc.input.table="INPUT_JDBC_TABLE" `
    --templateProperty jdbctojdbc.output.url="OUTPUT_JDBC_CONNECTION_URL" `
    --templateProperty jdbctojdbc.output.driver="OUTPUT_DRIVER" `
    --templateProperty jdbctojdbc.output.table="OUTPUT_JDBC_TABLE" `
    --templateProperty jdbctojdbc.input.fetchsize="FETCHSIZE" `
    --templateProperty jdbctojdbc.input.partitioncolumn="INPUT_PARTITION_COLUMN" `
    --templateProperty jdbctojdbc.input.lowerbound="LOWERBOUND" `
    --templateProperty jdbctojdbc.input.upperbound="UPPERBOUND" `
    --templateProperty jdbctojdbc.numpartitions="NUM_PARTITIONS" `
    --templateProperty jdbctojdbc.output.mode="MODE" `
    --templateProperty jdbctojdbc.output.batch.size="BATCH_SIZE" `
    --templateProperty jdbctojdbc.output.primary.key="PRIMARY_KEY" `
    --templateProperty jdbctojdbc.output.create.table.option="TABLE_PROPERTIES" `
    --templateProperty jdbctojdbc.sessioninitstatement="JDBC_SESSION_INIT" `
    --templateProperty jdbctojdbc.temp.view.name="TEMP_VIEW" `
    --templateProperty jdbctojdbc.sql.query="TEMP_QUERY"

Windows (cmd.exe)

gcloud dataproc batches submit spark ^
    --class=com.google.cloud.dataproc.templates.main.DataProcTemplate ^
    --project="PROJECT_ID" ^
    --region="REGION" ^
    --version="1.2" ^
    --jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar,INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH,OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH" ^
    --subnet="SUBNET" ^
    --kms-key="KMS_KEY" ^
    --service-account="SERVICE_ACCOUNT" ^
    --properties="PROPERTY=PROPERTY_VALUE" ^
    --labels="LABEL=LABEL_VALUE" ^
    -- --template JDBCTOJDBC ^
    --templateProperty project.id="PROJECT_ID" ^
    --templateProperty log.level="LOG_LEVEL" ^
    --templateProperty jdbctojdbc.input.url="INPUT_JDBC_CONNECTION_URL" ^
    --templateProperty jdbctojdbc.input.driver="INPUT_DRIVER" ^
    --templateProperty jdbctojdbc.input.table="INPUT_JDBC_TABLE" ^
    --templateProperty jdbctojdbc.output.url="OUTPUT_JDBC_CONNECTION_URL" ^
    --templateProperty jdbctojdbc.output.driver="OUTPUT_DRIVER" ^
    --templateProperty jdbctojdbc.output.table="OUTPUT_JDBC_TABLE" ^
    --templateProperty jdbctojdbc.input.fetchsize="FETCHSIZE" ^
    --templateProperty jdbctojdbc.input.partitioncolumn="INPUT_PARTITION_COLUMN" ^
    --templateProperty jdbctojdbc.input.lowerbound="LOWERBOUND" ^
    --templateProperty jdbctojdbc.input.upperbound="UPPERBOUND" ^
    --templateProperty jdbctojdbc.numpartitions="NUM_PARTITIONS" ^
    --templateProperty jdbctojdbc.output.mode="MODE" ^
    --templateProperty jdbctojdbc.output.batch.size="BATCH_SIZE" ^
    --templateProperty jdbctojdbc.output.primary.key="PRIMARY_KEY" ^
    --templateProperty jdbctojdbc.output.create.table.option="TABLE_PROPERTIES" ^
    --templateProperty jdbctojdbc.sessioninitstatement="JDBC_SESSION_INIT" ^
    --templateProperty jdbctojdbc.temp.view.name="TEMP_VIEW" ^
    --templateProperty jdbctojdbc.sql.query="TEMP_QUERY"

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Required. Your Google Cloud project ID listed in the IAM Settings.
  • REGION: Required. Compute Engine region.
  • SUBNET: Optional. If a subnet is not specified, the subnet in the specified REGION in the default network is selected.

    Example: projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME

  • TEMPLATE_VERSION: Required. Specify latest for the latest template version, or the date of a specific version, for example, 2023-03-17_v0.1.0-beta (visit gs://dataproc-templates-binaries or run gcloud storage ls gs://dataproc-templates-binaries to list available template versions).
  • INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH and OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH: Required. The full Cloud Storage path, including the filename, where the input and output JDBC connector jars are stored.

    Note: If the input and output jars are the same, setting only INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH is sufficient.

    You can use the following commands to download JDBC connectors for uploading to Cloud Storage:

    • MySQL:
      wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.30.tar.gz
    • PostgreSQL:
      wget https://jdbc.postgresql.org/download/postgresql-42.2.6.jar
    • Microsoft SQL Server:
      wget https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar
    • Oracle:
      wget https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.7.0.0/ojdbc8-21.7.0.0.jar
  • The following variables are used to construct the required input JDBC URL:
    • INPUT_JDBC_HOST
    • INPUT_JDBC_PORT
    • INPUT_JDBC_DATABASE, or, for Oracle, INPUT_JDBC_SERVICE
    • INPUT_JDBC_USERNAME
    • INPUT_JDBC_PASSWORD

    Construct the INPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://INPUT_JDBC_HOST:INPUT_JDBC_PORT;databaseName=INPUT_JDBC_DATABASE;user=INPUT_JDBC_USERNAME;password=INPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_SERVICE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
  • The following variables are used to construct the required output JDBC URL:
    • OUTPUT_JDBC_HOST
    • OUTPUT_JDBC_PORT
    • OUTPUT_JDBC_DATABASE, or, for Oracle, OUTPUT_JDBC_SERVICE
    • OUTPUT_JDBC_USERNAME
    • OUTPUT_JDBC_PASSWORD

    Construct the OUTPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT;databaseName=OUTPUT_JDBC_DATABASE;user=OUTPUT_JDBC_USERNAME;password=OUTPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_SERVICE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
  • INPUT_JDBC_TABLE: Required. Input JDBC table name or a SQL query on the JDBC input table.

    Example (the SQL query should be enclosed in parentheses): (select * from TABLE_NAME) as ALIAS_TABLE_NAME

  • OUTPUT_JDBC_TABLE: Required. JDBC table where the output will be stored.
  • INPUT_DRIVER and OUTPUT_DRIVER: Required. The JDBC input and output driver classes used for the connection:
    • MySQL: com.mysql.cj.jdbc.Driver
    • PostgreSQL: org.postgresql.Driver
    • Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
    • Oracle: oracle.jdbc.driver.OracleDriver
  • INPUT_PARTITION_COLUMN, LOWERBOUND, UPPERBOUND, NUM_PARTITIONS: Optional. If used, all of the following parameters must be specified:
    • INPUT_PARTITION_COLUMN: JDBC input table partition column name.
    • LOWERBOUND: JDBC input table partition column lower bound, used to determine the partition stride.
    • UPPERBOUND: JDBC input table partition column upper bound, used to determine the partition stride.
    • NUM_PARTITIONS: The maximum number of partitions that can be used for parallelism of table reads and writes. If specified, this value is used for both the JDBC input and output connections.
  • FETCHSIZE: Optional. The number of rows to fetch per round trip.
  • BATCH_SIZE: Optional. The number of records to insert per round trip. Default: 1000.
  • MODE: Optional. Write mode for the JDBC output. Options: Append, Overwrite, Ignore, or ErrorIfExists.
  • TABLE_PROPERTIES: Optional. Allows setting database-specific table and partition options when creating the output table.
  • PRIMARY_KEY: Optional. Primary key column for the output table. The specified column must not contain duplicate values; otherwise, an error is thrown.
  • JDBC_SESSION_INIT: Optional. Session initialization statement to read Java templates.
  • LOG_LEVEL: Optional. Logging level. Can be one of ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, or WARN. Default: INFO.
  • TEMP_VIEW and TEMP_QUERY: Optional. Use these two parameters to apply a Spark SQL transformation while loading data into the output table. TEMP_VIEW is the temporary view name, which must match the table name used in the query, and TEMP_QUERY is the query statement.
  • SERVICE_ACCOUNT: Optional. If not provided, the default Compute Engine service account is used.
  • PROPERTY and PROPERTY_VALUE: Optional. Comma-separated list of Spark property=value pairs.
  • LABEL and LABEL_VALUE: Optional. Comma-separated list of label=value pairs.
  • KMS_KEY: Optional. The Cloud Key Management Service key to use for encryption. If a key is not specified, data is encrypted at rest using a Google-owned and Google-managed encryption key.

    Example: projects/PROJECT_ID/regions/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
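To build intuition for the partition parameters above: the reader splits the partition column's value range into roughly equal strides, one per partition, so each parallel task reads about (UPPERBOUND - LOWERBOUND) / NUM_PARTITIONS rows of the range. A minimal shell sketch with hypothetical bounds:

```shell
# Hypothetical partition bounds for illustration.
LOWERBOUND=0
UPPERBOUND=1000
NUM_PARTITIONS=4

# Approximate stride: the width of the partition column range each
# parallel read task covers.
STRIDE=$(( (UPPERBOUND - LOWERBOUND) / NUM_PARTITIONS ))
echo "Each of the $NUM_PARTITIONS partitions covers about $STRIDE units of the partition column range."
```

Choosing NUM_PARTITIONS too high relative to the database's connection capacity can overload the source, so start small and scale up.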

HTTP method and URL:

POST https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches

Request JSON body:

{
  "environmentConfig": {
    "executionConfig": {
      "subnetworkUri": "SUBNET",
      "kmsKey": "KMS_KEY",
      "serviceAccount": "SERVICE_ACCOUNT"
    }
  },
  "labels": {
    "LABEL": "LABEL_VALUE"
  },
  "runtimeConfig": {
    "version": "1.2",
    "properties": {
      "PROPERTY": "PROPERTY_VALUE"
    }
  },
  "sparkBatch": {
    "mainClass": "com.google.cloud.dataproc.templates.main.DataProcTemplate",
    "args": [
      "--template","JDBCTOJDBC",
      "--templateProperty","log.level=LOG_LEVEL",
      "--templateProperty","project.id=PROJECT_ID",
      "--templateProperty","jdbctojdbc.input.url=INPUT_JDBC_CONNECTION_URL",
      "--templateProperty","jdbctojdbc.input.driver=INPUT_DRIVER",
      "--templateProperty","jdbctojdbc.input.table=INPUT_JDBC_TABLE",
      "--templateProperty","jdbctojdbc.output.url=OUTPUT_JDBC_CONNECTION_URL",
      "--templateProperty","jdbctojdbc.output.driver=OUTPUT_DRIVER",
      "--templateProperty","jdbctojdbc.output.table=OUTPUT_JDBC_TABLE",
      "--templateProperty","jdbctojdbc.input.fetchsize=FETCHSIZE",
      "--templateProperty","jdbctojdbc.input.partitioncolumn=INPUT_PARTITION_COLUMN",
      "--templateProperty","jdbctojdbc.input.lowerbound=LOWERBOUND",
      "--templateProperty","jdbctojdbc.input.upperbound=UPPERBOUND",
      "--templateProperty","jdbctojdbc.numpartitions=NUM_PARTITIONS",
      "--templateProperty","jdbctojdbc.output.mode=MODE",
      "--templateProperty","jdbctojdbc.output.batch.size=BATCH_SIZE",
      "--templateProperty","jdbctojdbc.output.primary.key=PRIMARY_KEY",
      "--templateProperty","jdbctojdbc.output.create.table.option=TABLE_PROPERTIES",
      "--templateProperty","jdbctojdbc.sessioninitstatement=JDBC_SESSION_INIT",
      "--templateProperty","jdbctojdbc.temp.view.name=TEMP_VIEW",
      "--templateProperty","jdbctojdbc.sql.query=TEMP_QUERY"
    ],
    "jarFileUris": [
      "gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar",
      "INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH",
      "OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH"
    ]
  }
}
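The args list above is long and repetitive; when scripting requests, it can be generated from key=value pairs instead of typed by hand. A minimal shell sketch (property names taken from the request above; values are placeholders you would substitute):

```shell
# Key=value pairs for a subset of the template properties above.
# Placeholder values shown; substitute real ones before sending.
set -- \
  "log.level=LOG_LEVEL" \
  "project.id=PROJECT_ID" \
  "jdbctojdbc.input.url=INPUT_JDBC_CONNECTION_URL" \
  "jdbctojdbc.output.url=OUTPUT_JDBC_CONNECTION_URL"

# Flatten into the alternating "--templateProperty","key=value" form
# used by the args array.
TEMPLATE_ARGS=""
for kv in "$@"; do
  TEMPLATE_ARGS="$TEMPLATE_ARGS --templateProperty $kv"
done
echo "$TEMPLATE_ARGS"
```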

To send your request, use a tool such as curl.

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/regions/REGION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.dataproc.v1.BatchOperationMetadata",
    "batch": "projects/PROJECT_ID/locations/REGION/batches/BATCH_ID",
    "batchUuid": "de8af8d4-3599-4a7c-915c-798201ed1583",
    "createTime": "2023-02-24T03:31:03.440329Z",
    "operationType": "BATCH",
    "description": "Batch"
  }
}