JDBC to JDBC template

Use the Serverless for Apache Spark JDBC to JDBC template to extract data from one JDBC database and write it to another JDBC database.

This template supports the following databases:

  • MySQL
  • PostgreSQL
  • Microsoft SQL Server
  • Oracle

Use the template

Run the template using the gcloud CLI or Dataproc API.

gcloud

Before using any of the command data below, make the following replacements:

  • PROJECT_ID: Required. Your Google Cloud project ID listed in the IAM Settings.
  • REGION: Required. Compute Engine region.
  • SUBNET: Optional. If a subnet is not specified, the subnet in the specified REGION in the default network is selected.

    Example: projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME

  • TEMPLATE_VERSION: Required. Specify latest for the latest template version, or the date of a specific version, for example, 2023-03-17_v0.1.0-beta (visit gs://dataproc-templates-binaries or run gcloud storage ls gs://dataproc-templates-binaries to list available template versions).
  • INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH and OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH: Required. The full Cloud Storage path, including the filename, where the input and output JDBC connector jars are stored.

    Note: If the input and output jars are the same, setting only INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH is sufficient.

    You can use the following commands to download JDBC connectors for uploading to Cloud Storage:

    • MySQL:
      wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.30.tar.gz
    • PostgreSQL:
      wget https://jdbc.postgresql.org/download/postgresql-42.2.6.jar
    • Microsoft SQL Server:
      wget https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar
    • Oracle:
      wget https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.7.0.0/ojdbc8-21.7.0.0.jar
  • The following variables are used to construct the required input JDBC URL:
    • INPUT_JDBC_HOST
    • INPUT_JDBC_PORT
    • INPUT_JDBC_DATABASE, or, for Oracle, INPUT_JDBC_SERVICE
    • INPUT_JDBC_USERNAME
    • INPUT_JDBC_PASSWORD

    Construct the INPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://INPUT_JDBC_HOST:INPUT_JDBC_PORT;databaseName=INPUT_JDBC_DATABASE;user=INPUT_JDBC_USERNAME;password=INPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_SERVICE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
  • The following variables are used to construct the required output JDBC URL:
    • OUTPUT_JDBC_HOST
    • OUTPUT_JDBC_PORT
    • OUTPUT_JDBC_DATABASE, or, for Oracle, OUTPUT_JDBC_SERVICE
    • OUTPUT_JDBC_USERNAME
    • OUTPUT_JDBC_PASSWORD

    Construct the OUTPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT;databaseName=OUTPUT_JDBC_DATABASE;user=OUTPUT_JDBC_USERNAME;password=OUTPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_SERVICE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
  • INPUT_JDBC_TABLE: Required. Input JDBC table name or a SQL query on the JDBC input table.

    Example (the SQL query should be enclosed in parentheses): (select * from TABLE_NAME) as ALIAS_TABLE_NAME

  • OUTPUT_JDBC_TABLE: Required. JDBC table where the output will be stored.
  • INPUT_DRIVER and OUTPUT_DRIVER: Required. The JDBC input and output driver classes used for the connection:
    • MySQL: com.mysql.cj.jdbc.Driver
    • PostgreSQL: org.postgresql.Driver
    • Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
    • Oracle: oracle.jdbc.driver.OracleDriver
  • INPUT_PARTITION_COLUMN, LOWERBOUND, UPPERBOUND, NUM_PARTITIONS: Optional. If used, all of the following parameters must be specified:
    • INPUT_PARTITION_COLUMN: JDBC input table partition column name.
    • LOWERBOUND: JDBC input table partition column lower bound, used to determine the partition stride.
    • UPPERBOUND: JDBC input table partition column upper bound, used to determine the partition stride.
    • NUM_PARTITIONS: The maximum number of partitions that can be used for parallelism of table reads and writes. If specified, this value is used for both the JDBC input and output connections.
  • FETCHSIZE: Optional. The number of rows to fetch per round trip.
  • BATCH_SIZE: Optional. The number of records to insert per round trip. Default: 1000.
  • MODE: Optional. Write mode for the JDBC output. Options: Append, Overwrite, Ignore, or ErrorIfExists.
  • TABLE_PROPERTIES: Optional. Allows setting database-specific table and partition options when creating the output table.
  • PRIMARY_KEY: Optional. Primary key column for the output table. The specified column must not contain duplicate values; otherwise, an error is thrown.
  • JDBC_SESSION_INIT: Optional. Session initialization statement to read Java templates.
  • LOG_LEVEL: Optional. Logging level. Can be one of ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, or WARN. Default: INFO.
  • TEMP_VIEW and TEMP_QUERY: Optional. Use these two parameters to apply a Spark SQL transformation while loading data into the output table. TEMP_VIEW is the temporary view name, which must match the table name used in the query, and TEMP_QUERY is the query statement.
  • SERVICE_ACCOUNT: Optional. If not provided, the default Compute Engine service account is used.
  • PROPERTY and PROPERTY_VALUE: Optional. Comma-separated list of Spark property=value pairs.
  • LABEL and LABEL_VALUE: Optional. Comma-separated list of label=value pairs.
  • KMS_KEY: Optional. The Cloud Key Management Service key to use for encryption. If a key is not specified, data is encrypted at rest using a Google-owned and Google-managed encryption key.

    Example: projects/PROJECT_ID/regions/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
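As a sketch of how the connection URL pieces fit together, the input URL can be assembled in the shell before submitting the batch. The host, port, database, and credential values below are hypothetical placeholders for illustration; this uses the MySQL format from the list above:

```shell
# Hypothetical connection values -- substitute your own.
INPUT_JDBC_HOST="10.0.0.5"
INPUT_JDBC_PORT="3306"
INPUT_JDBC_DATABASE="inventory"
INPUT_JDBC_USERNAME="etl_user"
INPUT_JDBC_PASSWORD="etl_password"

# Assemble the MySQL-format JDBC URL from the variables above.
INPUT_JDBC_CONNECTION_URL="jdbc:mysql://${INPUT_JDBC_HOST}:${INPUT_JDBC_PORT}/${INPUT_JDBC_DATABASE}?user=${INPUT_JDBC_USERNAME}&password=${INPUT_JDBC_PASSWORD}"

echo "$INPUT_JDBC_CONNECTION_URL"
```

The same pattern applies to the output URL with the OUTPUT_JDBC_* variables.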

Execute the following command:

Linux, macOS, or Cloud Shell

gcloud dataproc batches submit spark \
    --class=com.google.cloud.dataproc.templates.main.DataProcTemplate \
    --project="PROJECT_ID" \
    --region="REGION" \
    --version="1.2" \
    --jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar,INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH,OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH" \
    --subnet="SUBNET" \
    --kms-key="KMS_KEY" \
    --service-account="SERVICE_ACCOUNT" \
    --properties="PROPERTY=PROPERTY_VALUE" \
    --labels="LABEL=LABEL_VALUE" \
    -- --template JDBCTOJDBC \
    --templateProperty project.id="PROJECT_ID" \
    --templateProperty log.level="LOG_LEVEL" \
    --templateProperty jdbctojdbc.input.url="INPUT_JDBC_CONNECTION_URL" \
    --templateProperty jdbctojdbc.input.driver="INPUT_DRIVER" \
    --templateProperty jdbctojdbc.input.table="INPUT_JDBC_TABLE" \
    --templateProperty jdbctojdbc.output.url="OUTPUT_JDBC_CONNECTION_URL" \
    --templateProperty jdbctojdbc.output.driver="OUTPUT_DRIVER" \
    --templateProperty jdbctojdbc.output.table="OUTPUT_JDBC_TABLE" \
    --templateProperty jdbctojdbc.input.fetchsize="FETCHSIZE" \
    --templateProperty jdbctojdbc.input.partitioncolumn="INPUT_PARTITION_COLUMN" \
    --templateProperty jdbctojdbc.input.lowerbound="LOWERBOUND" \
    --templateProperty jdbctojdbc.input.upperbound="UPPERBOUND" \
    --templateProperty jdbctojdbc.numpartitions="NUM_PARTITIONS" \
    --templateProperty jdbctojdbc.output.mode="MODE" \
    --templateProperty jdbctojdbc.output.batch.size="BATCH_SIZE" \
    --templateProperty jdbctojdbc.output.primary.key="PRIMARY_KEY" \
    --templateProperty jdbctojdbc.output.create.table.option="TABLE_PROPERTIES" \
    --templateProperty jdbctojdbc.sessioninitstatement="JDBC_SESSION_INIT" \
    --templateProperty jdbctojdbc.temp.view.name="TEMP_VIEW" \
    --templateProperty jdbctojdbc.sql.query="TEMP_QUERY"

Windows (PowerShell)

gcloud dataproc batches submit spark `
    --class=com.google.cloud.dataproc.templates.main.DataProcTemplate `
    --project="PROJECT_ID" `
    --region="REGION" `
    --version="1.2" `
    --jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar,INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH,OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH" `
    --subnet="SUBNET" `
    --kms-key="KMS_KEY" `
    --service-account="SERVICE_ACCOUNT" `
    --properties="PROPERTY=PROPERTY_VALUE" `
    --labels="LABEL=LABEL_VALUE" `
    -- --template JDBCTOJDBC `
    --templateProperty project.id="PROJECT_ID" `
    --templateProperty log.level="LOG_LEVEL" `
    --templateProperty jdbctojdbc.input.url="INPUT_JDBC_CONNECTION_URL" `
    --templateProperty jdbctojdbc.input.driver="INPUT_DRIVER" `
    --templateProperty jdbctojdbc.input.table="INPUT_JDBC_TABLE" `
    --templateProperty jdbctojdbc.output.url="OUTPUT_JDBC_CONNECTION_URL" `
    --templateProperty jdbctojdbc.output.driver="OUTPUT_DRIVER" `
    --templateProperty jdbctojdbc.output.table="OUTPUT_JDBC_TABLE" `
    --templateProperty jdbctojdbc.input.fetchsize="FETCHSIZE" `
    --templateProperty jdbctojdbc.input.partitioncolumn="INPUT_PARTITION_COLUMN" `
    --templateProperty jdbctojdbc.input.lowerbound="LOWERBOUND" `
    --templateProperty jdbctojdbc.input.upperbound="UPPERBOUND" `
    --templateProperty jdbctojdbc.numpartitions="NUM_PARTITIONS" `
    --templateProperty jdbctojdbc.output.mode="MODE" `
    --templateProperty jdbctojdbc.output.batch.size="BATCH_SIZE" `
    --templateProperty jdbctojdbc.output.primary.key="PRIMARY_KEY" `
    --templateProperty jdbctojdbc.output.create.table.option="TABLE_PROPERTIES" `
    --templateProperty jdbctojdbc.sessioninitstatement="JDBC_SESSION_INIT" `
    --templateProperty jdbctojdbc.temp.view.name="TEMP_VIEW" `
    --templateProperty jdbctojdbc.sql.query="TEMP_QUERY"

Windows (cmd.exe)

gcloud dataproc batches submit spark ^
    --class=com.google.cloud.dataproc.templates.main.DataProcTemplate ^
    --project="PROJECT_ID" ^
    --region="REGION" ^
    --version="1.2" ^
    --jars="gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar,INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH,OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH" ^
    --subnet="SUBNET" ^
    --kms-key="KMS_KEY" ^
    --service-account="SERVICE_ACCOUNT" ^
    --properties="PROPERTY=PROPERTY_VALUE" ^
    --labels="LABEL=LABEL_VALUE" ^
    -- --template JDBCTOJDBC ^
    --templateProperty project.id="PROJECT_ID" ^
    --templateProperty log.level="LOG_LEVEL" ^
    --templateProperty jdbctojdbc.input.url="INPUT_JDBC_CONNECTION_URL" ^
    --templateProperty jdbctojdbc.input.driver="INPUT_DRIVER" ^
    --templateProperty jdbctojdbc.input.table="INPUT_JDBC_TABLE" ^
    --templateProperty jdbctojdbc.output.url="OUTPUT_JDBC_CONNECTION_URL" ^
    --templateProperty jdbctojdbc.output.driver="OUTPUT_DRIVER" ^
    --templateProperty jdbctojdbc.output.table="OUTPUT_JDBC_TABLE" ^
    --templateProperty jdbctojdbc.input.fetchsize="FETCHSIZE" ^
    --templateProperty jdbctojdbc.input.partitioncolumn="INPUT_PARTITION_COLUMN" ^
    --templateProperty jdbctojdbc.input.lowerbound="LOWERBOUND" ^
    --templateProperty jdbctojdbc.input.upperbound="UPPERBOUND" ^
    --templateProperty jdbctojdbc.numpartitions="NUM_PARTITIONS" ^
    --templateProperty jdbctojdbc.output.mode="MODE" ^
    --templateProperty jdbctojdbc.output.batch.size="BATCH_SIZE" ^
    --templateProperty jdbctojdbc.output.primary.key="PRIMARY_KEY" ^
    --templateProperty jdbctojdbc.output.create.table.option="TABLE_PROPERTIES" ^
    --templateProperty jdbctojdbc.sessioninitstatement="JDBC_SESSION_INIT" ^
    --templateProperty jdbctojdbc.temp.view.name="TEMP_VIEW" ^
    --templateProperty jdbctojdbc.sql.query="TEMP_QUERY"

REST

Before using any of the request data, make the following replacements:

  • PROJECT_ID: Required. Your Google Cloud project ID listed in the IAM Settings.
  • REGION: Required. Compute Engine region.
  • SUBNET: Optional. If a subnet is not specified, the subnet in the specified REGION in the default network is selected.

    Example: projects/PROJECT_ID/regions/REGION/subnetworks/SUBNET_NAME

  • TEMPLATE_VERSION: Required. Specify latest for the latest template version, or the date of a specific version, for example, 2023-03-17_v0.1.0-beta (visit gs://dataproc-templates-binaries or run gcloud storage ls gs://dataproc-templates-binaries to list available template versions).
  • INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH and OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH: Required. The full Cloud Storage path, including the filename, where the input and output JDBC connector jars are stored.

    Note: If the input and output jars are the same, setting only INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH is sufficient.

    You can use the following commands to download JDBC connectors for uploading to Cloud Storage:

    • MySQL:
      wget http://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.30.tar.gz
    • PostgreSQL:
      wget https://jdbc.postgresql.org/download/postgresql-42.2.6.jar
    • Microsoft SQL Server:
      wget https://repo1.maven.org/maven2/com/microsoft/sqlserver/mssql-jdbc/6.4.0.jre8/mssql-jdbc-6.4.0.jre8.jar
    • Oracle:
      wget https://repo1.maven.org/maven2/com/oracle/database/jdbc/ojdbc8/21.7.0.0/ojdbc8-21.7.0.0.jar
  • The following variables are used to construct the required input JDBC URL:
    • INPUT_JDBC_HOST
    • INPUT_JDBC_PORT
    • INPUT_JDBC_DATABASE, or, for Oracle, INPUT_JDBC_SERVICE
    • INPUT_JDBC_USERNAME
    • INPUT_JDBC_PASSWORD

    Construct the INPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_DATABASE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://INPUT_JDBC_HOST:INPUT_JDBC_PORT;databaseName=INPUT_JDBC_DATABASE;user=INPUT_JDBC_USERNAME;password=INPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//INPUT_JDBC_HOST:INPUT_JDBC_PORT/INPUT_JDBC_SERVICE?user=INPUT_JDBC_USERNAME&password=INPUT_JDBC_PASSWORD
  • The following variables are used to construct the required output JDBC URL:
    • OUTPUT_JDBC_HOST
    • OUTPUT_JDBC_PORT
    • OUTPUT_JDBC_DATABASE, or, for Oracle, OUTPUT_JDBC_SERVICE
    • OUTPUT_JDBC_USERNAME
    • OUTPUT_JDBC_PASSWORD

    Construct the OUTPUT_JDBC_CONNECTION_URL using one of the following connector-specific formats:

    • MySQL:
      jdbc:mysql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • PostgreSQL:
      jdbc:postgresql://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_DATABASE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
    • Microsoft SQL Server:
      jdbc:sqlserver://OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT;databaseName=OUTPUT_JDBC_DATABASE;user=OUTPUT_JDBC_USERNAME;password=OUTPUT_JDBC_PASSWORD
    • Oracle:
      jdbc:oracle:thin:@//OUTPUT_JDBC_HOST:OUTPUT_JDBC_PORT/OUTPUT_JDBC_SERVICE?user=OUTPUT_JDBC_USERNAME&password=OUTPUT_JDBC_PASSWORD
  • INPUT_JDBC_TABLE: Required. Input JDBC table name or a SQL query on the JDBC input table.

    Example (the SQL query should be enclosed in parentheses): (select * from TABLE_NAME) as ALIAS_TABLE_NAME

  • OUTPUT_JDBC_TABLE: Required. JDBC table where the output will be stored.
  • INPUT_DRIVER and OUTPUT_DRIVER: Required. The JDBC input and output driver classes used for the connection:
    • MySQL: com.mysql.cj.jdbc.Driver
    • PostgreSQL: org.postgresql.Driver
    • Microsoft SQL Server: com.microsoft.sqlserver.jdbc.SQLServerDriver
    • Oracle: oracle.jdbc.driver.OracleDriver
  • INPUT_PARTITION_COLUMN, LOWERBOUND, UPPERBOUND, NUM_PARTITIONS: Optional. If used, all of the following parameters must be specified:
    • INPUT_PARTITION_COLUMN: JDBC input table partition column name.
    • LOWERBOUND: JDBC input table partition column lower bound, used to determine the partition stride.
    • UPPERBOUND: JDBC input table partition column upper bound, used to determine the partition stride.
    • NUM_PARTITIONS: The maximum number of partitions that can be used for parallelism of table reads and writes. If specified, this value is used for both the JDBC input and output connections.
  • FETCHSIZE: Optional. The number of rows to fetch per round trip.
  • BATCH_SIZE: Optional. The number of records to insert per round trip. Default: 1000.
  • MODE: Optional. Write mode for the JDBC output. Options: Append, Overwrite, Ignore, or ErrorIfExists.
  • TABLE_PROPERTIES: Optional. Allows setting database-specific table and partition options when creating the output table.
  • PRIMARY_KEY: Optional. Primary key column for the output table. The specified column must not contain duplicate values; otherwise, an error is thrown.
  • JDBC_SESSION_INIT: Optional. Session initialization statement to read Java templates.
  • LOG_LEVEL: Optional. Logging level. Can be one of ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, or WARN. Default: INFO.
  • TEMP_VIEW and TEMP_QUERY: Optional. Use these two parameters to apply a Spark SQL transformation while loading data into the output table. TEMP_VIEW is the temporary view name, which must match the table name used in the query, and TEMP_QUERY is the query statement.
  • SERVICE_ACCOUNT: Optional. If not provided, the default Compute Engine service account is used.
  • PROPERTY and PROPERTY_VALUE: Optional. Comma-separated list of Spark property=value pairs.
  • LABEL and LABEL_VALUE: Optional. Comma-separated list of label=value pairs.
  • KMS_KEY: Optional. The Cloud Key Management Service key to use for encryption. If a key is not specified, data is encrypted at rest using a Google-owned and Google-managed encryption key.

    Example: projects/PROJECT_ID/regions/REGION/keyRings/KEY_RING_NAME/cryptoKeys/KEY_NAME
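To build intuition for the partition parameters above: the reader splits the partition column's value range into roughly equal strides, one per partition, so each parallel task reads about (UPPERBOUND - LOWERBOUND) / NUM_PARTITIONS rows of the range. A minimal shell sketch with hypothetical bounds:

```shell
# Hypothetical partition bounds for illustration.
LOWERBOUND=0
UPPERBOUND=1000
NUM_PARTITIONS=4

# Approximate stride: the width of the partition column range each
# parallel read task covers.
STRIDE=$(( (UPPERBOUND - LOWERBOUND) / NUM_PARTITIONS ))
echo "Each of the $NUM_PARTITIONS partitions covers about $STRIDE units of the partition column range."
```

Choosing NUM_PARTITIONS too high relative to the database's connection capacity can overload the source, so start small and scale up.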

HTTP method and URL:

POST https://dataproc.googleapis.com/v1/projects/PROJECT_ID/locations/REGION/batches

Request JSON body:

{
  "environmentConfig": {
    "executionConfig": {
      "subnetworkUri": "SUBNET",
      "kmsKey": "KMS_KEY",
      "serviceAccount": "SERVICE_ACCOUNT"
    }
  },
  "labels": {
    "LABEL": "LABEL_VALUE"
  },
  "runtimeConfig": {
    "version": "1.2",
    "properties": {
      "PROPERTY": "PROPERTY_VALUE"
    }
  },
  "sparkBatch": {
    "mainClass": "com.google.cloud.dataproc.templates.main.DataProcTemplate",
    "args": [
      "--template","JDBCTOJDBC",
      "--templateProperty","log.level=LOG_LEVEL",
      "--templateProperty","project.id=PROJECT_ID",
      "--templateProperty","jdbctojdbc.input.url=INPUT_JDBC_CONNECTION_URL",
      "--templateProperty","jdbctojdbc.input.driver=INPUT_DRIVER",
      "--templateProperty","jdbctojdbc.input.table=INPUT_JDBC_TABLE",
      "--templateProperty","jdbctojdbc.output.url=OUTPUT_JDBC_CONNECTION_URL",
      "--templateProperty","jdbctojdbc.output.driver=OUTPUT_DRIVER",
      "--templateProperty","jdbctojdbc.output.table=OUTPUT_JDBC_TABLE",
      "--templateProperty","jdbctojdbc.input.fetchsize=FETCHSIZE",
      "--templateProperty","jdbctojdbc.input.partitioncolumn=INPUT_PARTITION_COLUMN",
      "--templateProperty","jdbctojdbc.input.lowerbound=LOWERBOUND",
      "--templateProperty","jdbctojdbc.input.upperbound=UPPERBOUND",
      "--templateProperty","jdbctojdbc.numpartitions=NUM_PARTITIONS",
      "--templateProperty","jdbctojdbc.output.mode=MODE",
      "--templateProperty","jdbctojdbc.output.batch.size=BATCH_SIZE",
      "--templateProperty","jdbctojdbc.output.primary.key=PRIMARY_KEY",
      "--templateProperty","jdbctojdbc.output.create.table.option=TABLE_PROPERTIES",
      "--templateProperty","jdbctojdbc.sessioninitstatement=JDBC_SESSION_INIT",
      "--templateProperty","jdbctojdbc.temp.view.name=TEMP_VIEW",
      "--templateProperty","jdbctojdbc.sql.query=TEMP_QUERY"
    ],
    "jarFileUris": [
      "gs://dataproc-templates-binaries/TEMPLATE_VERSION/java/dataproc-templates.jar",
      "INPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH",
      "OUTPUT_JDBC_CONNECTOR_CLOUD_STORAGE_PATH"
    ]
  }
}
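The args list above is long and repetitive; when scripting requests, it can be generated from key=value pairs instead of typed by hand. A minimal shell sketch (property names taken from the request above; values are placeholders you would substitute):

```shell
# Key=value pairs for a subset of the template properties above.
# Placeholder values shown; substitute real ones before sending.
set -- \
  "log.level=LOG_LEVEL" \
  "project.id=PROJECT_ID" \
  "jdbctojdbc.input.url=INPUT_JDBC_CONNECTION_URL" \
  "jdbctojdbc.output.url=OUTPUT_JDBC_CONNECTION_URL"

# Flatten into the alternating "--templateProperty","key=value" form
# used by the args array.
TEMPLATE_ARGS=""
for kv in "$@"; do
  TEMPLATE_ARGS="$TEMPLATE_ARGS --templateProperty $kv"
done
echo "$TEMPLATE_ARGS"
```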

To send your request, use a tool such as curl.

You should receive a JSON response similar to the following:

{
  "name": "projects/PROJECT_ID/regions/REGION/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.dataproc.v1.BatchOperationMetadata",
    "batch": "projects/PROJECT_ID/locations/REGION/batches/BATCH_ID",
    "batchUuid": "de8af8d4-3599-4a7c-915c-798201ed1583",
    "createTime": "2023-02-24T03:31:03.440329Z",
    "operationType": "BATCH",
    "description": "Batch"
  }
}