This page describes how to load data from Salesforce sObjects into Google Cloud with Cloud Data Fusion. The Salesforce batch source plugin reads sObjects from Salesforce. The sObjects are the Salesforce tables that you plan to pull from. Some examples of sObjects include opportunities, contacts, accounts, leads, and custom objects.
The Salesforce batch source plugin supports reading data with Salesforce Object Query Language (SOQL) queries and incremental or range date filters.
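For example, an incremental read can be expressed as a SOQL query with a date-range filter on LastModifiedDate. The following minimal Python sketch only builds such a query string; the object, fields, and dates are illustrative, not required values:

```python
# Illustrative only: the kind of incremental SOQL query the source can run.
# SOQL datetime literals are unquoted ISO 8601 values.
soql = (
    "SELECT Id, Name, LastModifiedDate "
    "FROM Opportunity "
    "WHERE LastModifiedDate >= 2024-01-01T00:00:00Z "
    "AND LastModifiedDate < 2024-02-01T00:00:00Z"
)
print(soql)
```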
Before you begin
- The Cloud Data Fusion Salesforce batch source uses the Salesforce Bulk API V1.
- Set up Salesforce before deploying and using the Cloud Data Fusion Salesforce plugin. For more information, see Create a Salesforce connected app for Cloud Data Fusion.
Configure the plugin
- Go to the Cloud Data Fusion web interface and click Studio.
- Check that Data Pipeline - Batch is selected (not Realtime).
- In the Source menu, click Salesforce. The Salesforce node appears in your pipeline. If you don't see the Salesforce source on the Studio page, deploy the Salesforce plugins from the Cloud Data Fusion Hub.
- To configure the source, go to the Salesforce node and click Properties.
- Enter the following properties. For a complete list, see Properties.
  - Enter a Label for the Salesforce node, for example, Salesforce tables.
  - Enter a Reference name for the Salesforce source for lineage, for example, Salesforce tables.
- Enter the connection details. You can set up a new, one-time connection, or an existing, reusable connection.
New connection
To add a one-time connection to Salesforce, follow these steps:
- Keep Use connection turned off.
- In the Connection section, enter the following information from the Salesforce account in these fields:
- Username
- Password
- Security token
- Consumer key
- Consumer secret
To get the credentials from Salesforce, see Get properties from Salesforce.
Reusable connection
To reuse an existing connection, follow these steps:
- Turn on Use connection.
- Click Browse connections.
- Click the connection name.
- Optional: If a connection doesn't exist and you want to create a new reusable connection, click Add connection and refer to the steps in the New connection tab.
- Enter the SObject name to load all the columns from the object.
- Optional: If you select the sObject name, you can filter the data with the following fields (for a worked example of how Duration and Offset combine, see the sketch after these steps):
  - Last modified after: Only include records that were last modified after a given time.
  - Last modified before: Only include records that were last modified earlier than a given time.
  - Duration: Only include records that were last modified within a time window of the specified size.
  - Offset: Only include records where the last modified date is less than the logical start time of the pipeline, minus the given offset.
- Optional: For supported sObjects, to improve the performance of a pipeline, turn on Enable PK chunking. For more information, see Improve performance with PK chunking.
- Optional: Instead of specifying the SObject name, you can enter a SOQL query, such as SELECT LastName FROM Contact. For more information, see SOQL queries for the Salesforce source.
- To test connectivity, click Get schema. Cloud Data Fusion connects to Salesforce and pulls the schema for the listed table (technically, an sObject).
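The Duration and Offset filters derive a time window from the pipeline's logical start time. The following minimal Python sketch (illustrative only; modified_window is a hypothetical helper, not a plugin API) shows the [inclusive, exclusive) LastModifiedDate window that results:

```python
from datetime import datetime, timedelta, timezone

def modified_window(logical_start, duration, offset=timedelta(0)):
    """Sketch of the [inclusive, exclusive) LastModifiedDate window read
    when only Duration and Offset are set: the offset shifts the upper
    bound back from the logical start time, and the duration sets the
    window size."""
    end = logical_start - offset   # upper bound, exclusive
    start = end - duration         # lower bound, inclusive
    return start, end

# Pipeline with a logical start time of 9 AM, duration 6 hours, offset 1 hour:
run = datetime(2024, 1, 15, 9, 0, tzinfo=timezone.utc)
lo, hi = modified_window(run, timedelta(hours=6), timedelta(hours=1))
print(lo.isoformat(), "<= LastModifiedDate <", hi.isoformat())
# 2024-01-15T02:00:00+00:00 <= LastModifiedDate < 2024-01-15T08:00:00+00:00
```

This matches the worked examples in the Properties table: with no offset the window is 3 AM to 9 AM, and with a one-hour offset it shifts to 2 AM to 8 AM.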
Properties
| Property | Description |
|---|---|
| Login URL | The default is `https://login.salesforce.com/services/oauth2/token`. |
| Connect timeout | Default is 30000 milliseconds. |
| SOQL query | Examples: `SELECT Id, Name, BillingCity FROM Account`; `SELECT Id FROM Contact WHERE Name LIKE 'A%' AND MailingCity = 'California'`; `select FIELD_1, FIELD_2 from ${sObjectName}` |
| SObject name | Ignored if a SOQL query is provided. Some sObjects aren't supported in the Salesforce Bulk API; creating a job with an unsupported object causes the following error: `Entity is not supported by the Bulk API`. These objects also aren't supported by Einstein Analytics, which uses the Bulk API for querying data. The Bulk API also isn't used in one case: when the query length exceeds the Salesforce limit at the point where the query is sent to Salesforce to receive the array of batch information. If the query is within the limit, the plugin executes the original query. Otherwise, it switches to wide-object logic: it generates an ID query that retrieves only batch information for IDs, which are later used to retrieve the data through the SOAP API. |
| Last modified after | Only include records where LastModifiedDate is greater than or equal to the specified date. Use the Salesforce date format (see the examples that follow this table). If no value is provided, no lower-bound date is applied. |
| Last modified before | Only include records where LastModifiedDate is less than the specified date. Use the Salesforce date format (see the examples that follow this table). Specifying this value together with Last modified after lets you read data that was modified within a window of time. If no value is provided, no upper-bound date is applied. |
| Duration | Only include records that were last modified within a time window of the specified size. For example, if the duration is 6 hours and the pipeline runs at 9 AM, it reads data that was last updated from 3 AM (inclusive) to 9 AM (exclusive). Specify the duration with numbers and the following time units: seconds, minutes, hours, days, months, years. For example, 2 days, 1 hours, 30 minutes. If a value is already specified for Last modified after or Last modified before, the duration is ignored. |
| Offset | Only include records where LastModifiedDate is less than the logical start time of the pipeline, minus the given offset. For example, if the duration is 6 hours, the offset is 1 hours, and the pipeline runs at 9 AM, data that was last modified between 2 AM (inclusive) and 8 AM (exclusive) is read. Specify the offset with numbers and the same time units: seconds, minutes, hours, days, months, years. For example, 2 days, 1 hours, 30 minutes. If a value is already specified for Last modified after or Last modified before, the offset is ignored. |
| SOQL operation type | queryAll returns all current and deleted records. The default operation is query. |
| Enable PK chunking | Salesforce recommends that you enable PK chunking when querying tables with more than 10 million records, or when a bulk query constantly times out. For more information, see PK chunking. PK chunking only works with queries that don't include SELECT clauses or conditions other than WHERE. Chunking is supported for custom objects and any Sharing and History tables that support standard objects. |

Salesforce date format examples
| Format syntax | Example |
|---|---|
| `YYYY-MM-DDThh:mm:ss+hh:mm` | `1999-01-01T23:01:01+01:00` |
| `YYYY-MM-DDThh:mm:ss-hh:mm` | `1999-01-01T23:01:01-08:00` |
| `YYYY-MM-DDThh:mm:ssZ` | `1999-01-01T23:01:01Z` |
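These are ISO 8601 timestamps with an explicit UTC offset or Z suffix. If you produce them programmatically, a second-precision, offset-aware datetime already serializes to this shape; a minimal Python sketch (illustrative, not required by the plugin):

```python
from datetime import datetime, timedelta, timezone

# Offset form: YYYY-MM-DDThh:mm:ss-hh:mm
pst = timezone(timedelta(hours=-8))
print(datetime(1999, 1, 1, 23, 1, 1, tzinfo=pst).isoformat())
# 1999-01-01T23:01:01-08:00

# Z form: YYYY-MM-DDThh:mm:ssZ (isoformat() emits +00:00, so rewrite it)
utc = datetime(1999, 1, 1, 23, 1, 1, tzinfo=timezone.utc)
print(utc.isoformat().replace("+00:00", "Z"))
# 1999-01-01T23:01:01Z
```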
Data type mappings
The following table lists the Salesforce data types and their corresponding CDAP schema data types.
| Salesforce data type | CDAP schema data type |
|---|---|
| `_bool` | `boolean` |
| `_int` | `int` |
| `_long` | `long` |
| `_double`, `currency`, `percent`, `geolocation` (latitude), `geolocation` (longitude) | `double` |
| `date` | `date` |
| `datetime` | `timestamp` (microseconds) |
| `time` | `time` (microseconds) |
| `picklist` | `string` |
| `multipicklist` | `string` |
| `combobox` | `string` |
| `reference` | `string` |
| `base64` | `string` |
| `textarea` | `string` |
| `phone` | `string` |
| `id` | `string` |
| `url` | `string` |
| `email` | `string` |
| `encryptedstring` | `string` |
| `datacategorygroupreference` | `string` |
| `location` | `string` |
| `address` | `string` |
| `anyType` | `string` |
| `json` | `string` |
| `complexvalue` | `string` |
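To make the mapping concrete, the following sketch shows how a few common Account fields could appear in a CDAP-style record schema, following the table above. This is an illustration under the assumption of an Avro-compatible schema encoding, not verbatim plugin output; the authoritative schema comes from clicking Get schema:

```python
import json

# Hypothetical example schema for a handful of Account fields.
schema = {
    "type": "record",
    "name": "output",
    "fields": [
        {"name": "Id", "type": "string"},                  # id -> string
        {"name": "Name", "type": ["string", "null"]},      # text -> string
        {"name": "AnnualRevenue",                          # currency -> double
         "type": ["double", "null"]},
        {"name": "LastModifiedDate",                       # datetime ->
         "type": [{"type": "long",                         # timestamp
                   "logicalType": "timestamp-micros"},     # (microseconds)
                  "null"]},
    ],
}
print(json.dumps(schema, indent=2))
```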
Use cases
See the following use cases for the Salesforce batch source:
Prerequisites for upgrading to the Salesforce plugin version 1.7.0
The Salesforce plugin version 1.7.0 is available in Cloud Data Fusion version 6.8.0 and later. This release upgrades the Salesforce Bulk API V1 from version 62.0 to 64.0.
Salesforce has deprecated certain fields in API version 64.0. Upgrading to the Salesforce plugin version 1.7.0 might cause pipelines that use these fields to fail. To ensure that your pipelines continue to work, manually update your pipeline schema to either load a new schema or remove the deprecated fields.
Deprecated fields in Salesforce API version 64.0
Salesforce has deprecated the following fields in API version 64.0:
| Object name | Field names |
|---|---|
| MobSecurityCertPinConfig | EffectiveDate |
| Profile | PermissionsAllowObjectDetectionTraining, PermissionsAllowObjectDetection |
| PermissionSet | PermissionsAllowObjectDetectionTraining, PermissionsAllowObjectDetection |
| PermissionSetLicense | MaximumPermissionsAllowObjectDetectionTraining, MaximumPermissionsAllowObjectDetection |
| UserPermissionAccess | PermissionsAllowObjectDetectionTraining, PermissionsAllowObjectDetection |
| MutingPermissionSet | PermissionsAllowObjectDetectionTraining, PermissionsAllowObjectDetection |
| Domain | OptionsHstsHeaders |
| User | UserPreferencesHideInvoicesRedirectConfirmation, UserPreferencesHideStatementsRedirectConfirmation |
| UserChangeEvent | UserPreferencesHideInvoicesRedirectConfirmation, UserPreferencesHideStatementsRedirectConfirmation |
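Before upgrading, you can check an exported pipeline for these fields. The following Python sketch is hypothetical tooling, not part of Cloud Data Fusion; it assumes the typical export layout, where config.stages[].plugin.properties.schema holds the schema as a JSON string, so verify it against your own export. Matching on field names alone can also flag fields that are deprecated only on specific objects, such as EffectiveDate:

```python
import json

# Field names deprecated in Salesforce API version 64.0 (from the table above).
DEPRECATED = {
    "EffectiveDate",
    "PermissionsAllowObjectDetectionTraining",
    "PermissionsAllowObjectDetection",
    "MaximumPermissionsAllowObjectDetectionTraining",
    "MaximumPermissionsAllowObjectDetection",
    "OptionsHstsHeaders",
    "UserPreferencesHideInvoicesRedirectConfirmation",
    "UserPreferencesHideStatementsRedirectConfirmation",
}

def find_deprecated(pipeline_path):
    """Print stages whose output schema contains a deprecated field name."""
    with open(pipeline_path) as f:
        pipeline = json.load(f)
    for stage in pipeline.get("config", {}).get("stages", []):
        schema_json = stage.get("plugin", {}).get("properties", {}).get("schema")
        if not schema_json:
            continue
        fields = {fld["name"] for fld in json.loads(schema_json).get("fields", [])}
        hits = sorted(fields & DEPRECATED)
        if hits:
            print(f"{stage.get('name')}: {hits}")

# Hypothetical export filename; substitute your own.
find_deprecated("my-pipeline-cdap-data-pipeline.json")
```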
Update pipeline schema
If your pipeline uses any deprecated fields, update the pipeline schema by either loading a fresh schema or removing the deprecated fields:
- Go to the Cloud Data Fusion web interface and click List.
- To view the deployed pipelines, click the Deployed tab.
- For the pipeline you want to update, click > Edit.
- In the Studio view, hold the pointer over the Salesforce source plugin, and click Properties. The configuration window for the Salesforce plugin opens.
- To load a new schema, follow these steps:
  - In the Output schema tab, click > Clear, and then click Validate. This loads the latest schema from Salesforce.
  - After the schema loads, in the Output schema tab, click > Propagate. This automatically updates the schema for all subsequent plugins in the pipeline, which might overwrite manual schema changes in plugins like Wrangler.
- To manually remove the deprecated fields, follow this step:
  - In the Output schema tab, click the delete icon for each of the deprecated fields. This removes the fields from all subsequent stages.
Best practices
For more information about improving performance in the Salesforce batch source, see the best practices.
Release notes
What's next
- Work through a Salesforce plugin tutorial.

