Use Salesforce batch source plugin to analyze leads data in BigQuery
Learn how to use the Salesforce batch source plugin in Cloud Data Fusion to analyze leads data in BigQuery.
Scenario
Suppose you're a marketing manager planning a highly targeted email marketing campaign to promote a new product. You have a list of leads in Salesforce Sales Cloud. To understand your target audience better before creating the campaign, you want to use the Salesforce batch source plugin in Cloud Data Fusion to extract specific leads data.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Cloud Data Fusion, BigQuery, Cloud Storage, and Dataproc APIs.
- Create a Cloud Data Fusion instance.
- Configure a connection to the Salesforce API by creating a Salesforce Connected App for Cloud Data Fusion.
Manage permissions
Create and assign the required custom roles and permissions.
Create a custom role and add permissions
- In the Google Cloud console, go to the Roles page.
- Click Create role.
- In the Title field, enter Custom Role-Tutorial.
- Click Add permissions.
- In the Add permissions window, select the following permissions, and then click Add:
  - bigquery.datasets.create
  - bigquery.jobs.create
  - storage.buckets.create
- Click Create.
Assign custom role to the default Compute Engine service account
- Go to the Cloud Data Fusion Instances page.
- Click the name of your instance.
- Make a note of the default Compute Engine service account. The instance details page contains this information. The format of the Cloud Data Fusion default Compute Engine service account name is CUSTOMER_PROJECT_NUMBER-compute@developer.gserviceaccount.com.
- Go to the IAM page.
- In the Filter bar, enter the name of your default Compute Engine service account.
- For your default Compute Engine service account, click Edit.
- Click Add another role.
- In the Select a role field, select Custom Role-Tutorial.
- Click Save.
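The role and binding configured by hand above can be sketched programmatically. This is a minimal illustration, not an API call: the project number and project ID are hypothetical placeholders, and the dictionaries mirror the shape of an IAM custom role and policy binding rather than invoking any client library.

```python
# Sketch: the custom role and IAM binding this section creates in the console.
# PROJECT_NUMBER and the project ID below are hypothetical; substitute your own.
PROJECT_NUMBER = "123456789012"

# Permissions added to the "Custom Role-Tutorial" custom role.
custom_role = {
    "title": "Custom Role-Tutorial",
    "includedPermissions": [
        "bigquery.datasets.create",
        "bigquery.jobs.create",
        "storage.buckets.create",
    ],
}

# Default Compute Engine service account, per the format noted above.
member = f"serviceAccount:{PROJECT_NUMBER}-compute@developer.gserviceaccount.com"

# The policy binding that "Add another role" + "Save" produces.
binding = {
    "role": "projects/my-project/roles/CustomRoleTutorial",  # hypothetical project ID
    "members": [member],
}

print(member)
```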
Configure the Cloud Data Fusion Salesforce batch source plugin
- Go to the Cloud Data Fusion Instances page.
- For your instance, click View instance. The Cloud Data Fusion web interface opens.
- Go to the Studio page.
- Click Hub.
- In the search bar, enter Salesforce.
- Click Salesforce plugins, and then click Deploy.
- In the Salesforce plugins deploy window, click Finish. When the deployment completes, a dialog with a success message is displayed.
- In that dialog, click Create a pipeline. The Cloud Data Fusion Studio page is displayed.
- Select Data pipeline - batch as the type of your data pipeline.
- In the Source menu, click Salesforce.
- Go to the Salesforce node and click Properties. This opens the Salesforce plugin properties page.
- In the Reference name field, enter a name for your source, for example, Leads_generated.
- In the Connection section, click the Use connection toggle.
- Click Browse connections. The Browse connections window opens.
- Click Add connection and select Salesforce.
- In the Create a Salesforce connection window, click the Configuration tab and do the following:
  - In the Name field, enter a name to identify your connection, for example, Salesforce_connection.
  - In the Credentials section, enter the following details of your Salesforce account:
    - User name
    - Password
    - Consumer key
    - Consumer secret
    - Security token
  - Click Test connection. If the entered details are correct, the test succeeds with the message "Successfully connected".
  - Click Create.
- Select Salesforce_connection and go back to the Salesforce plugin properties page.
Extract data from the Salesforce batch source plugin
- On the Salesforce plugin properties page, in the SOQL query section, enter the following query:

  SELECT LastName, FirstName, Company, Email, Phone, LeadSource, Industry, OwnerId, CreatedDate, LastModifiedDate, LastActivityDate FROM Lead WHERE Status LIKE '%Open%'

  This query fetches the details of the potential leads required to run a campaign from the sObject Lead.
- To determine the validity of the object schema, click Get schema.
- To filter the records for a specific date or time for running the campaign, use the following fields:
  - Last modified after
  - Last modified before
  - Duration
  - Offset
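Conceptually, the Last modified after and Last modified before fields narrow the SOQL query by a LastModifiedDate range. The plugin builds this for you, but as a sketch with illustrative timestamps, the resulting query looks like this:

```python
from datetime import datetime, timezone

# Base query from the step above.
base_query = (
    "SELECT LastName, FirstName, Company, Email, Phone, LeadSource, "
    "Industry, OwnerId, CreatedDate, LastModifiedDate, LastActivityDate "
    "FROM Lead WHERE Status LIKE '%Open%'"
)

# Hypothetical window: leads modified during March 2024.
after = datetime(2024, 3, 1, tzinfo=timezone.utc)
before = datetime(2024, 4, 1, tzinfo=timezone.utc)

# SOQL datetime literals are unquoted ISO 8601 values.
fmt = "%Y-%m-%dT%H:%M:%SZ"
query = (
    f"{base_query} AND LastModifiedDate >= {after.strftime(fmt)} "
    f"AND LastModifiedDate < {before.strftime(fmt)}"
)
print(query)
```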

Transform data using Wrangler plugin
Use the Wrangler plugin in Cloud Data Fusion to clean and enrich your data:
- Go back to the Studio page.
- In the Transform menu, click Wrangler.
- Connect the Wrangler plugin to the Salesforce batch source plugin.
- Go to the Wrangler plugin, and click Properties. This opens the Wrangler plugin properties page.
- Ensure that the Input schema is populated.
- Click Wrangle.
- In the Connections pane, select a valid connection.
- Select the sObject that you want to transform, for example, Lead.
- Transform the data with the required directives:

  keep :LastName,:FirstName,:Company,:Phone,:Email,:LeadSource,:OwnerId,:CreatedDate,:LastModifiedDate,:LastActivityDate
  merge :FirstName :LastName :Name ' '
  fill-null-or-empty :Email 'no email found'
  mask-number :Phone ########xxxxxxxx
  format-date :LastActivityDate yyyy-MM-dd HH:mm:ss
  drop :LastName,:FirstName
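To see what these directives do to a record, here is a rough Python emulation on one sample lead. The field values are illustrative, and the mask-number behavior is an assumption (that # retains a digit and x masks it); rely on the Wrangler preview for the authoritative result.

```python
import datetime

# One sample record with illustrative values.
record = {
    "LastName": "Doe", "FirstName": "Jane", "Company": "Example Inc",
    "Phone": "5551234567", "Email": None, "LeadSource": "Web",
    "OwnerId": "005xx0000012345",
    "LastActivityDate": datetime.datetime(2024, 3, 15, 9, 30, 0),
}

# merge :FirstName :LastName :Name ' ' -> join the two columns with a space.
record["Name"] = f"{record['FirstName']} {record['LastName']}"

# fill-null-or-empty :Email 'no email found'
if not record["Email"]:
    record["Email"] = "no email found"

# mask-number :Phone ########xxxxxxxx
# Assumption: '#' retains the digit at that position, 'x' masks it.
def mask_number(value, pattern):
    out = []
    digits = iter(value)
    for p in pattern:
        try:
            d = next(digits)
        except StopIteration:
            break
        out.append(d if p == "#" else "x")
    return "".join(out)

record["Phone"] = mask_number(record["Phone"], "########xxxxxxxx")

# format-date :LastActivityDate yyyy-MM-dd HH:mm:ss
record["LastActivityDate"] = record["LastActivityDate"].strftime("%Y-%m-%d %H:%M:%S")

# drop :LastName,:FirstName
for col in ("LastName", "FirstName"):
    record.pop(col)

print(record)
```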
Load data into BigQuery
- Go back to the Studio page.
- In the Sink menu, click BigQuery.
- Go to the BigQuery node, and click Properties. This opens the BigQuery plugin properties page.
- In the Basic section, in the Reference name field, enter a name to identify this sink, for example, Leads_generated.
- In the Dataset field, enter the dataset that the table belongs to, for example, Salesforce_Leads.
- In the Table field, enter the table in which the extracted records are stored, for example, Incoming_Open_Leads.
- To validate the plugin, click Validate.

Deploy, schedule, and run the pipeline
- To deploy the pipeline, click Deploy.
- To set up an appropriate refresh schedule, follow these steps:
  - Click Schedule.
  - Enter the following details:
    - Pipeline run repeats
    - Repeats every
    - Starting at
    - Max concurrent runs
    - Compute profiles
  - Click Save and start schedule.
- To run the pipeline, click Run.
Verify the data extraction and ingestion
- In the Google Cloud console, go to the BigQuery page.
- Search for the dataset Salesforce_Leads and the table Incoming_Open_Leads to view the extracted records.
- To run the query, click Query.

Analyze the leads data to understand your audience better and to deliver tailored campaigns at scale.
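As a sketch, the verification step amounts to running a query like the following against the sink table. The dataset and table names come from the sink configuration above; the project ID is a hypothetical placeholder, and the grouping by LeadSource is just one illustrative sanity check.

```python
# Build a verification query to run from the BigQuery console.
project = "my-project"  # hypothetical project ID
dataset = "Salesforce_Leads"
table = "Incoming_Open_Leads"

# Count loaded leads per source to sanity-check the ingestion.
query = f"""
SELECT LeadSource, COUNT(*) AS lead_count
FROM `{project}.{dataset}.{table}`
GROUP BY LeadSource
ORDER BY lead_count DESC
"""
print(query)
```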
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the Cloud Data Fusion instance
Follow these instructions to delete your Cloud Data Fusion instance.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.