Build an ELT pipeline for marketing analytics data
This tutorial shows you how to set up an ELT workflow that extracts, loads, and transforms marketing analytics data in BigQuery.
A typical ELT workflow periodically extracts new customer data from your data source and loads it into BigQuery. The raw data is then transformed into meaningful metrics. In this tutorial, you create an ELT workflow by setting up a marketing analytics data transfer by using the BigQuery Data Transfer Service. Then, you schedule Dataform to run periodic transformations on the data.
In this tutorial, you use Google Ads as your data source, but you can use any of the data sources supported by the BigQuery Data Transfer Service.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
  Roles required to select or create a project
  - Select a project: Selecting a project doesn't require a specific IAM role; you can select any project that you've been granted a role on.
  - Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
Required roles
To get the permissions that you need to complete this tutorial, ask your administrator to grant you the following IAM roles on the project:
- BigQuery Admin (roles/bigquery.admin)
- Dataform Admin (roles/dataform.admin)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
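If you prefer the command line, these roles can also be granted with gcloud. The following is a minimal sketch, assuming you have permission to modify the project's IAM policy; USER_EMAIL is a placeholder for the user's email address:

# Grant the BigQuery Admin role to the user (placeholder values).
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/bigquery.admin"

# Grant the Dataform Admin role to the same user.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="user:USER_EMAIL" \
    --role="roles/dataform.admin"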
Schedule recurring data transfers
To keep BigQuery up to date with the latest marketing data from your data source, set up recurring data transfers using the BigQuery Data Transfer Service to extract and load data on a schedule.
In this tutorial, you use Google Ads as the example data source. For a full list of data sources supported by the BigQuery Data Transfer Service, see Supported data sources.
- Go to the Data transfers page in the Google Cloud console.
- Click Create transfer.
- In the Source type section, for Source, choose Google Ads.
- In the Data source details section:
  - For Customer ID, enter your Google Ads customer ID.
  - For Report type, select Standard. The standard report includes the standard set of reports and fields as detailed in Google Ads report transformation.
  - For Refresh window, enter 5.
- In the Destination settings section, for Dataset, select the dataset that you created to store your data.
- In the Transfer config name section, for Display name, enter Marketing tutorial.
- In the Schedule options section:
  - For Repeat frequency, select Days.
  - For At, enter 08:00.
- Click Save.
After you save the configuration, the BigQuery Data Transfer Service begins the data transfer. Based on the settings in the transfer configuration, the data transfer runs once every day at 8:00 AM UTC and extracts data from Google Ads from the past five days.
You can monitor ongoing transfer jobs to check the status of each data transfer.
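If you prefer to script the setup, you can create an equivalent transfer configuration with the bq CLI. The following is a minimal sketch; the params key for the Google Ads connector (customer_id) and the custom schedule syntax are assumptions to verify against the BigQuery Data Transfer Service reference:

# Create a scheduled Google Ads transfer into DATASET (placeholder values).
bq mk --transfer_config \
    --project_id=PROJECT_ID \
    --data_source=google_ads \
    --target_dataset=DATASET \
    --display_name='Marketing tutorial' \
    --schedule='every day 08:00' \
    --params='{"customer_id": "CUSTOMER_ID"}'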
Query table data
When your data is transferred to BigQuery, the data is written to ingestion-time partitioned tables. For more information, see Introduction to partitioned tables.
If you query your tables directly instead of using the auto-generated views, you must use the _PARTITIONTIME pseudocolumn in your query. For more information, see Querying partitioned tables.
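For example, the following sketch (using the same DATASET and CUSTOMER_ID placeholders as the queries that follow) restricts a direct scan of the campaign table to the last seven days of partitions:

-- Limit a direct table scan using the _PARTITIONTIME pseudocolumn.
SELECT
  _PARTITIONTIME AS partition_time,
  campaign_id,
  campaign_name
FROM
  `DATASET.ads_Campaign_CUSTOMER_ID`
WHERE
  _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
LIMIT 10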
The following sections show sample queries that you can use to examine your transferred data.
Campaign performance
The following sample query analyzes Google Ads campaign performance for the past 30 days.
Console
SELECT
  c.customer_id,
  c.campaign_name,
  c.campaign_status,
  SUM(cs.metrics_impressions) AS Impressions,
  SUM(cs.metrics_interactions) AS Interactions,
  (SUM(cs.metrics_cost_micros) / 1000000) AS Cost
FROM
  `DATASET.ads_Campaign_CUSTOMER_ID` c
LEFT JOIN
  `DATASET.ads_CampaignBasicStats_CUSTOMER_ID` cs
ON
  (c.campaign_id = cs.campaign_id
    AND cs._DATA_DATE BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -31 DAY)
      AND DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY))
WHERE
  c._DATA_DATE = c._LATEST_DATE
GROUP BY
  1, 2, 3
ORDER BY
  Impressions DESC
bq
bq query --use_legacy_sql=false 'SELECT c.customer_id, c.campaign_name, c.campaign_status, SUM(cs.metrics_impressions) AS Impressions, SUM(cs.metrics_interactions) AS Interactions, (SUM(cs.metrics_cost_micros) / 1000000) AS Cost FROM `DATASET.ads_Campaign_CUSTOMER_ID` c LEFT JOIN `DATASET.ads_CampaignBasicStats_CUSTOMER_ID` cs ON (c.campaign_id = cs.campaign_id AND cs._DATA_DATE BETWEEN DATE_ADD(CURRENT_DATE(), INTERVAL -31 DAY) AND DATE_ADD(CURRENT_DATE(), INTERVAL -1 DAY)) WHERE c._DATA_DATE = c._LATEST_DATE GROUP BY 1, 2, 3 ORDER BY Impressions DESC'
Replace the following:
- DATASET: the name of the dataset that you created to store the transferred table
- CUSTOMER_ID: your Google Ads customer ID
Count of keywords
The following sample query analyzes keywords by campaign, ad group, and keyword status. This query also reports each keyword's match type, aliased as KeywordMatchType. Keyword match types help control which searches can trigger your ad. For more information about keyword matching options, see About keyword matching options.
Console
SELECT
  c.campaign_status AS CampaignStatus,
  a.ad_group_status AS AdGroupStatus,
  k.ad_group_criterion_status AS KeywordStatus,
  k.ad_group_criterion_keyword_match_type AS KeywordMatchType,
  COUNT(*) AS count
FROM
  `DATASET.ads_Keyword_CUSTOMER_ID` k
JOIN
  `DATASET.ads_Campaign_CUSTOMER_ID` c
ON
  (k.campaign_id = c.campaign_id AND k._DATA_DATE = c._DATA_DATE)
JOIN
  `DATASET.ads_AdGroup_CUSTOMER_ID` a
ON
  (k.ad_group_id = a.ad_group_id AND k._DATA_DATE = a._DATA_DATE)
WHERE
  k._DATA_DATE = k._LATEST_DATE
GROUP BY
  1, 2, 3, 4
bq
bq query --use_legacy_sql=false 'SELECT c.campaign_status AS CampaignStatus, a.ad_group_status AS AdGroupStatus, k.ad_group_criterion_status AS KeywordStatus, k.ad_group_criterion_keyword_match_type AS KeywordMatchType, COUNT(*) AS count FROM `DATASET.ads_Keyword_CUSTOMER_ID` k JOIN `DATASET.ads_Campaign_CUSTOMER_ID` c ON (k.campaign_id = c.campaign_id AND k._DATA_DATE = c._DATA_DATE) JOIN `DATASET.ads_AdGroup_CUSTOMER_ID` a ON (k.ad_group_id = a.ad_group_id AND k._DATA_DATE = a._DATA_DATE) WHERE k._DATA_DATE = k._LATEST_DATE GROUP BY 1, 2, 3, 4'
Replace the following:
- DATASET: the name of the dataset that you created to store the transferred table
- CUSTOMER_ID: your Google Ads customer ID
Create a Dataform repository
After you create the data transfer configuration to transfer the latest data from Google Ads, set up Dataform to regularly transform your marketing analytics data. Dataform lets you define transformations in SQL, collaborate with other data analysts, and run those transformations on a schedule.
Create a Dataform repository to store the SQLX queries that make up your transformation code.
- In the Google Cloud console, go to the Dataform page.
- Click Create repository.
- On the Create repository page, do the following:
  - In the Repository ID field, enter marketing-tutorial-repository.
  - In the Region list, select a region.
  - Click Create.

The marketing-tutorial-repository repository now appears in your Dataform repositories list.
For more information about Dataform repositories, see About Dataform repositories.
Create and initialize a Dataform development workspace
Create a Dataform development workspace so that you can work on the transformation code within your repository before you commit and push your changes to your repository.
- In the Google Cloud console, go to the Dataform page.
- Click marketing-tutorial-repository.
- Click Create development workspace.
- In the Create development workspace window, do the following:
  - In the Workspace ID field, enter marketing-tutorial-workspace.
  - Click Create.
  The development workspace page appears.
- Click Initialize workspace.

The marketing-tutorial-workspace development workspace now appears in your marketing-tutorial-repository repository under the Development Workspaces tab, along with two example files in the definitions directory called first_view.sqlx and second_view.sqlx.
For more information about Dataform development workspaces, see Overview of development workspaces.
Declare your Google Ads table as a table source
Connect your newly transferred Google Ads table to Dataform by declaring it as a data source, using the following steps:
Create a SQLX file for data source declaration
In Dataform, you declare a data source by creating a SQLX file in the definitions/ directory:
- In the Google Cloud console, go to the Dataform page.
- Select marketing-tutorial-repository.
- Select marketing-tutorial-workspace.
- In the Files pane, next to definitions/, click the More menu.
- Click Create file.
- In the Create new file pane, do the following:
  - In the Add a file path field, enter definitions/googleads-declaration.sqlx.
  - Click Create file.
Declare a data source
Edit the definitions/googleads-declaration.sqlx file to declare a transferred Google Ads table as a data source. This example declares the ads_Campaign table as a data source:
- In your development workspace, in the Files pane, click your SQLX file for data source declaration.
- In the file, enter the following code snippet:
config {
  type: "declaration",
  database: "PROJECT_ID",
  schema: "DATASET",
  name: "ads_Campaign_CUSTOMER_ID",
}
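Replace PROJECT_ID, DATASET, and CUSTOMER_ID with the same values that you used in the earlier queries.
After you commit this declaration, other SQLX files in the repository can reference the declared table through Dataform's ref function instead of hard-coding the fully qualified table name, for example:

-- Reference the declared source by its declaration name.
SELECT campaign_id, campaign_name
FROM ${ref("ads_Campaign_CUSTOMER_ID")}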
Define your transformation
Define your data transformations by creating a SQLX file in the definitions/ directory. In this tutorial, you create a daily transformation that aggregates metrics like clicks, impressions, costs, and conversions using a file named daily_performance.sqlx.
Create the transformation SQLX file
- In the Files pane, next to definitions/, click the More menu, and then select Create file.
- In the Add a file path field, enter definitions/daily_performance.sqlx.
- Click Create file.
Define the transformation SQLX file
- In the Files pane, expand the definitions/ directory.
- Select daily_performance.sqlx, then enter the following query, which references the data source declared earlier:

config {
  type: "table",
  schema: "reporting",
  tags: ["daily", "google_ads"]
}

SELECT
  date,
  campaign_id,
  campaign_name,
  SUM(clicks) AS total_clicks
FROM
  ${ref("ads_Campaign_CUSTOMER_ID")}
GROUP BY
  date,
  campaign_id,
  campaign_name
ORDER BY
  date DESC
Commit and push your changes
After you have made your changes in your development workspace, you can commit and push these changes to your repository by following these steps:
- In the marketing-tutorial-workspace workspace, click Commit 1 change.
- In the New commit pane, enter a commit description in the Add a commit message field.
- Click Commit all changes.
- In the marketing-tutorial-workspace workspace, click Push to default branch.

After your changes are successfully pushed to your repository, the Workspace is up to date message appears.
Schedule your data transformation
After you have defined your data transformation file, schedule the data transformations.
Create a production release
A production release in Dataform ensures that your environment is consistently updated with the results of your data transformations. The following steps show you how to specify the main branch of the marketing-tutorial-repository repository to store your data transformations:
- In the Google Cloud console, go to the Dataform page.
- Select marketing-tutorial-repository.
- Click the Releases & scheduling tab.
- Click Create production release.
- In the Create release configuration pane, configure the following settings:
  - In the Release ID field, enter transformations.
  - In the Git commitish field, leave the default value main.
  - In the Schedule frequency section, select On-demand.
- Click Create.
Create a workflow configuration
After you create a production release, you can create a workflow configuration that runs your data transformations on a specified schedule in your repository. The following steps show you how to schedule daily runs of the transformations release configuration:
- In the Google Cloud console, go to the Dataform page.
- Select marketing-tutorial-repository.
- Click the Releases & scheduling tab.
- In the Workflow configurations section, click Create.
- In the Create workflow configuration pane, in the Configuration ID field, enter transformations.
- In the Release configuration menu, select transformations.
- Under Authentication, select Execute with user credentials.
- In the Schedule frequency section, do the following:
  - Select Repeat.
  - For Repeats, select Daily.
  - For At time, enter 10:00 AM.
  - For Timezone, select Coordinated Universal Time (UTC).
- Click Selection of tags.
- In the Select tags to execute field, select daily.
- Click Create.
The workflow configuration that you created runs the entire latest compilation result created by the transformations release configuration.
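If you need to trigger a run outside the daily schedule, you can start a workflow invocation through the Dataform REST API. The following curl sketch is an assumption to verify against the current Dataform API reference, in particular the v1beta1 endpoint and the workflowConfig request field; REGION is the region you selected for the repository:

# Manually invoke the workflow configuration named "transformations" (assumed API shape).
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -d '{"workflowConfig": "projects/PROJECT_ID/locations/REGION/repositories/marketing-tutorial-repository/workflowConfigs/transformations"}' \
  "https://dataform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/REGION/repositories/marketing-tutorial-repository/workflowInvocations"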
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the dataset created in BigQuery
To avoid incurring charges for BigQuery assets, delete the dataset that you created to store the transferred data. Repeat these steps for the reporting dataset that your scheduled transformation creates.
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer panel, expand your project and select the dataset.
- Click the Actions menu, and then select Delete.
- In the Delete dataset dialog, enter delete into the field, and then click Delete.
Delete the Dataform development workspace and configurations
Dataform development workspace creation incurs no costs. To delete the development workspace and its configurations, follow these steps:
- In the Google Cloud console, go to the Dataform page.
- Click marketing-tutorial-repository.
- Click the Releases & scheduling tab.
- Under the Release configurations section, click the More menu next to the transformations configuration, and then click Delete.
- Under the Workflow configurations section, click the More menu next to the transformations configuration, and then click Delete.
- In the Development workspaces tab, click the More menu by marketing-tutorial-workspace, and then select Delete.
- To confirm, click Delete.
Delete the Dataform repository
Dataform repository creation incurs no costs. To delete the repository, follow these steps:
- In the Google Cloud console, go to the Dataform page.
- By marketing-tutorial-repository, click the More menu, and then select Delete.
- In the Delete repository window, enter the name of the repository to confirm deletion.
- To confirm, click Delete.

