[UA] BigQuery cookbook [Legacy]

You are viewing a legacy article about Universal Analytics. Learn more about Google Analytics 4 replacing Universal Analytics.
This feature is only available in Analytics 360, part of Google Marketing Platform.
Learn more about Google Marketing Platform.

This article contains examples of how to construct queries of the Analytics data you export to BigQuery. We have made available a sample dataset so you can practice with some of the queries in this article.


Query optimization

Every query you run contributes to your monthly data-processing allowance. If you select extraneous fields, you increase the amount of data that needs to be processed, and as a result, you use more of your monthly allowance than necessary. Optimized queries make efficient use of your monthly data-processing allowance.

Learn more about pricing.

Select only what you need

When you formulate a query, select the relevant fields within the SELECT statement. By not calling extraneous fields, you reduce the amount of data, and the time needed to process the query.

Example: avoid the use of the wildcard operator

Bad form: using the wildcard operator
SELECT *
FROM [table name];
Better form: using field names to avoid unnecessary processing
SELECT field1, field2
FROM [table name];

Permit caching

Where possible, avoid using functions as fields. Functions (such as NOW() or TODAY()) return variable results, which prevents the query results from being cached and returned more quickly on repeat runs. Instead, use specific dates and times.
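As an illustration, here is a minimal sketch against the sample dataset (the dates are illustrative): the first query cannot be served from the cache because it computes its filter from CURRENT_DATE() on every run, while the second, which pins the same kind of filter to a literal date, can be.

not cacheable: the date filter is computed on every run
#standardSQL
SELECT SUM(totals.visits) AS visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731`
WHERE date = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

cacheable: the filter written as a literal date
#standardSQL
SELECT SUM(totals.visits) AS visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731`
WHERE date = '20170731'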

Currently, cached results are not supported for queries against multiple tables using a wildcard even if the Use Cached Results option is checked. If you run the same wildcard query multiple times, you are billed for each query. Learn more

Use intermediate tables for commonly used subqueries

If you find that you are repeatedly using a specific query as a subquery, you can save that query as an intermediate table by clicking Save as Table above the query results. You can then reference that table in the FROM clause of your query, which reduces both the amount of data that has to be processed and the time required for processing.

using an intermediate table
SELECT field1, field2
FROM [Dataset name.table name];

Query debugging

BigQuery debugs your code as you construct it. In the composition window, debugging feedback appears just below the query. Debugging is also available through the API by using the dryRun flag.

Valid queries have a green indicator that you can click to see the amount of data processed by the query. This feature gives you an opportunity to optimize your data before you run the query so that you can avoid unnecessary data processing.

Query Debugging - Success

Invalid queries have a red indicator that you can click to see information about the error, and find the line and column where the error is occurring. In the example below, the GROUP BY statement is blank, and the error is pinpointed.

Query Debugging - Error

Tips and best practices

Using the sample dataset

The following examples use the Google Analytics sample dataset .

To use the queries on your own data, simply replace the project and dataset names in the examples with your own project and dataset names.
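For example, assuming a hypothetical project named my-project whose Analytics export dataset is named 123456789, a FROM clause from the Standard SQL examples would be rewritten like this:

querying your own export instead of the sample dataset (hypothetical project and dataset names)
#standardSQL
SELECT SUM(totals.visits) AS visits
FROM `my-project.123456789.ga_sessions_20170731`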

Using Standard SQL vs. Legacy SQL

BigQuery supports two SQL dialects: Standard SQL and Legacy SQL.

Migrating to Standard SQL explains the differences between the two dialects.

Standard SQL is now the preferred SQL dialect for querying data stored in BigQuery.

See Enabling Standard SQL for information about enabling Standard SQL in the BigQuery UI, CLI, API, or whichever interface you are using.

The easiest way to get started is to include the comment #standardSQL at the top of your Standard SQL queries, as shown in the following examples.

With Legacy SQL, Google Analytics 360 data is exported to a new table every day. To query multiple tables at once, you can comma-separate the table names, use the TABLE_DATE_RANGE table wildcard function, or use multiple comma-separated TABLE_DATE_RANGE functions, as in the following examples.

Query multiple tables

The following examples show you Standard SQL and Legacy SQL queries for the same data.

3 days

Standard SQL
3 Days using UNION ALL
#standardSQL
WITH ga_tables AS (
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160801`
GROUP BY date

UNION ALL

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160802`
GROUP BY date

UNION ALL

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20160803`
GROUP BY date

)
SELECT
date,
visits,
pageviews,
transactions,
revenue
FROM ga_tables
ORDER BY date ASC
Legacy SQL
3 days using comma-separated table names
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
[bigquery-public-data.google_analytics_sample.ga_sessions_20160801],
[bigquery-public-data.google_analytics_sample.ga_sessions_20160802],
[bigquery-public-data.google_analytics_sample.ga_sessions_20160803]
GROUP BY
date
ORDER BY
date ASC

Last 1095 Days

Standard SQL
Last 1095 Days using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1095 DAY))
AND
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC
Legacy SQL
Last 1095 days using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -1095, 'DAY'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

Last 36 Months

Standard SQL
Last 36 Months using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 36 MONTH))
AND
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC
Legacy SQL
Last 36 months using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -36, 'MONTH'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

Last 3 years

Standard SQL
Last 3 Years using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR))
AND
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date
ORDER BY date ASC
Legacy SQL
Last 3 years using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -3, 'YEAR'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY')))
GROUP BY
date
ORDER BY
date ASC

Specific date range

Standard SQL
Specific date range using _TABLE_SUFFIX
#standardSQL
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20160801' AND '20170731'
GROUP BY date
ORDER BY date ASC
Legacy SQL
Specific date range using TABLE_DATE_RANGE
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2016-08-01'), TIMESTAMP('2017-07-31')))
GROUP BY
date
ORDER BY
date ASC

Last 3 years plus today’s data (intraday)

Standard SQL
Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table.
#standardSQL
WITH ga_tables AS (
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 3 YEAR))
AND
FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY date

UNION ALL

SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_intraday_*`
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND
FORMAT_DATE('%Y%m%d', CURRENT_DATE())
GROUP BY date
)
SELECT
date,
visits,
pageviews,
transactions,
revenue
FROM ga_tables
ORDER BY date ASC

Legacy SQL

Note: this example query will not work with the Google Analytics public dataset because there is currently no intraday table.
SELECT
date,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews,
SUM(totals.transactions) AS transactions,
SUM(totals.transactionRevenue)/1000000 AS revenue
FROM
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_],
DATE_ADD(CURRENT_TIMESTAMP(), -3, 'YEAR'), DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'))),
(TABLE_DATE_RANGE([bigquery-public-data.google_analytics_sample.ga_sessions_intraday_],
DATE_ADD(CURRENT_TIMESTAMP(), -1, 'DAY'), CURRENT_TIMESTAMP()))
GROUP BY
date
ORDER BY
date ASC

Basic query examples

This section explains how to construct basic queries using metrics and dimensions from example Analytics data.

Total [ metric ] per [ dimension ]?

Below are example scripts for the question: What is the total number of transactions generated per device browser in July 2017?

Standard SQL

Total transactions per device browser in July 2017
#standardSQL
SELECT
device.browser,
SUM( totals.transactions ) AS total_transactions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY
device.browser
ORDER BY
total_transactions DESC

Legacy SQL

Total transactions per device browser in July 2017
SELECT
device.browser,
SUM( totals.transactions ) AS total_transactions
FROM TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
GROUP BY
device.browser
ORDER BY
total_transactions DESC

Average bounce rate per [ dimension ]?

The real bounce rate is defined as the percentage of visits with a single pageview . Below are example scripts for the question: What was the real bounce rate per traffic source?

Standard SQL

Bounce rate per traffic source in July 2017
#standardSQL
SELECT
source,
total_visits,
total_no_of_bounces,
( ( total_no_of_bounces / total_visits ) * 100 ) AS bounce_rate
FROM (
SELECT
trafficSource.source AS source,
COUNT( trafficSource.source ) AS total_visits,
SUM( totals.bounces ) AS total_no_of_bounces
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
GROUP BY
source )
ORDER BY
total_visits DESC

Legacy SQL

Bounce rate per traffic source in July 2017
SELECT
source,
total_visits,
total_no_of_bounces,
( ( total_no_of_bounces / total_visits ) * 100 ) AS bounce_rate
FROM (
SELECT
trafficSource.source AS source,
COUNT( trafficSource.source ) AS total_visits,
SUM( totals.bounces ) AS total_no_of_bounces
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
GROUP BY
source )
ORDER BY
total_visits DESC

Average number of product pageviews by purchaser type (purchasers vs non-purchasers)

Below are example scripts for the question: What was the average number of product pageviews for users who made a purchase in July 2017?

Standard SQL

Average number of product pageviews for users who made a purchase in July 2017
#standardSQL
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM (
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
totals.transactions >=1
GROUP BY
users )

Legacy SQL

Average number of product pageviews for users who made a purchase in July 2017
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM (
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
totals.transactions >= 1
GROUP BY
users )

Below are example scripts for the question: What was the average number of product pageviews for users who did not make a purchase in July 2017?

Standard SQL

Average number of product pageviews for users who did not make a purchase in July 2017
#standardSQL
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM (
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
totals.transactions IS NULL
GROUP BY
users )

Legacy SQL

Average number of product pageviews for users who did not make a purchase in July 2017
SELECT
( SUM(total_pagesviews_per_user) / COUNT(users) ) AS avg_pageviews_per_user
FROM (
SELECT
fullVisitorId AS users,
SUM(totals.pageviews) AS total_pagesviews_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
totals.transactions IS NULL
GROUP BY
users )

Average number of transactions per purchaser

Below are example scripts for the question: What was the average total transactions per user that made a purchase in July 2017?

Standard SQL

Average number of transactions per user that made a purchase in July 2017
#standardSQL
SELECT
( SUM(total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM (
SELECT
fullVisitorId,
SUM(totals.transactions) AS total_transactions_per_user
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND totals.transactions IS NOT NULL
GROUP BY
fullVisitorId )

Legacy SQL

Average number of transactions per user that made a purchase in July 2017
SELECT
( SUM(total_transactions_per_user) / COUNT(fullVisitorId) ) AS avg_total_transactions_per_user
FROM (
SELECT
fullVisitorId,
SUM(totals.transactions) AS total_transactions_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
totals.transactions IS NOT NULL
GROUP BY
fullVisitorId )

Average amount of money spent per session

Below are example scripts for the question: What is the average amount of money spent per session in July 2017?

Standard SQL

Average amount of money spent per session in July 2017
#standardSQL
SELECT
( SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user) ) AS
avg_revenue_by_user_per_visit
FROM (
SELECT
fullVisitorId,
SUM( totals.visits ) AS total_visits_per_user,
SUM( totals.transactionRevenue ) AS total_transactionrevenue_per_user
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
totals.visits > 0
AND totals.transactions >= 1
AND totals.transactionRevenue IS NOT NULL
GROUP BY
fullVisitorId )

Legacy SQL

Average amount of money spent per session in July 2017
SELECT
( SUM(total_transactionrevenue_per_user) / SUM(total_visits_per_user) ) AS
avg_revenue_by_user_per_visit
FROM (
SELECT
fullVisitorId,
SUM( totals.visits ) AS total_visits_per_user,
SUM( totals.transactionRevenue ) AS total_transactionrevenue_per_user
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
totals.visits > 0
AND totals.transactions >= 1
AND totals.transactionRevenue IS NOT NULL
GROUP BY
fullVisitorId )

Sequence of hits

Below are example scripts for the question: What is the sequence of pages viewed?

Standard SQL

Sequence of pages viewed by users in July 2017
#standardSQL
SELECT
fullVisitorId,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_*`,
UNNEST(hits) as hits
WHERE
_TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
AND
hits.type="PAGE"
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber

Legacy SQL

Sequence of pages viewed by users in July 2017
SELECT
fullVisitorId,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
hits.page.pagePath AS pagePath
FROM
TABLE_DATE_RANGE( [bigquery-public-data.google_analytics_sample.ga_sessions_],
TIMESTAMP('2017-07-01'), TIMESTAMP('2017-07-31') )
WHERE
hits.type="PAGE"
ORDER BY
fullVisitorId,
visitId,
visitNumber,
hitNumber

In this query, you limit the hit type to PAGE to avoid including event or transaction interactions. Each row of the output represents a pageview, ordered by user, session, and hit number as specified in the ORDER BY clause.

Multiple custom dimensions at the hit or session level

custom dimension at the hit level
SELECT fullVisitorId, visitId, hits.hitNumber, hits.time,
MAX(IF(hits.customDimensions.index=1,
hits.customDimensions.value,
NULL)) WITHIN hits AS customDimension1,
FROM [tableID.ga_sessions_20150305]
LIMIT 100
custom dimension at the session level
SELECT fullVisitorId, visitId,
MAX(IF(customDimensions.index=2,
customDimensions.value,
NULL)) WITHIN RECORD AS customDimension2,
FROM [tableID.ga_sessions_20150305]
LIMIT 100

In each query:

The SELECT statement queries for the relevant dimension and metric fields.

The MAX function:

  • Returns the custom dimension as a new column. You can repeat the function to return multiple custom dimensions as new columns.
  • Uses WITHIN hits and WITHIN RECORD to evaluate the condition inside repeated fields in BigQuery (per hit and per session record, respectively).
  • Evaluates the condition inside MAX for each custom dimension, but returns NULL for any that are not index=1 (hits) or index=2 (sessions).
  • Returns the maximum value, which is the value of custom dimension 1 for hits or custom dimension 2 for sessions, since all other values are NULL.
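The examples above use Legacy SQL's WITHIN keyword, which has no direct Standard SQL equivalent. In Standard SQL, a similar result can be obtained by unnesting the customDimensions repeated field in a correlated subquery. The following is a minimal sketch against one day of the sample dataset; the dimension indexes (1 for hits, 2 for sessions) are illustrative and should be replaced with the indexes configured in your own property.

custom dimensions at the hit and session level (Standard SQL sketch)
#standardSQL
SELECT
  t.fullVisitorId,
  t.visitId,
  h.hitNumber,
  h.time,
  -- hit-level custom dimension (index 1 is illustrative)
  (SELECT MAX(IF(cd.index = 1, cd.value, NULL))
     FROM UNNEST(h.customDimensions) AS cd) AS customDimension1,
  -- session-level custom dimension (index 2 is illustrative)
  (SELECT MAX(IF(cd.index = 2, cd.value, NULL))
     FROM UNNEST(t.customDimensions) AS cd) AS customDimension2
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170731` AS t,
  UNNEST(t.hits) AS h
LIMIT 100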

Advanced query examples

Now that you’re familiar with simple queries, you can construct queries using the advanced functions and features available in BigQuery.

Products purchased by customers who purchased product A (Classic Ecommerce)

Below is a skeleton script for the question: What other products are purchased by customers who purchase product A?

products purchased by a customer who purchases product A (Classic Ecommerce)
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM ['Dataset Name']
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM ['Dataset Name']
  WHERE hits.item.productName CONTAINS 'Product Item Name A'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
)
 AND hits.item.productName IS NOT NULL
 AND hits.item.productName != 'Product Item Name A'
GROUP BY other_purchased_products
ORDER BY quantity DESC;
  1. In the first line, you select all of the other items purchased by a user, and the aggregate function COUNT() calculates the quantity of each other item purchased. The result is displayed in a field labeled quantity, with the associated item in a field labeled other_purchased_products.
  2. In the subquery, you select only the unique users ( fullVisitorId ) who have carried out transactions ( totals.transactions>=1 ) and who, during a transaction, purchased product A ( WHERE hits.item.productName CONTAINS 'Product Item Name A' ).

The conditions (WHERE and AND statements) in the top-level query exclude values of hits.item.productName that are null or that match product A itself.

Here is an example of the query: If a customer purchases Brighton Metallic Pens - Set of 4, what other products did they purchase?

products purchased by a customer who purchases 'Brighton Metallic Pens (Set of 4)' on 24th June 2013
SELECT hits.item.productName AS other_purchased_products, COUNT(hits.item.productName) AS quantity
FROM [GoogleStore.ga_sessions_20130624]
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM [GoogleStore.ga_sessions_20130624]
  WHERE hits.item.productName CONTAINS 'Brighton Metallic Pens - Set of 4'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
)
 AND hits.item.productName IS NOT NULL
 AND hits.item.productName != 'Brighton Metallic Pens - Set of 4'
GROUP BY other_purchased_products
ORDER BY quantity DESC;

In Dremel/BigQuery, using WHERE expr IN triggers a JOIN , and size restrictions apply; specifically, the size of the right side of the JOIN (in this case the number of visitors) needs to be less than 8 MB. In Dremel, this is called a broadcast JOIN . When the size exceeds 8 MB, you need to trigger a shuffled JOIN , which can be done using the JOIN EACH syntax. Unfortunately, it can't be done using IN , but the same query can be rewritten with a JOIN .
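As an illustration of that rewrite, the pens example above could be expressed with an explicit JOIN EACH on fullVisitorId instead of IN. This is a sketch only: it uses FLATTEN to make the repeated hits field joinable in Legacy SQL, and the table and product names are simply carried over from the example above.

rewriting the IN subquery as a shuffled JOIN EACH (sketch)
SELECT a.hits.item.productName AS other_purchased_products, COUNT(a.hits.item.productName) AS quantity
FROM FLATTEN([GoogleStore.ga_sessions_20130624], hits) AS a
JOIN EACH (
  SELECT fullVisitorId
  FROM [GoogleStore.ga_sessions_20130624]
  WHERE hits.item.productName CONTAINS 'Brighton Metallic Pens - Set of 4'
   AND totals.transactions>=1
  GROUP BY fullVisitorId
) AS b
ON a.fullVisitorId = b.fullVisitorId
WHERE a.hits.item.productName IS NOT NULL
 AND a.hits.item.productName != 'Brighton Metallic Pens - Set of 4'
GROUP BY other_purchased_products
ORDER BY quantity DESC;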

Products purchased by customers who purchased product A (Enhanced Ecommerce)

This is similar to the previous skeleton query, but it works for Enhanced Ecommerce. It also makes use of TABLE_DATE_RANGE to query data over multiple days.

products purchased by a customer who purchases product A (Enhanced Ecommerce)
SELECT hits.product.productSKU AS other_purchased_products, COUNT(hits.product.productSKU) AS quantity
FROM (
  SELECT fullVisitorId, hits.product.productSKU, hits.eCommerceAction.action_type
  FROM TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2015-07-01'), TIMESTAMP('2015-07-14'))
)
WHERE fullVisitorId IN (
  SELECT fullVisitorId
  FROM TABLE_DATE_RANGE([73156703.ga_sessions_], TIMESTAMP('2015-07-01'), TIMESTAMP('2015-07-14'))
  WHERE hits.product.productSKU CONTAINS '10 15103'
   AND hits.eCommerceAction.action_type = '6'
  GROUP BY fullVisitorId
)
 AND hits.product.productSKU IS NOT NULL
 AND hits.product.productSKU != '10 15103'
 AND hits.eCommerceAction.action_type = '6'
GROUP BY other_purchased_products
ORDER BY quantity DESC;

Average number of user interactions before purchase

This is an example of a query that uses a JOIN ... ON clause and depends only on Analytics data.

Below is a skeleton script for the question: What is the average number of user interactions before a purchase?

number of user interactions before a purchase
SELECT Alias_Name_1.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number
FROM (
SELECT hits.item.productSku, SUM(hits.hitNumber) AS sum_of_hit_number
FROM ['GA Dataset Name']
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions>=1
GROUP BY hits.item.productSku
) AS Alias_Name_1
JOIN (
SELECT hits.item.productSku, COUNT( fullVisitorId ) AS total_hits
FROM ['GA Dataset Name']
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions>=1
GROUP BY hits.item.productSku
) AS Alias_Name_2
ON Alias_Name_1.hits.item.productSku = Alias_Name_2.hits.item.productSku;
  1. The first line carries out the main mathematical operation to find the average user interactions per product, and the query joins two subqueries called Alias_Name_1 and Alias_Name_2.
  2. Alias_Name_1 produces a field that uses the aggregate function SUM() to sum all of the hit numbers recorded for a product.
  3. Alias_Name_2 finds the number of hits made by users per product, using the COUNT() function.
  4. The last line uses the ON statement to show the common field ( hits.item.productSku ) on which the two subqueries are joined.

Here is an example of the query: On 28th July 2013, what was the average number of user interactions before a purchase?

number of user interactions on 28th July 2013 before a purchase
SELECT one.hits.item.productSku AS ProductSku, ( sum_of_hit_number / total_hits ) AS avg_hit_number
FROM (
SELECT hits.item.productSku, SUM(hits.hitNumber) AS sum_of_hit_number
FROM [GoogleStore.ga_sessions_20130728]
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions>=1
GROUP BY hits.item.productSku
) AS one
JOIN (
SELECT hits.item.productSku, COUNT( fullVisitorId ) AS total_hits
FROM [GoogleStore.ga_sessions_20130728]
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions>=1
GROUP BY hits.item.productSku
) AS two
ON one.hits.item.productSku = two.hits.item.productSku;

Percentage of stock sold per product

This is an example of a query which is not only dependent on Analytics data, but also on non-Analytics data. By combining both data sets, you can begin to understand user behavior at a more segmented level. You can import non-Analytics data into BigQuery, but keep in mind that this will contribute to your monthly data storage charge.

Below is a skeleton script for the question: What percentage of stock was sold per product?

percentage of stock sold per product
SELECT Imported_DataSet.'productId_field', ( ( Alias_Name.quantity_sold / ( Imported_DataSet.'stock_left_field' + Alias_Name.quantity_sold ) ) * 100 ) AS percentage_of_stock_sold
FROM ['Imported_DataSet']
JOIN (
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity_sold
FROM ['GA Dataset']
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions>=1
GROUP BY hits.item.productSku
) AS Alias_Name
ON Imported_DataSet.'productId_field' = Alias_Name.hits.item.productSku;
  1. The first line outputs two fields: one field containing all the product IDs, and one that is a mathematical operation that shows the percentage of stock sold for that product ID.
  2. Since this query depends on two data sets, you need to use the JOIN ... ON clause. It joins the rows from the two data sets based on the common field between them. In this case, the two data sets are [ 'Imported_DataSet' ] and Alias_Name.
  3. [ 'Imported_DataSet' ] is the non-Analytics data. It is the data set that contains the metric field for how much stock is left ( Imported_DataSet.'stock_left_field' ) and the product-ID dimension field ( Imported_DataSet.'productId_field' ).
  4. ‘Alias_Name’ is the name assigned to the data returned by the gray subquery. This subquery is using Analytics data to find out the total quantity of items sold per product.
  5. The last line uses the ON statement to show the common field between the two data sets and where the two datasets are joined.

Many of the variables in this query have their dataset name attached to them as prefixes (e.g., Imported_DataSet.’productId_field’, Alias_Name.quantity_sold ). This is to clarify what field you are selecting, and to make it explicit to which dataset it belongs.

Here is an example of the query What percentage of stock was sold per product on the 28th July 2013?

percentage of stock sold per product on 28th July 2013
SELECT AnalyticsImport.product_data_20130728.productId, ( ( ( one.quantity_sold ) / ( AnalyticsImport.product_data_20130728.productstock + one.quantity_sold ) ) * 100 ) AS percentage_of_stock_sold
FROM AnalyticsImport.product_data_20130728
JOIN (
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity_sold
FROM GoogleStore.ga_sessions_20130728
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions>=1
GROUP BY hits.item.productSku
) AS one
ON AnalyticsImport.product_data_20130728.productId = one.hits.item.productSku
ORDER BY percentage_of_stock_sold DESC;

Profitability of each product

Below is a skeleton script for the question: What is the profitability of each product?

profit by product
SELECT Alias_Name.hits.item.productSku, ( Imported_DataSet.'product profit field' * Alias_Name.quantity ) AS profit
FROM (
SELECT Alias_Name.hits.item.productSku, Imported_DataSet.'product profit field', Alias_Name.quantity
FROM ['Imported Data Set']
JOIN (
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
FROM ['GA Dataset Name']
WHERE hits.item.productSku IS NOT NULL
GROUP BY hits.item.productSku
) AS Alias_Name
ON Imported_DataSet.productId = Alias_Name.hits.item.productSku
);
  1. The first line contains the mathematical operation to calculate the total profit made on each product.
  2. The outer subquery draws on the non-Analytics data, which records how much profit is made when a product is sold.
  3. The inner subquery is the Analytics data subquery, which is joined with the non-Analytics data. It calculates the quantity of items sold per product.
  4. The last line uses the ON statement to identify the field that the two data sets share. In this case, it is the product ID.

Here is an example of the query What was the profitability of each product on 28th July 2013?

profit by product on 28th July 2013
SELECT two.hits.item.productSku, ( ( AnalyticsImport.product_data_20130728.productprice - AnalyticsImport.product_data_20130728.productcost ) * two.quantity ) AS profit
FROM (
SELECT two.hits.item.productSku, AnalyticsImport.product_data_20130728.productcost, two.quantity, AnalyticsImport.product_data_20130728.productprice
FROM AnalyticsImport.product_data_20130728
JOIN (
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
FROM GoogleStore.ga_sessions_20130728
WHERE hits.item.productSku IS NOT NULL
GROUP BY hits.item.productSku
) AS two
ON AnalyticsImport.product_data_20130728.productId = two.hits.item.productSku
);

Profit is calculated by finding the difference between the price the product is sold at and the cost of manufacturing it. This information is stored in the non-Analytics dataset.

Real profitability of each product (taking refunds into account)

Below is a skeleton script for the question: What is the real profitability of each product?

real profit by product
SELECT Alias_Name.hits.item.productSku, (gross_profit - total_refund_revenue) AS real_profit
FROM (
SELECT Alias_Name.hits.item.productSku, ( ( Imported_DataSet.productprice - Imported_DataSet.productcost ) * Alias_Name.quantity ) AS gross_profit, ( ( Imported_DataSet.refunddeliveryprice + Imported_DataSet.productprice ) * Imported_DataSet.refundquantity ) AS total_refund_revenue
FROM (

SELECT Alias_Name.hits.item.productSku, Imported_DataSet.productcost, Alias_Name.quantity, Imported_DataSet.productprice, Imported_DataSet.refunddeliveryprice, Imported_DataSet.refundquantity
FROM ['Imported DataSet Name'] AS Imported_DataSet
JOIN (
SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
FROM ['GA Dataset Name']
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions >=1
GROUP BY hits.item.productSku
) AS Alias_Name
ON Imported_DataSet.productId = Alias_Name.hits.item.productSku )
);
  1. This is a very similar query to What is the profitability of each product? The only differences are the extra refund fields pulled from the non-Analytics dataset and the mathematical operation calculating the real profit in the first line.
  2. Using the non-Analytics dataset, you also calculate the total amount of money spent on refunds (in the SELECT statement of the middle subquery).
  3. Then, in the first line, you find the real profit by subtracting the revenue spent on refunds from your gross profit.

For more information on the query, please see the section on the profitability of each product .

Here is an example of the query: What was the real profitability of each product on 28th July 2013?

real profit by product on 28th July 2013
SELECT two.hits.item.productSku, (gross_profit - total_refund_revenue) AS real_profit
FROM (
SELECT two.hits.item.productSku, ( ( AnalyticsImport.product_data_20130728.productprice - AnalyticsImport.product_data_20130728.productcost ) * two.quantity ) AS gross_profit, ( ( AnalyticsImport.product_data_20130728.refunddeliveryprice + AnalyticsImport.product_data_20130728.productprice ) * AnalyticsImport.product_data_20130728.refundquantity ) AS total_refund_revenue
FROM (

SELECT two.hits.item.productSku, AnalyticsImport.product_data_20130728.productcost, two.quantity, AnalyticsImport.product_data_20130728.productprice, AnalyticsImport.product_data_20130728.refunddeliveryprice, AnalyticsImport.product_data_20130728.refundquantity
FROM AnalyticsImport.product_data_20130728
JOIN (

SELECT hits.item.productSku, SUM( hits.item.itemQuantity ) AS quantity
FROM GoogleStore.ga_sessions_20130728
WHERE hits.item.productSku IS NOT NULL
AND totals.transactions >=1
GROUP BY hits.item.productSku
) AS two
ON AnalyticsImport.product_data_20130728.productId = two.hits.item.productSku )
);

The real profit is the profitability of a product once refunded products are taken into account. The total refund revenue for a product is calculated as:

total refund revenue for a product = ( price of the product + refund delivery price for the product ) * quantity of products refunded
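For example, using hypothetical numbers, a product priced at 20 with a refund delivery price of 5 and 3 refunded units gives a total refund revenue of ( 20 + 5 ) * 3 = 75, which is then subtracted from the gross profit to give the real profit.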
