The advanced queries in this page apply to the BigQuery event export data for Google Analytics. See BigQuery cookbook for Universal Analytics if you are looking for the same resource for Universal Analytics. Try the basic queries first before trying out the advanced ones.
Products purchased by customers who purchased a certain product
The following query shows what other products were purchased by customers who purchased a specific product. This example does not assume that the products were purchased in the same order.
The optimized example relies on BigQuery scripting features to define a variable
that declares which items to filter on. While this does not improve performance,
this is a more readable approach for defining variables compared creating a
single value table using a WITH
clause. The simplified query uses the latter
approach using the WITH
clause.
The simplified query creats a separate list of "Product A buyers" and does a
join with that data. The optimized query, instead, creates a list of all items a
user has purchased across orders using the ARRAY_AGG
function. Then using the
outer WHERE
clause, purchase lists across all users are filtered for the target_item
and only relevant items are shown.
Simplified
-- Example: Products purchased by customers who purchased a specific product.
--
-- `Params` is used to hold the value of the selected product and is referenced
-- throughout the query.
WITH
Params
AS
(
-- Replace with selected item_name or item_id.
SELECT
'Google Navy Speckled Tee'
AS
selected_product
),
PurchaseEvents
AS
(
SELECT
user_pseudo_id
,
items
FROM
-- Replace table name.
`
bigquery
-
public
-
data
.
ga4_obfuscated_sample_ecommerce
.
events_
*`
WHERE
-- Replace date range.
_TABLE_SUFFIX
BETWEEN
'20201101'
AND
'20210131'
AND
event_name
=
'purchase'
),
ProductABuyers
AS
(
SELECT
DISTINCT
user_pseudo_id
FROM
Params
,
PurchaseEvents
,
UNNEST
(
items
)
AS
items
WHERE
-- item.item_id can be used instead of items.item_name.
items
.
item_name
=
selected_product
)
SELECT
items
.
item_name
AS
item_name
,
SUM
(
items
.
quantity
)
AS
item_quantity
FROM
Params
,
PurchaseEvents
,
UNNEST
(
items
)
AS
items
WHERE
user_pseudo_id
IN
(
SELECT
user_pseudo_id
FROM
ProductABuyers
)
-- item.item_id can be used instead of items.item_name
AND
items
.
item_name
!=
selected_product
GROUP
BY
1
ORDER
BY
item_quantity
DESC
;
Optimized
-- Optimized Example: Products purchased by customers who purchased a specific product.
-- Replace item name
DECLARE
target_item
STRING
DEFAULT
'Google Navy Speckled Tee'
;
SELECT
IL
.
item_name
AS
item_name
,
SUM
(
IL
.
quantity
)
AS
quantity
FROM
(
SELECT
user_pseudo_id
,
ARRAY_AGG
(
STRUCT
(
item_name
,
quantity
))
AS
item_list
FROM
-- Replace table
`
bigquery
-
public
-
data
.
ga4_obfuscated_sample_ecommerce
.
events_
*`
,
UNNEST
(
items
)
WHERE
-- Replace date range
_TABLE_SUFFIX
BETWEEN
'20201201'
AND
'20201210'
AND
event_name
=
'purchase'
GROUP
BY
1
),
UNNEST
(
item_list
)
AS
IL
WHERE
target_item
IN
(
SELECT
item_name
FROM
UNNEST
(
item_list
))
-- Remove the following line if you want the target_item to appear in the results
AND
target_item
!=
IL
.
item_name
GROUP
BY
item_name
ORDER
BY
quantity
DESC
;
Average amount of money spent per purchase session by user
The following query shows the average amount of money spent per session by each user. This takes into account only the sessions where the user made a purchase.
-- Example: Average amount of money spent per purchase session by user.
WITH
events
AS
(
SELECT
session
.
value
.
int_value
AS
session_id
,
COALESCE
(
spend
.
value
.
int_value
,
spend
.
value
.
float_value
,
spend
.
value
.
double_value
,
0
.
0
)
AS
spend_value
,
event
.
*
-- Replace table name
FROM
`
bigquery
-
public
-
data
.
ga4_obfuscated_sample_ecommerce
.
events_
*`
AS
event
LEFT
JOIN
UNNEST
(
event
.
event_params
)
AS
session
ON
session
.
key
=
'ga_session_id'
LEFT
JOIN
UNNEST
(
event
.
event_params
)
AS
spend
ON
spend
.
key
=
'value'
-- Replace date range
WHERE
_TABLE_SUFFIX
BETWEEN
'20201101'
AND
'20210131'
)
SELECT
user_pseudo_id
,
COUNT
(
DISTINCT
session_id
)
AS
session_count
,
SUM
(
spend_value
)
/
COUNT
(
DISTINCT
session_id
)
AS
avg_spend_per_session_by_user
FROM
events
WHERE
event_name
=
'purchase'
and
session_id
IS
NOT
NULL
GROUP
BY
user_pseudo_id
Latest Session Id and Session Number for users
The following query provides the list of the latest ga_session_id and
ga_session_number from last 4 days for a list of users. You can provide either a user_pseudo_id
list or a user_id
list.
user_pseudo_id
-- Get the latest ga_session_id and ga_session_number for specific users during last 4 days.
-- Replace timezone. List at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones.
DECLARE
REPORTING_TIMEZONE
STRING
DEFAULT
'America/Los_Angeles'
;
-- Replace list of user_pseudo_id's with ones you want to query.
DECLARE
USER_PSEUDO_ID_LIST
ARRAY<STRING>
DEFAULT
[
'1005355938.1632145814'
,
'979622592.1632496588'
,
'1101478530.1632831095'
];
CREATE
TEMP
FUNCTION
GetParamValue
(
params
ANY
TYPE
,
target_key
STRING
)
AS
(
(
SELECT
`
value
`
FROM
UNNEST
(
params
)
WHERE
key
=
target_key
LIMIT
1
)
);
CREATE
TEMP
FUNCTION
GetDateSuffix
(
date_shift
INT64
,
timezone
STRING
)
AS
(
(
SELECT
FORMAT_DATE
(
'%Y%m%d'
,
DATE_ADD
(
CURRENT_DATE
(
timezone
),
INTERVAL
date_shift
DAY
)))
);
SELECT
DISTINCT
user_pseudo_id
,
FIRST_VALUE
(
GetParamValue
(
event_params
,
'ga_session_id'
).
int_value
)
OVER
(
UserWindow
)
AS
ga_session_id
,
FIRST_VALUE
(
GetParamValue
(
event_params
,
'ga_session_number'
).
int_value
)
OVER
(
UserWindow
)
AS
ga_session_number
FROM
-- Replace table name.
`
bigquery
-
public
-
data
.
ga4_obfuscated_sample_ecommerce
.
events_
*`
WHERE
user_pseudo_id
IN
UNNEST
(
USER_PSEUDO_ID_LIST
)
AND
RIGHT
(
_TABLE_SUFFIX
,
8
)
BETWEEN
GetDateSuffix
(
-
3
,
REPORTING_TIMEZONE
)
AND
GetDateSuffix
(
0
,
REPORTING_TIMEZONE
)
WINDOW
UserWindow
AS
(
PARTITION
BY
user_pseudo_id
ORDER
BY
event_timestamp
DESC
);
user_id
-- Get the latest ga_session_id and ga_session_number for specific users during last 4 days.
-- Replace timezone. List at https://en.wikipedia.org/wiki/List_of_tz_database_time_zones.
DECLARE
REPORTING_TIMEZONE
STRING
DEFAULT
'America/Los_Angeles'
;
-- Replace list of user_id's with ones you want to query.
DECLARE
USER_ID_LIST
ARRAY<STRING>
DEFAULT
[
'<user_id_1>'
,
'<user_id_2>'
,
'<user_id_n>'
];
CREATE
TEMP
FUNCTION
GetParamValue
(
params
ANY
TYPE
,
target_key
STRING
)
AS
(
(
SELECT
`
value
`
FROM
UNNEST
(
params
)
WHERE
key
=
target_key
LIMIT
1
)
);
CREATE
TEMP
FUNCTION
GetDateSuffix
(
date_shift
INT64
,
timezone
STRING
)
AS
(
(
SELECT
FORMAT_DATE
(
'%Y%m%d'
,
DATE_ADD
(
CURRENT_DATE
(
timezone
),
INTERVAL
date_shift
DAY
)))
);
SELECT
DISTINCT
user_pseudo_id
,
FIRST_VALUE
(
GetParamValue
(
event_params
,
'ga_session_id'
).
int_value
)
OVER
(
UserWindow
)
AS
ga_session_id
,
FIRST_VALUE
(
GetParamValue
(
event_params
,
'ga_session_number'
).
int_value
)
OVER
(
UserWindow
)
AS
ga_session_number
FROM
-- Replace table name.
`
bigquery
-
public
-
data
.
ga4_obfuscated_sample_ecommerce
.
events_
*`
WHERE
user_id
IN
UNNEST
(
USER_ID_LIST
)
AND
RIGHT
(
_TABLE_SUFFIX
,
8
)
BETWEEN
GetDateSuffix
(
-
3
,
REPORTING_TIMEZONE
)
AND
GetDateSuffix
(
0
,
REPORTING_TIMEZONE
)
WINDOW
UserWindow
AS
(
PARTITION
BY
user_pseudo_id
ORDER
BY
event_timestamp
DESC
);