1. Intro
This codelab will teach you how to use linear regression to create a model that predicts cost-per-click.
Prerequisites
To complete this codelab, you'll need enough high quality campaign data to create a model.
2. Create a temporary table
Run the following query
CREATE
TABLE
linear_regression_example_data
AS
(
WITH
all_data
AS
(
SELECT
imp
.
user_id
as
user_id
,
ROW_NUMBER
()
OVER
(
PARTITION
BY
imp
.
user_id
)
AS
rowIdx
,
imp
.
browser
AS
browser_name
,
gender_name
AS
gender_name
,
age_group_name
AS
age_group_name
,
DATETIME
(
TIMESTAMP_MICROS
(
imp
.
query_id
.
time_usec
),
"America/Los_Angeles"
)
as
impression_time
,
clk
.
advertiser_click_cost_usd
AS
label
FROM
adh
.
google_ads_impressions
imp
INNER
JOIN
adh
.
google_ads_clicks
clk
USING
(
impression_id
)
LEFT
JOIN
adh
.
gender
ON
demographics
.
gender
=
gender_id
LEFT
JOIN
adh
.
age_group
ON
demographics
.
age_group
=
age_group_id
)
# Need just one user ID or regression won't work
SELECT
label
,
browser_name
,
gender_name
,
age_group_name
,
# Although BQML could divide impression_time into several useful variables on
# its own, it may attempt to divide it into too many features. As a best
# practice extract the variables that you think will be most helpful.
# The output of impression_time is a number, but we care about it as a
# category, so we cast it to a string.
CAST
(
EXTRACT
(
DAYOFWEEK
FROM
impression_time
)
AS
STRING
)
AS
day_of_week
,
CAST
(
EXTRACT
(
HOUR
FROM
impression_time
)
AS
STRING
)
AS
hour
,
FROM
all_data
WHERE
rowIdx
=
1
# This ensures that there's only 1 row per user.
AND
label
IS
NOT
NULL
AND
gender_name
IS
NOT
NULL
AND
age_group_name
IS
NOT
NULL
);
3. Create and train a model
It's a best practice to separate your table creation steps from your model creation steps.
Run the following query on the temporary table you created in the previous step. Don't worry about providing start and end dates, as these will be inferred based on data in the temporary table.
CREATE
OR
REPLACE
MODEL
`example_linear`
OPTIONS
(
model_type
=
'adh_linear_regression'
)
AS
(
SELECT
*
FROM
tmp
.
linear_regression_example_data
);
SELECT
*
FROM
ML
.
EVALUATE
(
MODEL
`example_linear`
)
Row |
mean_absolute_error |
mean_squared_error |
mean_squared_log_error |
median_absolute_error |
r2_score |
explained_variance |
---|---|---|---|---|---|---|
1 |
0.11102380666874107 |
0.019938972461569476 |
0.019503393448234131 |
0.091792024503562136 |
-9.8205955364568478 |
-9.7975398794423025 |