Linear regression codelab

1. Intro

This codelab will teach you how to use linear regression to create a model that predicts cost-per-click.

Prerequisites

To complete this codelab, you'll need enough high quality campaign data to create a model.

2. Create a temporary table

Run the following query

  CREATE 
  
 TABLE 
  
 linear_regression_example_data 
 AS 
 ( 
  
 WITH 
  
 all_data 
  
 AS 
  
 ( 
  
 SELECT 
  
 imp 
 . 
 user_id 
  
 as 
  
 user_id 
 , 
  
 ROW_NUMBER 
 () 
  
 OVER 
 ( 
 PARTITION 
  
 BY 
  
 imp 
 . 
 user_id 
 ) 
  
 AS 
  
 rowIdx 
 , 
  
 imp 
 . 
 browser 
  
 AS 
  
 browser_name 
 , 
  
 gender_name 
  
 AS 
  
 gender_name 
 , 
  
 age_group_name 
  
 AS 
  
 age_group_name 
 , 
  
 DATETIME 
 ( 
 TIMESTAMP_MICROS 
 ( 
  
 imp 
 . 
 query_id 
 . 
 time_usec 
 ), 
  
 "America/Los_Angeles" 
 ) 
  
 as 
  
 impression_time 
 , 
  
 clk 
 . 
 advertiser_click_cost_usd 
  
 AS 
  
 label 
  
 FROM 
  
 adh 
 . 
 google_ads_impressions 
  
 imp 
  
 INNER 
  
 JOIN 
  
 adh 
 . 
 google_ads_clicks 
  
 clk 
  
 USING 
  
 ( 
 impression_id 
 ) 
  
 LEFT 
  
 JOIN 
  
 adh 
 . 
 gender 
  
 ON 
  
 demographics 
 . 
 gender 
  
 = 
  
 gender_id 
  
 LEFT 
  
 JOIN 
  
 adh 
 . 
 age_group 
  
 ON 
  
 demographics 
 . 
 age_group 
  
 = 
  
 age_group_id 
  
 ) 
  
 # Need just one user ID or regression won't work 
  
 SELECT 
  
 label 
 , 
  
 browser_name 
 , 
  
 gender_name 
 , 
  
 age_group_name 
 , 
  
 # Although BQML could divide impression_time into several useful variables on 
  
 # its own, it may attempt to divide it into too many features. As a best 
  
 # practice extract the variables that you think will be most helpful. 
  
 # The output of impression_time is a number, but we care about it as a 
  
 # category, so we cast it to a string. 
  
 CAST 
 ( 
 EXTRACT 
 ( 
 DAYOFWEEK 
  
 FROM 
  
 impression_time 
 ) 
  
 AS 
  
 STRING 
 ) 
  
 AS 
  
 day_of_week 
 , 
  
 CAST 
 ( 
 EXTRACT 
 ( 
 HOUR 
  
 FROM 
  
 impression_time 
 ) 
  
 AS 
  
 STRING 
 ) 
  
 AS 
  
 hour 
 , 
  
 FROM 
  
 all_data 
  
 WHERE 
  
 rowIdx 
  
 = 
  
 1 
  
 # This ensures that there's only 1 row per user. 
  
 AND 
  
 label 
  
 IS 
  
 NOT 
  
 NULL 
  
 AND 
  
 gender_name 
  
 IS 
  
 NOT 
  
 NULL 
  
 AND 
  
 age_group_name 
  
 IS 
  
 NOT 
  
 NULL 
 ); 
 

3. Create and train a model

It's a best practice to separate your table creation steps from your model creation steps.

Run the following query on the temporary table you created in the previous step. Don't worry about providing start and end dates, as these will be inferred based on data in the temporary table.

  CREATE 
  
 OR 
  
 REPLACE 
 MODEL 
  
 `example_linear` 
 OPTIONS 
 ( 
  
 model_type 
  
 = 
  
 'adh_linear_regression' 
 ) 
 AS 
  
 ( 
  
 SELECT 
  
 * 
  
 FROM 
  
 tmp 
 . 
 linear_regression_example_data 
 ); 
 SELECT 
  
 * 
  
 FROM 
  
 ML 
 . 
 EVALUATE 
 ( 
 MODEL 
  
 `example_linear` 
 ) 
 

Row

mean_absolute_error

mean_squared_error

mean_squared_log_error

median_absolute_error

r2_score

explained_variance

1

0.11102380666874107

0.019938972461569476

0.019503393448234131

0.091792024503562136

-9.8205955364568478

-9.7975398794423025

Create a Mobile Website
View Site in Mobile | Classic
Share by: