Load geo-level data with reach and frequency

Simulated data is provided as an example for each data type and format in the following sections.

CSV

To load the simulated CSV data using CsvDataLoader:

  1. Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels without reach and frequency data, assign their media exposure and media spend columns to the media and media_spend types, respectively. For media channels with reach and frequency data, assign their reach, frequency, and media spend columns to the reach, frequency, and rf_spend types, respectively. For the definition of each variable, see Collect and organize your data.

      from meridian.data import load

      coord_to_columns = load.CoordToColumns(
          time='time',
          geo='geo',
          controls=['GQV', 'Discount', 'Competitor_Sales'],
          population='population',
          kpi='conversions',
          revenue_per_kpi='revenue_per_conversion',
          # Channels without reach and frequency data.
          media=[
              'Channel0_impression',
              'Channel1_impression',
              'Channel2_impression',
              'Channel3_impression',
          ],
          media_spend=[
              'Channel0_spend',
              'Channel1_spend',
              'Channel2_spend',
              'Channel3_spend',
          ],
          # Channels with reach and frequency data.
          reach=['Channel4_reach', 'Channel5_reach'],
          frequency=['Channel4_frequency', 'Channel5_frequency'],
          rf_spend=['Channel4_spend', 'Channel5_spend'],
      )
     
    
  2. Map the media exposure, reach, frequency, and media spend columns to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend map to the same channel, Channel0, and Channel4_reach, Channel4_frequency, and Channel4_spend map to the same channel, Channel4.

      correct_media_to_channel = {
          'Channel0_impression': 'Channel0',
          'Channel1_impression': 'Channel1',
          'Channel2_impression': 'Channel2',
          'Channel3_impression': 'Channel3',
      }
      correct_media_spend_to_channel = {
          'Channel0_spend': 'Channel0',
          'Channel1_spend': 'Channel1',
          'Channel2_spend': 'Channel2',
          'Channel3_spend': 'Channel3',
      }
      correct_reach_to_channel = {
          'Channel4_reach': 'Channel4',
          'Channel5_reach': 'Channel5',
      }
      correct_frequency_to_channel = {
          'Channel4_frequency': 'Channel4',
          'Channel5_frequency': 'Channel5',
      }
      correct_rf_spend_to_channel = {
          'Channel4_spend': 'Channel4',
          'Channel5_spend': 'Channel5',
      }
     
    
  3. Load the data using CsvDataLoader:

      loader = load.CsvDataLoader(
          csv_path=f'/{PATH}/{FILENAME}.csv',
          kpi_type='non_revenue',
          coord_to_columns=coord_to_columns,
          media_to_channel=correct_media_to_channel,
          media_spend_to_channel=correct_media_spend_to_channel,
          reach_to_channel=correct_reach_to_channel,
          frequency_to_channel=correct_frequency_to_channel,
          rf_spend_to_channel=correct_rf_spend_to_channel,
      )
      data = loader.load()
     
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
    • PATH is the path to the data file location.
    • FILENAME is the name of your data file.
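The channel dictionaries in step 2 follow a naming pattern: each column name is the channel name followed by a suffix such as _impression, _spend, _reach, or _frequency. If your own columns follow a similar convention, a small helper can build these dictionaries and check that every media channel has a spend column and every RF channel has reach, frequency, and spend columns. The helpers map_columns_to_channels and check_same_channels below are hypothetical and not part of Meridian, and the suffixes are assumptions taken from the simulated column names; this is a sketch, not a required step.

```python
# Hypothetical helpers for illustration; they are not part of Meridian.


def map_columns_to_channels(columns, suffix):
    """Map each column named '<channel><suffix>' to '<channel>'."""
    if not suffix:
        raise ValueError('suffix must be non-empty')
    mapping = {}
    for column in columns:
        if not column.endswith(suffix):
            raise ValueError(f'{column!r} does not end with {suffix!r}')
        mapping[column] = column[: -len(suffix)]
    return mapping


def check_same_channels(first, second):
    """Raise ValueError if two column-to-channel mappings name different channels."""
    first_channels = set(first.values())
    second_channels = set(second.values())
    if first_channels != second_channels:
        mismatched = sorted(first_channels ^ second_channels)
        raise ValueError(f'Channels without a matching column: {mismatched}')


# Column names from the simulated CSV data used in this section.
media_cols = [f'Channel{i}_impression' for i in range(4)]
media_spend_cols = [f'Channel{i}_spend' for i in range(4)]
reach_cols = ['Channel4_reach', 'Channel5_reach']
frequency_cols = ['Channel4_frequency', 'Channel5_frequency']
rf_spend_cols = ['Channel4_spend', 'Channel5_spend']

media_to_channel = map_columns_to_channels(media_cols, '_impression')
media_spend_to_channel = map_columns_to_channels(media_spend_cols, '_spend')
reach_to_channel = map_columns_to_channels(reach_cols, '_reach')
frequency_to_channel = map_columns_to_channels(frequency_cols, '_frequency')
rf_spend_to_channel = map_columns_to_channels(rf_spend_cols, '_spend')

# Each media channel needs a spend column; each RF channel needs reach,
# frequency, and spend columns.
check_same_channels(media_to_channel, media_spend_to_channel)
check_same_channels(reach_to_channel, frequency_to_channel)
check_same_channels(reach_to_channel, rf_spend_to_channel)
```

For the simulated data, these dictionaries are equal to the hand-written correct_*_to_channel dictionaries in step 2, so either form can be passed to CsvDataLoader.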

Xarray Dataset

To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:

  1. Load the data using pickle:

      import pickle

      # Pickle files are binary, so open the file in 'rb' mode.
      with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
          dataset = pickle.load(fh)
     
    

    Where:

    • PATH is the path to the data file location.
    • FILENAME is the name of your data file.
  2. Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to rename any coordinates and data variables whose names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.

      from meridian.data import load

      loader = load.XrDatasetDataLoader(
          dataset,
          kpi_type='non_revenue',
          name_mapping={
              'channel': 'media_channel',
              'control': 'control_variable',
              'conversions': 'kpi',
              'revenue_per_conversion': 'revenue_per_kpi',
              'control_value': 'controls',
              'spend': 'media_spend',
              'reach': 'reach',
              'frequency': 'frequency',
              'rf_spend': 'rf_spend',
          },
      )
      data = loader.load()
     
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
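Because name_mapping only needs entries for names that differ from the required ones, it can help to check, before building the loader, that the renamed dataset will contain every required coordinate and data variable. The sketch below does this with plain Python sets. The helper missing_required_names and the example dataset names are hypothetical; the names were chosen to match the mapping above, and the required names are the ones listed in step 2.

```python
# Required names as listed in step 2 of this section.
REQUIRED_COORDS = {'geo', 'time', 'control_variable', 'media_channel', 'rf_channel'}
REQUIRED_DATA_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population',
    'media', 'media_spend', 'reach', 'frequency', 'rf_spend',
}


def missing_required_names(coord_names, data_var_names, name_mapping):
    """Return (missing_coords, missing_data_vars) after applying name_mapping.

    Hypothetical helper: names absent from name_mapping are kept unchanged,
    since a mapping is only needed when a name differs from the required one.
    """
    renamed_coords = {name_mapping.get(n, n) for n in coord_names}
    renamed_vars = {name_mapping.get(n, n) for n in data_var_names}
    return REQUIRED_COORDS - renamed_coords, REQUIRED_DATA_VARS - renamed_vars


# Hypothetical names in a pickled dataset, chosen to match the mapping above.
dataset_coords = ['geo', 'time', 'control', 'channel', 'rf_channel']
dataset_vars = [
    'conversions', 'revenue_per_conversion', 'control_value', 'population',
    'media', 'spend', 'reach', 'frequency', 'rf_spend',
]
name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
}

missing_coords, missing_vars = missing_required_names(
    dataset_coords, dataset_vars, name_mapping
)
print('Missing coordinates:', sorted(missing_coords))
print('Missing data variables:', sorted(missing_vars))
```

With a real xarray Dataset, list(dataset.coords) and list(dataset.data_vars) should supply most of these names, although a dimension without a coordinate variable appears only in dataset.dims.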

NumPy ndarray

To load NumPy ndarrays directly, use NDArrayInputDataBuilder:

  1. Create the data as separate NumPy ndarrays.

      import numpy as np

      # 3 geos x 3 time periods.
      kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
      # 3 geos x 3 time periods x 2 control variables.
      controls_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      # One population value per geo.
      population_nd = np.array([1, 2, 3])
      # 3 geos x 3 time periods.
      revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
      # reach_nd, frequency_nd, and rf_spend_nd: 3 geos x 3 time periods x 2 RF channels.
      reach_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      frequency_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      rf_spend_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
     
    
  2. Use an NDArrayInputDataBuilder to set the time and geo coordinates and to provide the channel and dimension names that Meridian input data requires. For the definition of each variable, see Collect and organize your data.

      from meridian.data import nd_array_input_data_builder as data_builder

      builder = data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
      builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
      builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
      builder.geos = ['B', 'A', 'C']
      builder = (
          builder
          .with_kpi(kpi_nd)
          .with_revenue_per_kpi(revenue_per_kpi_nd)
          .with_population(population_nd)
          .with_controls(controls_nd, control_names=['control0', 'control1'])
          .with_reach(
              r_nd=reach_nd,
              f_nd=frequency_nd,
              rfs_nd=rf_spend_nd,
              rf_channels=['channel0', 'channel1'],
          )
      )
      data = builder.build()
     
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
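Each ndarray has to line up with the coordinates set on the builder: in the example, 3 geos, 3 time periods, 2 control names, and 2 RF channels. The sketch below compares array shapes against the sizes implied by those coordinates. It assumes the dimension order used in the example (geo first, then time, then control or channel), and it assumes that reach and frequency are indexed by the media time coordinates while spend is indexed by the time coordinates. The helpers expected_shapes and check_shapes are hypothetical and not part of Meridian.

```python
def expected_shapes(n_geos, n_times, n_media_times, n_controls, n_rf_channels):
    """Expected array shapes, under the dimension-order assumptions above."""
    return {
        'kpi': (n_geos, n_times),
        'revenue_per_kpi': (n_geos, n_times),
        'population': (n_geos,),
        'controls': (n_geos, n_times, n_controls),
        'reach': (n_geos, n_media_times, n_rf_channels),
        'frequency': (n_geos, n_media_times, n_rf_channels),
        'rf_spend': (n_geos, n_times, n_rf_channels),
    }


def check_shapes(actual, expected):
    """Return {name: (actual_shape, expected_shape)} for each mismatch or missing array."""
    problems = {}
    for name, want in expected.items():
        got = actual.get(name)
        if got != want:
            problems[name] = (got, want)
    return problems


geos = ['B', 'A', 'C']
time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
control_names = ['control0', 'control1']
rf_channels = ['channel0', 'channel1']

# Shapes of the example arrays from step 1 (what kpi_nd.shape etc. return).
actual = {
    'kpi': (3, 3),
    'revenue_per_kpi': (3, 3),
    'population': (3,),
    'controls': (3, 3, 2),
    'reach': (3, 3, 2),
    'frequency': (3, 3, 2),
    'rf_spend': (3, 3, 2),
}
expected = expected_shapes(
    len(geos), len(time_coords), len(media_time_coords),
    len(control_names), len(rf_channels),
)
print(check_shapes(actual, expected))
```

With the real arrays, you would fill actual from kpi_nd.shape, controls_nd.shape, and so on; an empty result means every shape matches the coordinates.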

Pandas DataFrame or other data formats

To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:

  1. Read the data (such as an Excel spreadsheet) into one or more pandas DataFrames.

      import pandas as pd

      df = pd.read_excel(
          'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
          engine='openpyxl',
      )
     
    
  2. Use a DataFrameInputDataBuilder to map column names to the variable types required in Meridian input data. For the definition of each variable, see Collect and organize your data.

      from meridian.data import data_frame_input_data_builder as data_builder

      builder = data_builder.DataFrameInputDataBuilder(
          kpi_type='non_revenue',
          default_kpi_column='conversions',
          default_revenue_per_kpi_column='revenue_per_conversion',
      )
      builder = (
          builder
          .with_kpi(df)
          .with_revenue_per_kpi(df)
          .with_population(df)
          .with_controls(df, control_cols=['GQV', 'Discount', 'Competitor_Sales'])
          .with_reach(
              df,
              reach_cols=['Channel4_reach', 'Channel5_reach'],
              frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
              rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
              rf_channels=['Channel4', 'Channel5'],
          )
      )
      data = builder.build()
     
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
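The spreadsheet used above is in long format: one row per geo and time period, with one column per variable. Conceptually, each with_* call picks one or more of those columns and arranges their values by geo and time. The sketch below illustrates that reshaping for a single column with the standard library only. It is a simplified illustration, not the builder's implementation; it assumes key columns named geo and time, as in the simulated data, and the rows are made-up values. The helper pivot_long_to_grid is hypothetical.

```python
def pivot_long_to_grid(rows, value_col, geo_col='geo', time_col='time'):
    """Arrange long-format rows into a geo x time grid for one column.

    Returns (geos, times, grid), where grid[i][j] is the value for geos[i]
    at times[j]. Labels are sorted here for determinism. Raises ValueError
    if a (geo, time) pair is duplicated or missing.
    """
    values = {}
    for row in rows:
        key = (row[geo_col], row[time_col])
        if key in values:
            raise ValueError(f'Duplicate row for {key}')
        values[key] = row[value_col]
    geos = sorted({g for g, _ in values})
    times = sorted({t for _, t in values})
    missing = [(g, t) for g in geos for t in times if (g, t) not in values]
    if missing:
        raise ValueError(f'Missing rows for {missing}')
    grid = [[values[(g, t)] for t in times] for g in geos]
    return geos, times, grid


# Made-up long-format rows; a real sheet has many more columns and rows.
rows = [
    {'geo': 'Geo0', 'time': '2021-01-25', 'conversions': 10.0},
    {'geo': 'Geo0', 'time': '2021-02-01', 'conversions': 12.0},
    {'geo': 'Geo1', 'time': '2021-01-25', 'conversions': 7.0},
    {'geo': 'Geo1', 'time': '2021-02-01', 'conversions': 9.0},
]
geos, times, kpi_grid = pivot_long_to_grid(rows, 'conversions')
print(geos, times, kpi_grid)
```

In pandas, df.pivot(index='geo', columns='time', values='conversions') produces a comparable geo x time table for a single column.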

Next, you can create your model.
