Load geo-level data with organic media and non-media treatments

Simulated data is provided as an example for each data type and format in the following sections.

CSV

To load the simulated CSV data using CsvDataLoader:

  1. Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, revenue_per_kpi, media, and media_spend. For media channels that have no direct cost, you must assign their media exposure to organic_media. For non-media treatments, you must assign the corresponding column names to non_media_treatments. For the definition of each variable, see Collect and organize your data.

      from meridian.data import load

      # Map each Meridian variable type to the matching CSV column name(s).
      coord_to_columns = load.CoordToColumns(
          time='time',
          geo='geo',
          controls=['GQV', 'Competitor_Sales'],
          population='population',
          kpi='conversions',
          revenue_per_kpi='revenue_per_conversion',
          media=[
              'Channel0_impression',
              'Channel1_impression',
              'Channel2_impression',
              'Channel3_impression',
              'Channel4_impression',
          ],
          media_spend=[
              'Channel0_spend',
              'Channel1_spend',
              'Channel2_spend',
              'Channel3_spend',
              'Channel4_spend',
          ],
          organic_media=['Organic_channel0_impression'],
          non_media_treatments=['Promo'],
      )
    
  2. Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.

      # Media exposure columns and their display channel names.
      correct_media_to_channel = {
          'Channel0_impression': 'Channel0',
          'Channel1_impression': 'Channel1',
          'Channel2_impression': 'Channel2',
          'Channel3_impression': 'Channel3',
          'Channel4_impression': 'Channel4',
      }
      # Media spend columns, mapped to the same display channel names.
      correct_media_spend_to_channel = {
          'Channel0_spend': 'Channel0',
          'Channel1_spend': 'Channel1',
          'Channel2_spend': 'Channel2',
          'Channel3_spend': 'Channel3',
          'Channel4_spend': 'Channel4',
      }
    
  3. Load the data using CsvDataLoader:

      loader = load.CsvDataLoader(
          csv_path=f'/{PATH}/{FILENAME}.csv',
          kpi_type='non_revenue',
          coord_to_columns=coord_to_columns,
          media_to_channel=correct_media_to_channel,
          media_spend_to_channel=correct_media_spend_to_channel,
      )
      data = loader.load()
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
    • PATH is the path to the data file location.
    • FILENAME is the name of your data file.
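
A loader error about a missing column is easier to diagnose if the CSV header is checked against the mapping first. The following is a minimal sketch that uses only the Python standard library. The helper missing_columns is hypothetical, written for this illustration and not part of Meridian, and the in-memory CSV stands in for a real data file.

```python
import csv
import io


def missing_columns(csv_file, expected):
    """Return the names in `expected` that are absent from the CSV header row."""
    header = next(csv.reader(csv_file))
    present = set(header)
    return [name for name in expected if name not in present]


# Small in-memory CSV used in place of a real data file (illustrative only).
sample = io.StringIO(
    "time,geo,population,conversions,Channel0_impression,Channel0_spend\n"
    "2024-01-01,A,100,5,1000,20.0\n"
)
expected = ['time', 'geo', 'population', 'conversions', 'Promo']
print(missing_columns(sample, expected))  # ['Promo']
```

With a real file, the same helper could be called on an open file handle, passing every column name used in the CoordToColumns mapping.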

Xarray Dataset

To load the simulated Xarray Dataset using XrDatasetDataLoader:

  1. Load the data using pickle:

      import pickle

      # Pickle files are binary, so open the file in 'rb' mode.
      with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
        XrDataset = pickle.load(fh)
    

    Where:

    • PATH is the path to the data file location.
    • FILENAME is the name of your data file.
  2. Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping only for names in the input dataset that differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, organic_media_channel, and non_media_channel. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, organic_media, and non_media_treatments.

      from meridian.data import load

      loader = load.XrDatasetDataLoader(
          XrDataset,
          kpi_type='non_revenue',
          # Keys are names in the input dataset; values are the required names.
          name_mapping={
              'channel': 'media_channel',
              'control': 'control_variable',
              'organic_channel': 'organic_media_channel',
              'non_media_treatment': 'non_media_channel',
              'conversions': 'kpi',
              'revenue_per_conversion': 'revenue_per_kpi',
              'control_value': 'controls',
              'spend': 'media_spend',
              'non_media_treatment_value': 'non_media_treatments',
          },
      )
      data = loader.load()
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
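
To see which names still need an entry in name_mapping, the dataset's names can be compared with the required names listed in step 2. The sketch below works on plain sets of strings and does not call Meridian or xarray. The helper missing_after_mapping and the sample dataset_names are hypothetical and used only for illustration.

```python
# Required names as listed in step 2 of this section.
REQUIRED_COORDS = {
    'geo', 'time', 'control_variable', 'media_channel',
    'organic_media_channel', 'non_media_channel',
}
REQUIRED_DATA_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population',
    'media', 'media_spend', 'organic_media', 'non_media_treatments',
}


def missing_after_mapping(dataset_names, name_mapping):
    """Return required names that are still absent once the mapping is applied."""
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted((REQUIRED_COORDS | REQUIRED_DATA_VARS) - renamed)


# Hypothetical names found in an input dataset (illustrative only).
dataset_names = {
    'geo', 'time', 'channel', 'control',
    'conversions', 'population', 'media', 'spend',
}
name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'conversions': 'kpi',
    'spend': 'media_spend',
}
print(missing_after_mapping(dataset_names, name_mapping))
```

An empty result means every required name is covered, either directly or through the mapping.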

Numpy ndarray

To load NumPy ndarrays directly, use NDArrayInputDataBuilder:

  1. Create a separate NumPy ndarray for each data type.

      import numpy as np

      kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
      controls_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      population_nd = np.array([1, 2, 3])
      revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
      media_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      media_spend_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      organic_media_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
      non_media_treatments_nd = np.array([
          [[1, 5], [2, 6], [3, 4]],
          [[7, 8], [9, 10], [11, 12]],
          [[13, 14], [15, 16], [17, 18]],
      ])
    
  2. Use an NDArrayInputDataBuilder to set the time and geo coordinates and to supply the channel or dimension names that Meridian input data requires. For the definition of each variable, see Collect and organize your data.

      from meridian.data import nd_array_input_data_builder as data_builder

      builder = (
          data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
      )
      builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
      builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
      builder.geos = ['B', 'A', 'C']
      builder = (
          builder
          .with_kpi(kpi_nd)
          .with_revenue_per_kpi(revenue_per_kpi_nd)
          .with_population(population_nd)
          .with_controls(
              controls_nd,
              control_names=["control0", "control1"])
          .with_media(
              m_nd=media_nd,
              ms_nd=media_spend_nd,
              media_channels=["channel0", "channel1"]
          )
          .with_organic_media(
              organic_media_nd,
              organic_media_channels=["organic_channel0", "organic_channel1"]
          )
          .with_non_media_treatments(
              non_media_treatments_nd,
              non_media_channel_names=["non_media_channel0", "non_media_channel1"]
          )
      )
      data = builder.build()
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
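
In the example arrays above, every array starts with a geo dimension of length 3, the time-varying arrays add a time dimension of length 3, and the per-channel arrays add a trailing dimension whose length matches the list of names passed to the builder. A quick consistency check along those lines can catch shape mistakes before the builder is called. The sketch below is inferred from the example shapes only and is not a statement of Meridian's own validation rules. The helper check_shapes is hypothetical, and it assumes the media time coordinates have the same length as the time coordinates, as they do in the example.

```python
import numpy as np


def check_shapes(n_geos, n_times, arrays_2d, arrays_3d, population):
    """Raise ValueError if an array disagrees with the geo/time dimensions.

    arrays_2d: dict of name -> array expected to be (n_geos, n_times).
    arrays_3d: dict of name -> (array, names); the array is expected to be
      (n_geos, n_times, len(names)).
    population: array expected to be (n_geos,).
    """
    if population.shape != (n_geos,):
        raise ValueError(f"population has shape {population.shape}")
    for name, arr in arrays_2d.items():
        if arr.shape != (n_geos, n_times):
            raise ValueError(f"{name} has shape {arr.shape}")
    for name, (arr, names) in arrays_3d.items():
        if arr.shape != (n_geos, n_times, len(names)):
            raise ValueError(f"{name} has shape {arr.shape}")


# Illustrative arrays with the same shapes as the example in step 1.
kpi = np.arange(9).reshape(3, 3)
media = np.arange(18).reshape(3, 3, 2)
population = np.array([1, 2, 3])
check_shapes(
    3, 3,
    {'kpi': kpi},
    {'media': (media, ['channel0', 'channel1'])},
    population,
)
```

The call returns silently when all shapes agree and raises ValueError naming the first array that does not.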

Pandas DataFrame or other data formats

To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:

  1. Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.

      import pandas as pd

      df = pd.read_excel(
          'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
          engine='openpyxl',
      )
    
  2. Use a DataFrameInputDataBuilder to map column names to the variable types that Meridian input data requires. For the definition of each variable, see Collect and organize your data.

      from meridian.data import data_frame_input_data_builder as data_builder

      builder = data_builder.DataFrameInputDataBuilder(
          kpi_type='non_revenue',
          default_kpi_column="conversions",
          default_revenue_per_kpi_column="revenue_per_conversion",
      )
      builder = (
          builder
          .with_kpi(df)
          .with_revenue_per_kpi(df)
          .with_population(df)
          .with_controls(df, control_cols=["GQV", "Competitor_Sales"])
      )
      channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
      builder = builder.with_media(
          df,
          media_cols=[f"{channel}_impression" for channel in channels],
          media_spend_cols=[f"{channel}_spend" for channel in channels],
          media_channels=channels,
      )
      builder = (
          builder
          .with_organic_media(
              df,
              organic_media_cols=["Organic_channel0_impression"],
              organic_media_channels=["Organic_channel0"],
          )
          .with_non_media_treatments(
              df, non_media_treatment_cols=['Promo']
          )
      )
      data = builder.build()
    

    Where:

    • kpi_type is either 'revenue' or 'non_revenue'.
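
The with_media call in step 2 passes three parallel lists (media_cols, media_spend_cols, and media_channels), so each position must refer to the same channel. One way to keep them aligned is to derive all three from a single channel list and confirm that the derived columns exist. The sketch below assumes the '<channel>_impression' and '<channel>_spend' naming used in the simulated data. The helper media_column_triples is hypothetical and not part of Meridian, and the column list stands in for df.columns.

```python
def media_column_triples(channels, available_columns):
    """Pair each channel with its impression and spend columns.

    Assumes the '<channel>_impression' / '<channel>_spend' naming used in
    the simulated data. Raises KeyError if a derived column is unavailable.
    """
    available = set(available_columns)
    triples = []
    for channel in channels:
        impression_col = f"{channel}_impression"
        spend_col = f"{channel}_spend"
        for col in (impression_col, spend_col):
            if col not in available:
                raise KeyError(col)
        triples.append((channel, impression_col, spend_col))
    return triples


# Stand-in for list(df.columns); illustrative only.
columns = [
    'time', 'geo',
    'Channel0_impression', 'Channel0_spend',
    'Channel1_impression', 'Channel1_spend',
]
triples = media_column_triples(['Channel0', 'Channel1'], columns)
media_channels = [channel for channel, _, _ in triples]
media_cols = [impression for _, impression, _ in triples]
media_spend_cols = [spend for _, _, spend in triples]
print(media_cols, media_spend_cols, media_channels)
```

The three resulting lists share one ordering, so they can be passed together as the corresponding arguments.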

Next, you can create your model.
