
Load geo-level data with organic media and non-media treatments

Simulated data is provided as an example for each data type and format in the following sections.

CSV

To load the simulated CSV data using CsvDataLoader:

Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, revenue_per_kpi, media, and media_spend. For media channels that have no direct cost, assign their media exposure to organic_media. For non-media treatments, assign the corresponding column names to non_media_treatments. For the definition of each variable, see Collect and organize your data.

    # Import the loader module used throughout this section.
    from meridian.data import load

    coord_to_columns = load.CoordToColumns(
        time='time',
        geo='geo',
        controls=['GQV', 'Competitor_Sales'],
        population='population',
        kpi='conversions',
        revenue_per_kpi='revenue_per_conversion',
        media=[
            'Channel0_impression',
            'Channel1_impression',
            'Channel2_impression',
            'Channel3_impression',
            'Channel4_impression',
        ],
        media_spend=[
            'Channel0_spend',
            'Channel1_spend',
            'Channel2_spend',
            'Channel3_spend',
            'Channel4_spend',
        ],
        organic_media=['Organic_channel0_impression'],
        non_media_treatments=['Promo'],
    )
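Before passing the mapping to the loader, it can help to confirm that every column name it mentions exists in the CSV header. The following is a minimal sketch using only the standard library. The helper `missing_columns` and the abbreviated sample header are hypothetical and are not part of Meridian.

```python
import csv
import io


def missing_columns(csv_text, expected):
    """Return the expected column names that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    present = set(header)
    return [name for name in expected if name not in present]


# Hypothetical, abbreviated header; a real file would be read from disk.
sample_csv = (
    "geo,time,GQV,population,conversions,"
    "Channel0_impression,Channel0_spend\n"
)

expected = [
    'geo', 'time', 'GQV', 'Competitor_Sales', 'population',
    'conversions', 'Channel0_impression', 'Channel0_spend', 'Promo',
]

missing = missing_columns(sample_csv, expected)
print(missing)
```

An empty result means every mapped column was found in the header.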

Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.

    correct_media_to_channel = {
        'Channel0_impression': 'Channel0',
        'Channel1_impression': 'Channel1',
        'Channel2_impression': 'Channel2',
        'Channel3_impression': 'Channel3',
        'Channel4_impression': 'Channel4',
    }
    correct_media_spend_to_channel = {
        'Channel0_spend': 'Channel0',
        'Channel1_spend': 'Channel1',
        'Channel2_spend': 'Channel2',
        'Channel3_spend': 'Channel3',
        'Channel4_spend': 'Channel4',
    }
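Because the impression and spend columns are tied together only through the channel names, a quick consistency check on the two dictionaries can catch typos early. This is an illustrative sketch. The function `channel_mismatch` is a hypothetical helper, not a Meridian API, and the dictionaries are shortened to two channels.

```python
def channel_mismatch(media_to_channel, media_spend_to_channel):
    """Return channel names that appear in only one of the two mappings."""
    media_channels = set(media_to_channel.values())
    spend_channels = set(media_spend_to_channel.values())
    # Symmetric difference: channels missing from either side.
    return sorted(media_channels ^ spend_channels)


media_map = {
    'Channel0_impression': 'Channel0',
    'Channel1_impression': 'Channel1',
}
spend_map = {
    'Channel0_spend': 'Channel0',
    'Channel1_spend': 'Chanel1',  # deliberate typo for illustration
}

mismatched = channel_mismatch(media_map, spend_map)
print(mismatched)
```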

Load the data using CsvDataLoader:

    loader = load.CsvDataLoader(
        csv_path=f'/{PATH}/{FILENAME}.csv',
        kpi_type='non_revenue',
        coord_to_columns=coord_to_columns,
        media_to_channel=correct_media_to_channel,
        media_spend_to_channel=correct_media_spend_to_channel,
    )
    data = loader.load()

Where:

kpi_type is either 'revenue' or 'non_revenue'.
PATH is the path to the data file location.
FILENAME is the name of your data file.

Xarray Dataset

To load the simulated Xarray Dataset using XrDatasetDataLoader:

Load the data using pickle:

    import pickle

    # Pickle files are binary, so open the file in 'rb' mode.
    with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
        XrDataset = pickle.load(fh)

Where:

PATH is the path to the data file location.
FILENAME is the name of your data file.
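Pickle files contain binary data, so they are written with mode 'wb' and read with mode 'rb'. The round trip below is a minimal sketch that uses a small dictionary as a stand-in for the dataset. The payload and the file name `example.pkl` are hypothetical.

```python
import os
import pickle
import tempfile

# Stand-in object; in practice this would be the Xarray Dataset.
payload = {'geo': ['A', 'B'], 'kpi': [[1, 2], [3, 4]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.pkl')

    # Write in binary mode.
    with open(path, 'wb') as fh:
        pickle.dump(payload, fh)

    # Read back in binary mode; text mode ('r') would fail on load.
    with open(path, 'rb') as fh:
        restored = pickle.load(fh)

print(restored == payload)
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.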

Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, organic_media_channel, and non_media_channel. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, organic_media, and non_media_treatments.

    loader = load.XrDatasetDataLoader(
        XrDataset,
        kpi_type='non_revenue',
        name_mapping={
            'channel': 'media_channel',
            'control': 'control_variable',
            'organic_channel': 'organic_media_channel',
            'non_media_treatment': 'non_media_channel',
            'conversions': 'kpi',
            'revenue_per_conversion': 'revenue_per_kpi',
            'control_value': 'controls',
            'spend': 'media_spend',
            'non_media_treatment_value': 'non_media_treatments',
        },
    )
    data = loader.load()

Where:

kpi_type is either 'revenue' or 'non_revenue'.
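One way to think about name_mapping is as a rename table: each key is a name in your dataset, and each value is the name the loader requires. The sketch below checks which required names would still be missing after applying a mapping. The helper `unresolved_names` and the list of dataset names are hypothetical; the required names are the ones listed above.

```python
REQUIRED_COORDS = {
    'geo', 'time', 'control_variable', 'media_channel',
    'organic_media_channel', 'non_media_channel',
}
REQUIRED_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population',
    'media', 'media_spend', 'organic_media', 'non_media_treatments',
}


def unresolved_names(dataset_names, name_mapping):
    """Return required names still missing after renaming dataset_names."""
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted((REQUIRED_COORDS | REQUIRED_VARS) - renamed)


name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'organic_channel': 'organic_media_channel',
    'non_media_treatment': 'non_media_channel',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
    'non_media_treatment_value': 'non_media_treatments',
}

# Hypothetical names in the input dataset: the mapped names plus
# those that already match the required names.
dataset_names = list(name_mapping) + [
    'geo', 'time', 'population', 'media', 'organic_media',
]

still_missing = unresolved_names(dataset_names, name_mapping)
print(still_missing)
```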

Numpy ndarray

To load NumPy ndarrays directly, use NDArrayInputDataBuilder:

Create the data as separate NumPy ndarrays.

    import numpy as np

    kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    controls_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    population_nd = np.array([1, 2, 3])
    revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    media_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    media_spend_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    organic_media_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    non_media_treatments_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
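The example arrays share their leading dimensions: three rows, three entries per row, and, for the 3-D arrays, two values per entry. The sketch below assumes (this is an assumption, not stated above) that the axes are ordered geo, time, then channel or variable, and checks that the shapes agree. It uses plain nested lists so it runs without dependencies; for an ndarray, the `.shape` attribute gives the same tuple. The helper `nested_shape` is hypothetical.

```python
def nested_shape(values):
    """Shape of a regular, non-empty nested list (like ndarray.shape)."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0]
    return tuple(shape)


kpi = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
controls = [
    [[1, 5], [2, 6], [3, 4]],
    [[7, 8], [9, 10], [11, 12]],
    [[13, 14], [15, 16], [17, 18]],
]
population = [1, 2, 3]

# Assumed axis order: (geo, time) for kpi, (geo, time, variable) for controls.
n_geos, n_times = nested_shape(kpi)
controls_ok = nested_shape(controls)[:2] == (n_geos, n_times)
population_ok = nested_shape(population) == (n_geos,)
print(nested_shape(controls), controls_ok, population_ok)
```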

Use an NDArrayInputDataBuilder to set the time coordinates and geos, and to name the channels or dimensions as required by Meridian input data. For the definition of each variable, see Collect and organize your data.

    from meridian.data import nd_array_input_data_builder as data_builder

    builder = (
        data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
    )
    builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
    builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
    builder.geos = ['B', 'A', 'C']

    builder = (
        builder
        .with_kpi(kpi_nd)
        .with_revenue_per_kpi(revenue_per_kpi_nd)
        .with_population(population_nd)
        .with_controls(
            controls_nd,
            control_names=["control0", "control1"])
        .with_media(
            m_nd=media_nd,
            ms_nd=media_spend_nd,
            media_channels=["channel0", "channel1"]
        )
        .with_organic_media(
            organic_media_nd,
            organic_media_channels=["organic_channel0", "organic_channel1"]
        )
        .with_non_media_treatments(
            non_media_treatments_nd,
            non_media_channel_names=["non_media_channel0", "non_media_channel1"]
        )
    )

    data = builder.build()

Where:

kpi_type is either 'revenue' or 'non_revenue'.
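The time coordinates in the example are ISO date strings listed out of order. If you want to confirm that a set of coordinates is evenly spaced before handing it to the builder, a small standard-library check might look like the following. The helper `time_steps` is hypothetical, and the even-spacing check is an assumption about what you want to verify, not a documented builder behavior.

```python
from datetime import date


def time_steps(coords):
    """Return the set of day gaps between consecutive sorted ISO dates."""
    parsed = sorted(date.fromisoformat(c) for c in coords)
    return {(later - earlier).days for earlier, later in zip(parsed, parsed[1:])}


# Same coordinates as in the builder example.
daily = time_steps(['2024-01-02', '2024-01-03', '2024-01-01'])

# Hypothetical irregular series: one week, then two weeks.
irregular = time_steps(['2024-01-01', '2024-01-08', '2024-01-22'])

print(daily, irregular)
```

A single value in the returned set indicates evenly spaced periods; more than one value points to a gap or an irregular interval.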

Pandas DataFrame or other data formats

To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:

Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.

    import pandas as pd

    df = pd.read_excel(
        'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
        engine='openpyxl',
    )

Use a DataFrameInputDataBuilder to map column names to the variable types required by Meridian input data. For the definition of each variable, see Collect and organize your data.

    from meridian.data import data_frame_input_data_builder as data_builder

    builder = data_builder.DataFrameInputDataBuilder(
        kpi_type='non_revenue',
        default_kpi_column="conversions",
        default_revenue_per_kpi_column="revenue_per_conversion",
    )

    builder = (
        builder
        .with_kpi(df)
        .with_revenue_per_kpi(df)
        .with_population(df)
        .with_controls(df, control_cols=["GQV", "Competitor_Sales"])
    )

    channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
    builder = builder.with_media(
        df,
        media_cols=[f"{channel}_impression" for channel in channels],
        media_spend_cols=[f"{channel}_spend" for channel in channels],
        media_channels=channels,
    )

    builder = (
        builder
        .with_organic_media(
            df,
            organic_media_cols=["Organic_channel0_impression"],
            organic_media_channels=["Organic_channel0"],
        )
        .with_non_media_treatments(
            df,
            non_media_treatment_cols=['Promo']
        )
    )

    data = builder.build()

Where:

kpi_type is either 'revenue' or 'non_revenue'.
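When the data arrives as rows, one per geo and time period, it can be worth checking that every geo covers the same set of time periods before building the input data. The sketch below works on a list of dictionaries, which is the shape that `df.to_dict('records')` produces in pandas. The helper `unbalanced_geos` and the sample rows are hypothetical, and the one-row-per-geo-and-period layout is an assumption about your data.

```python
from collections import defaultdict


def unbalanced_geos(rows):
    """Map each geo to the time periods it is missing, omitting complete geos."""
    times_by_geo = defaultdict(set)
    for row in rows:
        times_by_geo[row['geo']].add(row['time'])
    all_times = set().union(*times_by_geo.values())
    return {
        geo: sorted(all_times - times)
        for geo, times in times_by_geo.items()
        if times != all_times
    }


rows = [
    {'geo': 'A', 'time': '2024-01-01'},
    {'geo': 'A', 'time': '2024-01-08'},
    {'geo': 'B', 'time': '2024-01-01'},  # geo B lacks the second week
]

gaps = unbalanced_geos(rows)
print(gaps)
```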

Next, you can create your model.
