Simulated data is provided as an example for each data type and format in the following sections.
CSV
To load the simulated CSV data using CsvDataLoader:
- Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, and revenue_per_kpi. For media channels that don't have reach and frequency data, assign their media exposure and media spend to media and media_spend, respectively. For media channels that do have reach and frequency data, map their reach, frequency, and media spend to reach, frequency, and rf_spend, respectively. For the definition of each variable, see Collect and organize your data.

    from meridian.data import load

    coord_to_columns = load.CoordToColumns(
        time='time',
        geo='geo',
        controls=['GQV', 'Discount', 'Competitor_Sales'],
        population='population',
        kpi='conversions',
        revenue_per_kpi='revenue_per_conversion',
        media=[
            'Channel0_impression',
            'Channel1_impression',
            'Channel2_impression',
            'Channel3_impression',
        ],
        media_spend=[
            'Channel0_spend',
            'Channel1_spend',
            'Channel2_spend',
            'Channel3_spend',
        ],
        reach=['Channel4_reach', 'Channel5_reach'],
        frequency=['Channel4_frequency', 'Channel5_frequency'],
        rf_spend=['Channel4_spend', 'Channel5_spend'],
    )

- Map the media exposure, reach, frequency, and media spend columns to the channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0. Likewise, Channel4_reach, Channel4_frequency, and Channel4_spend are connected to the same channel, Channel4.

    correct_media_to_channel = {
        'Channel0_impression': 'Channel0',
        'Channel1_impression': 'Channel1',
        'Channel2_impression': 'Channel2',
        'Channel3_impression': 'Channel3',
    }
    correct_media_spend_to_channel = {
        'Channel0_spend': 'Channel0',
        'Channel1_spend': 'Channel1',
        'Channel2_spend': 'Channel2',
        'Channel3_spend': 'Channel3',
    }
    correct_reach_to_channel = {
        'Channel4_reach': 'Channel4',
        'Channel5_reach': 'Channel5',
    }
    correct_frequency_to_channel = {
        'Channel4_frequency': 'Channel4',
        'Channel5_frequency': 'Channel5',
    }
    correct_rf_spend_to_channel = {
        'Channel4_spend': 'Channel4',
        'Channel5_spend': 'Channel5',
    }

- Load the data using CsvDataLoader:

    loader = load.CsvDataLoader(
        csv_path=f'/{PATH}/{FILENAME}.csv',
        kpi_type='non_revenue',
        coord_to_columns=coord_to_columns,
        media_to_channel=correct_media_to_channel,
        media_spend_to_channel=correct_media_spend_to_channel,
        reach_to_channel=correct_reach_to_channel,
        frequency_to_channel=correct_frequency_to_channel,
        rf_spend_to_channel=correct_rf_spend_to_channel,
    )
    data = loader.load()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
  - PATH is the path to the data file location.
  - FILENAME is the name of your data file.
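Before calling the loader, it can help to confirm that every column named in the mapping is actually present in the CSV header. The following is a minimal sketch that uses only the Python standard library. `missing_columns` is a hypothetical helper (it is not part of Meridian), and the inline CSV text is an illustrative stand-in for your data file.

```python
import csv
import io


def missing_columns(csv_file, mapped_columns):
    """Return the mapped column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))  # first row holds the column names
    present = set(header)
    return [name for name in mapped_columns if name not in present]


# Illustrative stand-in for the real file; only the header row matters here.
sample_csv = io.StringIO(
    'time,geo,population,conversions,revenue_per_conversion,'
    'Channel0_impression,Channel0_spend\n'
)

# A subset of the column names used in the mapping for the simulated data.
mapped = [
    'time', 'geo', 'population', 'conversions', 'revenue_per_conversion',
    'Channel0_impression', 'Channel0_spend', 'Channel4_reach',
]

missing = missing_columns(sample_csv, mapped)
print(missing)
```

With a real file, you would pass an open file handle instead of the `io.StringIO` object. An empty result means every mapped column was found in the header.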
Xarray Dataset
To load the pickled simulated Xarray Dataset using XrDatasetDataLoader:
- Load the data using pickle:

    import pickle

    # Pickle files are binary, so open the file in 'rb' mode.
    with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
        dataset = pickle.load(fh)

  Where:
  - PATH is the path to the data file location.
  - FILENAME is the name of your data file.
- Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset differ from the required names. The required coordinate names are geo, time, control_variable, media_channel, and rf_channel, where rf_channel designates the channels that have reach and frequency data. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, reach, frequency, and rf_spend.

    from meridian.data import load

    loader = load.XrDatasetDataLoader(
        dataset,
        kpi_type='non_revenue',
        name_mapping={
            'channel': 'media_channel',
            'control': 'control_variable',
            'conversions': 'kpi',
            'revenue_per_conversion': 'revenue_per_kpi',
            'control_value': 'controls',
            'spend': 'media_spend',
            'reach': 'reach',
            'frequency': 'frequency',
            'rf_spend': 'rf_spend',
        },
    )
    data = loader.load()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
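To see how a name mapping lines up with the required names, the renaming can be simulated in plain Python before any loader is involved. This is a sketch under stated assumptions: `missing_required` is a hypothetical helper, the list of input names is made up for illustration, and every name listed in this section is treated as required, which may be stricter than what the loader actually enforces.

```python
# Names taken from the requirements listed in this section.
REQUIRED_COORDS = {
    'geo', 'time', 'control_variable', 'media_channel', 'rf_channel',
}
REQUIRED_VARS = {
    'kpi', 'revenue_per_kpi', 'controls', 'population', 'media',
    'media_spend', 'reach', 'frequency', 'rf_spend',
}


def missing_required(dataset_names, name_mapping):
    """Return required names still missing after applying name_mapping."""
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted((REQUIRED_COORDS | REQUIRED_VARS) - renamed)


# Hypothetical names; with a real dataset these could be collected with
# list(dataset.coords) + list(dataset.data_vars).
input_names = [
    'geo', 'time', 'channel', 'control', 'rf_channel',
    'conversions', 'revenue_per_conversion', 'control_value',
    'population', 'media', 'spend', 'reach', 'frequency', 'rf_spend',
]

name_mapping = {
    'channel': 'media_channel',
    'control': 'control_variable',
    'conversions': 'kpi',
    'revenue_per_conversion': 'revenue_per_kpi',
    'control_value': 'controls',
    'spend': 'media_spend',
}

still_missing = missing_required(input_names, name_mapping)
print(still_missing)
```

Names that already match a required name (such as reach or population) pass through unchanged, so they don't need an entry in the mapping for this check.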
Numpy ndarray
To load NumPy ndarrays directly, use NDArrayInputDataBuilder:
- Create the data as separate NumPy ndarrays.

    import numpy as np

    kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    controls_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    population_nd = np.array([1, 2, 3])
    revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    reach_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    frequency_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    rf_spend_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])

- Use an NDArrayInputDataBuilder to set the time coordinates and geos, and to provide the channel or dimension names that Meridian input data requires. For the definition of each variable, see Collect and organize your data.

    from meridian.data import nd_array_input_data_builder as data_builder

    builder = data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
    builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
    builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
    builder.geos = ['B', 'A', 'C']
    builder = (
        builder
        .with_kpi(kpi_nd)
        .with_revenue_per_kpi(revenue_per_kpi_nd)
        .with_population(population_nd)
        .with_controls(controls_nd, control_names=["control0", "control1"])
        .with_reach(
            r_nd=reach_nd,
            f_nd=frequency_nd,
            rfs_nd=rf_spend_nd,
            rf_channels=["channel0", "channel1"],
        )
    )
    data = builder.build()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
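Because the arrays carry no labels, a quick shape check before building can catch a mismatch between the arrays and the geo and time lists. The sketch below assumes an axis order of (geo, time) for two-dimensional arrays and (geo, time, variable) for three-dimensional ones, which matches the example arrays but is an assumption here. `check_shapes` is a hypothetical helper, not part of Meridian.

```python
import numpy as np


def check_shapes(kpi, population, controls, n_geos, n_times):
    """Return a list of human-readable shape problems (empty if none)."""
    problems = []
    if kpi.shape != (n_geos, n_times):
        problems.append(f'kpi has shape {kpi.shape}')
    if population.shape != (n_geos,):
        problems.append(f'population has shape {population.shape}')
    # Only the leading (geo, time) axes are compared; the last axis is the
    # number of control variables and can be any size.
    if controls.shape[:2] != (n_geos, n_times):
        problems.append(f'controls has shape {controls.shape}')
    return problems


# Same shapes as the example arrays: 3 geos, 3 time periods, 2 controls.
kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
population_nd = np.array([1, 2, 3])
controls_nd = np.arange(18).reshape(3, 3, 2)

problems = check_shapes(kpi_nd, population_nd, controls_nd, n_geos=3, n_times=3)
print(problems)
```

The same comparison of the leading two axes applies to the reach, frequency, and spend arrays if you want to extend the check.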
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
- Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrames.

    import pandas as pd

    df = pd.read_excel(
        'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_media_rf.xlsx',
        engine='openpyxl',
    )

- Use a DataFrameInputDataBuilder to map column names to the variable types that Meridian input data requires. For the definition of each variable, see Collect and organize your data.

    from meridian.data import data_frame_input_data_builder as data_builder

    builder = data_builder.DataFrameInputDataBuilder(
        kpi_type='non_revenue',
        default_kpi_column="conversions",
        default_revenue_per_kpi_column="revenue_per_conversion",
    )
    builder = (
        builder
        .with_kpi(df)
        .with_revenue_per_kpi(df)
        .with_population(df)
        .with_controls(df, control_cols=["GQV", "Discount", "Competitor_Sales"])
        .with_reach(
            df,
            reach_cols=['Channel4_reach', 'Channel5_reach'],
            frequency_cols=['Channel4_frequency', 'Channel5_frequency'],
            rf_spend_cols=['Channel4_spend', 'Channel5_spend'],
            rf_channels=['Channel4', 'Channel5'],
        )
    )
    data = builder.build()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
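When the data arrives as a DataFrame, one sanity check worth running first is that each geo and time pair appears only once. The sketch below assumes a long-format table with columns named geo and time, as in the simulated CSV mapping. The small DataFrame is made up for illustration and deliberately contains one repeated pair.

```python
import pandas as pd

# Illustrative data only: the last two rows repeat the same geo/time pair.
df = pd.DataFrame({
    'geo': ['A', 'A', 'B', 'B', 'B'],
    'time': [
        '2024-01-01', '2024-01-08',
        '2024-01-01', '2024-01-08', '2024-01-08',
    ],
    'conversions': [10, 12, 7, 9, 9],
})

# keep=False flags every row that belongs to a repeated (geo, time) pair.
dupes = df[df.duplicated(subset=['geo', 'time'], keep=False)]
n_dupes = len(dupes)
print(dupes)
```

If the result is non-empty, aggregate or drop the repeated rows before passing the DataFrame to a builder.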
Next, you can create your model.


