Simulated data is provided as an example for each data type and format in the following sections.
CSV
To load the simulated CSV data using CsvDataLoader:
- Map the column names to the variable types. The required variable types are time, geo, controls, population, kpi, revenue_per_kpi, media, and media_spend. For media channels that have no direct cost, you must assign their media exposure to organic_media. For non-media treatments, you must assign the corresponding column names to non_media_treatments. For the definition of each variable, see Collect and organize your data.

    from meridian.data import load

    coord_to_columns = load.CoordToColumns(
        time='time',
        geo='geo',
        controls=['GQV', 'Competitor_Sales'],
        population='population',
        kpi='conversions',
        revenue_per_kpi='revenue_per_conversion',
        media=[
            'Channel0_impression',
            'Channel1_impression',
            'Channel2_impression',
            'Channel3_impression',
            'Channel4_impression',
        ],
        media_spend=[
            'Channel0_spend',
            'Channel1_spend',
            'Channel2_spend',
            'Channel3_spend',
            'Channel4_spend',
        ],
        organic_media=['Organic_channel0_impression'],
        non_media_treatments=['Promo'],
    )

- Map the media variables and the media spends to the designated channel names that you want to display in the two-page output. In the following example, Channel0_impression and Channel0_spend are connected to the same channel, Channel0.

    correct_media_to_channel = {
        'Channel0_impression': 'Channel0',
        'Channel1_impression': 'Channel1',
        'Channel2_impression': 'Channel2',
        'Channel3_impression': 'Channel3',
        'Channel4_impression': 'Channel4',
    }
    correct_media_spend_to_channel = {
        'Channel0_spend': 'Channel0',
        'Channel1_spend': 'Channel1',
        'Channel2_spend': 'Channel2',
        'Channel3_spend': 'Channel3',
        'Channel4_spend': 'Channel4',
    }

- Load the data using CsvDataLoader:

    loader = load.CsvDataLoader(
        csv_path=f'/{PATH}/{FILENAME}.csv',
        kpi_type='non_revenue',
        coord_to_columns=coord_to_columns,
        media_to_channel=correct_media_to_channel,
        media_spend_to_channel=correct_media_spend_to_channel,
    )
    data = loader.load()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
  - PATH is the path to the data file location.
  - FILENAME is the name of your data file.
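Before calling the loader, it can help to confirm that every column named in the mapping is actually present in the CSV header. The following is a minimal sketch of such a check. The helper name missing_columns is hypothetical and not part of Meridian, and the in-memory CSV is made up for illustration; a real file would be opened from disk instead.

```python
import csv
import io


def missing_columns(csv_file, mapped_columns):
    """Return the mapped column names that are absent from the CSV header.

    Hypothetical helper for illustration; it is not a Meridian function.
    """
    # The first row of the file is treated as the header.
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in mapped_columns if name not in present]


# Illustrative data only: this header deliberately omits Channel0_spend.
sample = io.StringIO(
    "time,geo,population,conversions,Channel0_impression\n"
    "2024-01-01,A,1000,12,340\n"
)
mapped = [
    "time",
    "geo",
    "population",
    "conversions",
    "Channel0_impression",
    "Channel0_spend",
]
missing = missing_columns(sample, mapped)
print(missing)
```

An empty result suggests that the names passed to CoordToColumns line up with the file. Any names that are returned point to a typo in the mapping or a column missing from the data.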
Xarray Dataset
To load the simulated Xarray Dataset using XrDatasetDataLoader:
- Load the data using pickle. Open the file in binary mode ('rb'), because pickle.load reads bytes:

    import pickle

    with open(f'/{PATH}/{FILENAME}.pkl', 'rb') as fh:
        XrDataset = pickle.load(fh)

  Where:
  - PATH is the path to the data file location.
  - FILENAME is the name of your data file.

- Pass the dataset to XrDatasetDataLoader. Use the name_mapping argument to map the coordinates and arrays. Provide a mapping if the names in the input dataset are different from the required names. The required coordinate names are geo, time, control_variable, media_channel, organic_media_channel, and non_media_channel. The required data variable names are kpi, revenue_per_kpi, controls, population, media, media_spend, organic_media, and non_media_treatments.

    from meridian.data import load

    loader = load.XrDatasetDataLoader(
        XrDataset,
        kpi_type='non_revenue',
        name_mapping={
            'channel': 'media_channel',
            'control': 'control_variable',
            'organic_channel': 'organic_media_channel',
            'non_media_treatment': 'non_media_channel',
            'conversions': 'kpi',
            'revenue_per_conversion': 'revenue_per_kpi',
            'control_value': 'controls',
            'spend': 'media_spend',
            'non_media_treatment_value': 'non_media_treatments',
        },
    )
    data = loader.load()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
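One way to reason about name_mapping is to apply it to the names in the dataset and see which required names are still unaccounted for. The sketch below does this with plain Python sets. The helper unresolved_names is hypothetical, the list of dataset names is an assumption made for illustration, and the sketch treats every name listed above as required.

```python
# Required names as listed in the text above.
REQUIRED_COORDS = {
    "geo", "time", "control_variable", "media_channel",
    "organic_media_channel", "non_media_channel",
}
REQUIRED_VARIABLES = {
    "kpi", "revenue_per_kpi", "controls", "population",
    "media", "media_spend", "organic_media", "non_media_treatments",
}


def unresolved_names(dataset_names, name_mapping):
    """Return the required names that are missing after applying the mapping.

    Hypothetical helper for illustration; it is not a Meridian function.
    """
    # Names without an entry in the mapping keep their original name.
    renamed = {name_mapping.get(name, name) for name in dataset_names}
    return sorted((REQUIRED_COORDS | REQUIRED_VARIABLES) - renamed)


# Assumed dataset names, chosen to match the mapping in the example above.
dataset_names = [
    "geo", "time", "channel", "control", "organic_channel",
    "non_media_treatment", "conversions", "revenue_per_conversion",
    "control_value", "population", "media", "spend", "organic_media",
    "non_media_treatment_value",
]
name_mapping = {
    "channel": "media_channel",
    "control": "control_variable",
    "organic_channel": "organic_media_channel",
    "non_media_treatment": "non_media_channel",
    "conversions": "kpi",
    "revenue_per_conversion": "revenue_per_kpi",
    "control_value": "controls",
    "spend": "media_spend",
    "non_media_treatment_value": "non_media_treatments",
}
unresolved = unresolved_names(dataset_names, name_mapping)
print(unresolved)
```

Under these assumptions, names such as geo, population, and media need no entry in the mapping because they already match the required names.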
Numpy ndarray
To load numpy ndarrays directly, use NDArrayInputDataBuilder:
- Create the data as separate numpy ndarrays.

    import numpy as np

    kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    controls_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    population_nd = np.array([1, 2, 3])
    revenue_per_kpi_nd = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    media_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    media_spend_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    organic_media_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])
    non_media_treatments_nd = np.array([
        [[1, 5], [2, 6], [3, 4]],
        [[7, 8], [9, 10], [11, 12]],
        [[13, 14], [15, 16], [17, 18]],
    ])

- Use an NDArrayInputDataBuilder to set the time coordinates and geos, and to give the channel or dimension names required by Meridian input data. For the definition of each variable, see Collect and organize your data.

    from meridian.data import nd_array_input_data_builder as data_builder

    builder = data_builder.NDArrayInputDataBuilder(kpi_type='non_revenue')
    builder.time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
    builder.media_time_coords = ['2024-01-02', '2024-01-03', '2024-01-01']
    builder.geos = ['B', 'A', 'C']
    builder = (
        builder.with_kpi(kpi_nd)
        .with_revenue_per_kpi(revenue_per_kpi_nd)
        .with_population(population_nd)
        .with_controls(controls_nd, control_names=["control0", "control1"])
        .with_media(
            m_nd=media_nd,
            ms_nd=media_spend_nd,
            media_channels=["channel0", "channel1"],
        )
        .with_organic_media(
            organic_media_nd,
            organic_media_channels=["organic_channel0", "organic_channel1"],
        )
        .with_non_media_treatments(
            non_media_treatments_nd,
            non_media_channel_names=["non_media_channel0", "non_media_channel1"],
        )
    )
    data = builder.build()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
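Because the arrays carry no labels, a shape mismatch is easy to introduce. The sketch below compares each array's shape with the shape implied by the number of geos, time periods, and names. It assumes the layout that the example arrays suggest: (geo, time) for kpi and revenue_per_kpi, (geo,) for population, and (geo, time, name) for the remaining arrays. It also assumes that media time and time have the same length, as they do in the example. The helper check_shapes is hypothetical and works on shape tuples, so it has no dependencies.

```python
def check_shapes(shapes, n_geos, n_times, n_names):
    """Return a list of messages for arrays whose shape is unexpected.

    Hypothetical helper for illustration; it is not a Meridian function.

    shapes:  maps an array name to its shape tuple, for example arr.shape.
    n_names: maps a three-dimensional array name to its number of channel
             or control names.
    """
    problems = []
    for name, shape in shapes.items():
        if name == "population":
            expected = (n_geos,)
        elif name in n_names:
            expected = (n_geos, n_times, n_names[name])
        else:
            expected = (n_geos, n_times)
        if tuple(shape) != expected:
            problems.append(f"{name}: expected {expected}, got {tuple(shape)}")
    return problems


# Shapes taken from the example arrays above: 3 geos, 3 time periods, 2 names.
shapes = {
    "kpi": (3, 3),
    "revenue_per_kpi": (3, 3),
    "population": (3,),
    "controls": (3, 3, 2),
    "media": (3, 3, 2),
    "media_spend": (3, 3, 2),
}
n_names = {"controls": 2, "media": 2, "media_spend": 2}
problems = check_shapes(shapes, n_geos=3, n_times=3, n_names=n_names)
print(problems)
```

With real data, the shapes dictionary could be built from the arrays themselves, for example {"kpi": kpi_nd.shape, "media": media_nd.shape}.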
Pandas DataFrame or other data formats
To load simulated data in another format (such as Excel) using DataFrameInputDataBuilder:
- Read the data (such as an Excel spreadsheet) into one or more Pandas DataFrame(s).

    import pandas as pd

    df = pd.read_excel(
        'https://github.com/google/meridian/raw/main/meridian/data/simulated_data/xlsx/geo_all_channels.xlsx',
        engine='openpyxl',
    )

- Use a DataFrameInputDataBuilder to map column names to the variable types required by Meridian input data. For the definition of each variable, see Collect and organize your data.

    from meridian.data import data_frame_input_data_builder as data_builder

    builder = data_builder.DataFrameInputDataBuilder(
        kpi_type='non_revenue',
        default_kpi_column="conversions",
        default_revenue_per_kpi_column="revenue_per_conversion",
    )
    builder = (
        builder.with_kpi(df)
        .with_revenue_per_kpi(df)
        .with_population(df)
        .with_controls(df, control_cols=["GQV", "Competitor_Sales"])
    )
    channels = ["Channel0", "Channel1", "Channel2", "Channel3", "Channel4"]
    builder = builder.with_media(
        df,
        media_cols=[f"{channel}_impression" for channel in channels],
        media_spend_cols=[f"{channel}_spend" for channel in channels],
        media_channels=channels,
    )
    builder = (
        builder.with_organic_media(
            df,
            organic_media_cols=["Organic_channel0_impression"],
            organic_media_channels=["Organic_channel0"],
        )
        .with_non_media_treatments(df, non_media_treatment_cols=['Promo'])
    )
    data = builder.build()

  Where:
  - kpi_type is either 'revenue' or 'non_revenue'.
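The example above hard-codes the channels list. When column names follow the _impression and _spend naming pattern used in the simulated data, the list could instead be derived from the column names. The sketch below shows one way to do that. The helper paired_channels is hypothetical, and the column list is a shortened, assumed version of the simulated data's columns.

```python
def paired_channels(columns, exposure_suffix="_impression", spend_suffix="_spend"):
    """Return channel names that have both an exposure and a spend column.

    Hypothetical helper for illustration; it is not a Meridian function.
    """
    available = set(columns)
    channels = []
    for column in columns:
        if column.endswith(exposure_suffix):
            channel = column[: -len(exposure_suffix)]
            # Keep only channels whose matching spend column also exists.
            if channel + spend_suffix in available:
                channels.append(channel)
    return channels


# Assumed column names for illustration; with a DataFrame, pass list(df.columns).
columns = [
    "time",
    "geo",
    "Channel0_impression",
    "Channel0_spend",
    "Channel1_impression",
    "Channel1_spend",
    "Organic_channel0_impression",
    "Promo",
]
channels = paired_channels(columns)
print(channels)
```

A channel such as Organic_channel0, which has an exposure column but no spend column, is left out of the result. That fits the guidance above to treat channels without a direct cost as organic media.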
Next, you can create your model.


