
meridian.data.load.CsvDataLoader


Reads data from a CSV file.

Inherits From: InputDataLoader

meridian.data.load.CsvDataLoader(
    csv_path: str,
    coord_to_columns: meridian.data.load.CoordToColumns,
    kpi_type: str,
    media_to_channel: (Mapping[str, str] | None) = None,
    media_spend_to_channel: (Mapping[str, str] | None) = None,
    reach_to_channel: (Mapping[str, str] | None) = None,
    frequency_to_channel: (Mapping[str, str] | None) = None,
    rf_spend_to_channel: (Mapping[str, str] | None) = None,
    organic_reach_to_channel: (Mapping[str, str] | None) = None,
    organic_frequency_to_channel: (Mapping[str, str] | None) = None
)

This class reads input data from a CSV file. The coord_to_columns attribute stores a mapping from target InputData coordinates and array names to the CSV column names, if they are different. The fields are:

geo, time, kpi, revenue_per_kpi, population (single column)
controls (multiple columns, optional)
(1) media, media_spend (multiple columns)
(2) reach, frequency, rf_spend (multiple columns)
non_media_treatments (multiple columns, optional)
organic_media (multiple columns, optional)
organic_reach, organic_frequency (multiple columns, optional)

The CSV file must include either (1) or (2), but does not need to include both.
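The "either (1) or (2)" requirement can be expressed as a small check over the column mapping. This is a minimal sketch, not part of Meridian: the helper name has_paid_media_group is hypothetical, and a plain dict of field name to column list stands in for a CoordToColumns object.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def has_paid_media_group(coords: Mapping[str, Sequence[str] | None]) -> bool:
    """Return True if group (1) or group (2) is fully mapped.

    Hypothetical helper, not part of Meridian. `coords` is a plain dict that
    stands in for a CoordToColumns object (field name -> column list).
    """

    def present(field: str) -> bool:
        # A field counts as present when it maps to at least one column.
        return bool(coords.get(field))

    group_1 = present("media") and present("media_spend")
    group_2 = present("reach") and present("frequency") and present("rf_spend")
    return group_1 or group_2


print(has_paid_media_group({"media": ["impressions_tv"], "media_spend": ["spend_tv"]}))  # True
print(has_paid_media_group({"reach": ["reach_fb"], "frequency": ["frequency_fb"]}))  # False
```

The second call returns False because rf_spend is missing, so group (2) is incomplete.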

Internally, this class reads the CSV file into a pandas DataFrame and then loads the data using DataFrameDataLoader.

Args

csv_path

The path to the CSV file to read from. The data must satisfy one of the following conditions:

There are no gaps in the data.
For up to max_lag initial periods, only media data is present: the media, reach, frequency, organic_media, organic_reach and organic_frequency columns are populated, and all other data columns (kpi, revenue_per_kpi, media_spend, rf_spend, controls, population and non_media_treatments) are empty.
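The two csv_path conditions can be checked per geo with a short sketch. It assumes the rows of one geo are already sorted by time, that a "gap" means an empty cell, and that the caller supplies which field each column belongs to; the helper name satisfies_lag_rule is hypothetical and is not how Meridian validates its input.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence

# Fields the description allows to be populated during the initial lag periods.
MEDIA_FIELDS = {"media", "reach", "frequency", "organic_media", "organic_reach", "organic_frequency"}


def satisfies_lag_rule(
    rows: Sequence[Mapping[str, str]],
    column_field: Mapping[str, str],
    max_lag: int,
) -> bool:
    """Check one geo's rows (sorted by time) against the csv_path conditions.

    Hypothetical helper. `column_field` maps each CSV column to its field,
    e.g. {"impressions_tv": "media", "conversions": "kpi"}. A gap is assumed
    to mean an empty cell.
    """

    def is_empty(value: str | None) -> bool:
        return value is None or value.strip() == ""

    media_cols = [c for c, f in column_field.items() if f in MEDIA_FIELDS]
    other_cols = [c for c, f in column_field.items() if f not in MEDIA_FIELDS]

    # Count leading periods (at most max_lag) in which every non-media column is empty.
    lag = 0
    while lag < min(max_lag, len(rows)) and all(is_empty(rows[lag].get(c)) for c in other_cols):
        lag += 1

    # Lag periods must still carry media data.
    if any(is_empty(row.get(c)) for row in rows[:lag] for c in media_cols):
        return False
    # Every later period must be complete; lag == 0 is the "no gaps at all" case.
    return not any(is_empty(row.get(c)) for row in rows[lag:] for c in media_cols + other_cols)


column_field = {"impressions_tv": "media", "spend_tv": "media_spend", "conversions": "kpi"}
rows = [
    {"impressions_tv": "10", "spend_tv": "", "conversions": ""},  # lag period: media only
    {"impressions_tv": "12", "spend_tv": "5", "conversions": "3"},
]
print(satisfies_lag_rule(rows, column_field, max_lag=1))  # True
print(satisfies_lag_rule(rows, column_field, max_lag=0))  # False
```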

coord_to_columns

A CoordToColumns object whose fields are the desired coordinates of the InputData and whose values are the current names of the columns (or lists of columns) in the CSV file. Example:

from meridian.data.load import CoordToColumns

coord_to_columns = CoordToColumns(
    geo='dmas',
    time='dates',
    kpi='revenue',
    revenue_per_kpi='revenue_per_conversions',
    media=['impressions_tv', 'impressions_yt', 'impressions_search'],
    media_spend=['spend_tv', 'spend_yt', 'spend_search'],
    reach=['reach_fb'],
    frequency=['frequency_fb'],
    rf_spend=['rf_spend_fb'],
    controls=['control_income'],
    population='population',
    non_media_treatments=['price', 'discount'],
    organic_media=['organic_impressions_blog'],
    organic_reach=['organic_reach_newsletter'],
    organic_frequency=['organic_frequency_newsletter'],
)
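Because every value in coord_to_columns must name a column that actually exists in the CSV file, it can help to compare the mapping against the file's header before loading. The sketch below uses only the standard library; the helper name missing_columns is hypothetical, and a plain dict stands in for a CoordToColumns object.

```python
from __future__ import annotations

import csv
import io
from collections.abc import Iterable, Mapping


def missing_columns(header: Iterable[str], coord_to_columns: Mapping[str, object]) -> list[str]:
    """Return the mapped column names that are absent from the CSV header.

    Hypothetical helper, not part of Meridian; a plain dict stands in for CoordToColumns.
    """
    available = set(header)
    wanted: list[str] = []
    for value in coord_to_columns.values():
        if value is None:
            continue  # optional field left unset
        if isinstance(value, str):
            wanted.append(value)  # single-column field, e.g. geo or kpi
        else:
            wanted.extend(value)  # multi-column field, e.g. media
    return [name for name in wanted if name not in available]


csv_text = "dmas,dates,revenue,population,impressions_tv,spend_tv\n"
header = next(csv.reader(io.StringIO(csv_text)))
mapping = {
    "geo": "dmas",
    "time": "dates",
    "kpi": "revenue",
    "population": "population",
    "media": ["impressions_tv", "impressions_yt"],
    "media_spend": ["spend_tv"],
    "controls": None,
}
print(missing_columns(header, mapping))  # ['impressions_yt']
```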

kpi_type

A string denoting whether the KPI is of a 'revenue' or 'non-revenue' type. When kpi_type is 'non-revenue' and revenue_per_kpi is provided, ROI calibration is used and the analysis is run on revenue. When kpi_type is 'non-revenue' and revenue_per_kpi is not provided, custom ROI calibration is used and the analysis is run on the KPI.
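The kpi_type rules reduce to a small decision table. This sketch only restates the two documented 'non-revenue' cases; the function name analysis_setup is hypothetical, and the 'revenue' branch is an assumption because the description does not spell that case out.

```python
from __future__ import annotations


def analysis_setup(kpi_type: str, has_revenue_per_kpi: bool) -> tuple[str, str]:
    """Return (calibration, analysis target) following the kpi_type description.

    Hypothetical helper that only restates the documented rules.
    """
    if kpi_type == "revenue":
        # Assumption: not covered explicitly above; a revenue KPI is taken
        # to be analysed on revenue with ROI calibration.
        return ("ROI", "revenue")
    if kpi_type == "non-revenue":
        if has_revenue_per_kpi:
            return ("ROI", "revenue")
        return ("custom ROI", "KPI")
    raise ValueError(f"kpi_type must be 'revenue' or 'non-revenue', got {kpi_type!r}")


print(analysis_setup("non-revenue", has_revenue_per_kpi=True))   # ('ROI', 'revenue')
print(analysis_setup("non-revenue", has_revenue_per_kpi=False))  # ('custom ROI', 'KPI')
```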

media_to_channel

A dictionary whose keys are the actual column names for media data in the CSV file and whose values are the desired channel names. The channel names must match those used for the media_spend data. Example:

media_to_channel = {
    'media_tv': 'tv', 'media_yt': 'yt', 'media_fb': 'fb'
}

media_spend_to_channel

A dictionary whose keys are the actual column names for media_spend data in the CSV file and whose values are the desired channel names. The channel names must match those used for the media data. Example:

media_spend_to_channel = {
    'spend_tv': 'tv', 'spend_yt': 'yt', 'spend_fb': 'fb'
}
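Since media_to_channel and media_spend_to_channel must use the same channel names, a quick consistency check catches typos before loading. This is a minimal sketch; the helper name shared_media_channels is hypothetical and not part of Meridian.

```python
from __future__ import annotations

from collections.abc import Mapping


def shared_media_channels(
    media_to_channel: Mapping[str, str],
    media_spend_to_channel: Mapping[str, str],
) -> list[str]:
    """Return the sorted channel names, or raise if the two mappings disagree.

    Hypothetical helper, not part of Meridian.
    """
    media = set(media_to_channel.values())
    spend = set(media_spend_to_channel.values())
    if media != spend:
        raise ValueError(
            f"only in media: {sorted(media - spend)}, only in spend: {sorted(spend - media)}"
        )
    return sorted(media)


media_to_channel = {"media_tv": "tv", "media_yt": "yt", "media_fb": "fb"}
media_spend_to_channel = {"spend_tv": "tv", "spend_yt": "yt", "spend_fb": "fb"}
print(shared_media_channels(media_to_channel, media_spend_to_channel))  # ['fb', 'tv', 'yt']
```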

reach_to_channel

A dictionary whose keys are the actual column names for reach data in the CSV file and whose values are the desired channel names. The channel names must match those used for the rf_spend data. Example:

reach_to_channel = {
    'reach_tv': 'tv', 'reach_yt': 'yt', 'reach_fb': 'fb'
}

frequency_to_channel

A dictionary whose keys are the actual column names for frequency data in the CSV file and whose values are the desired channel names. The channel names must match those used for the rf_spend data. Example:

frequency_to_channel = {
    'frequency_tv': 'tv', 'frequency_yt': 'yt', 'frequency_fb': 'fb'
}

rf_spend_to_channel

A dictionary whose keys are the actual column names for rf_spend data in the CSV file and whose values are the desired channel names. The channel names must match those used for the reach and frequency data. Example:

rf_spend_to_channel = {
    'rf_spend_tv': 'tv', 'rf_spend_yt': 'yt', 'rf_spend_fb': 'fb'
}
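The three reach-and-frequency mappings share channel names, so inverting them yields one (reach, frequency, rf_spend) column triple per channel. The sketch below does that and rejects mappings that disagree; the helper name rf_columns_by_channel is hypothetical and is not how Meridian itself joins these columns.

```python
from __future__ import annotations

from collections.abc import Mapping


def rf_columns_by_channel(
    reach_to_channel: Mapping[str, str],
    frequency_to_channel: Mapping[str, str],
    rf_spend_to_channel: Mapping[str, str],
) -> dict[str, tuple[str, str, str]]:
    """Map each channel to its (reach, frequency, rf_spend) column names.

    Hypothetical helper, not part of Meridian.
    """
    inverted = []
    for mapping in (reach_to_channel, frequency_to_channel, rf_spend_to_channel):
        # Invert column -> channel into channel -> column.
        by_channel = {channel: column for column, channel in mapping.items()}
        if len(by_channel) != len(mapping):
            raise ValueError("two columns map to the same channel")
        inverted.append(by_channel)
    reach, frequency, rf_spend = inverted
    if not (set(reach) == set(frequency) == set(rf_spend)):
        raise ValueError("reach, frequency and rf_spend must cover the same channels")
    return {ch: (reach[ch], frequency[ch], rf_spend[ch]) for ch in sorted(reach)}


rf = rf_columns_by_channel(
    {"reach_tv": "tv", "reach_yt": "yt", "reach_fb": "fb"},
    {"frequency_tv": "tv", "frequency_yt": "yt", "frequency_fb": "fb"},
    {"rf_spend_tv": "tv", "rf_spend_yt": "yt", "rf_spend_fb": "fb"},
)
print(rf["tv"])  # ('reach_tv', 'frequency_tv', 'rf_spend_tv')
```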

organic_reach_to_channel

A dictionary whose keys are the actual column names for organic_reach data in the CSV file and whose values are the desired channel names. The channel names must match those used for the organic_frequency data. Example:

organic_reach_to_channel = {
    'organic_reach_newsletter': 'newsletter',
}

organic_frequency_to_channel

A dictionary whose keys are the actual column names for organic_frequency data in the CSV file and whose values are the desired channel names. The channel names must match those used for the organic_reach data. Example:

organic_frequency_to_channel = {
    'organic_frequency_newsletter': 'newsletter',
}

Methods

load


load() -> meridian.data.input_data.InputData

Reads data from a CSV file and returns an InputData object.
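Conceptually, loading turns long-format CSV rows (one row per geo and time period) into arrays indexed by geo and time. The standard-library sketch below illustrates that reshaping for the KPI column only. It is not Meridian's implementation, which uses pandas and DataFrameDataLoader; the function name pivot_kpi and the sample data are hypothetical.

```python
from __future__ import annotations

import csv
import io


def pivot_kpi(csv_text: str, geo_col: str, time_col: str, kpi_col: str) -> dict[str, dict[str, float]]:
    """Reshape long-format rows into a {geo: {time: kpi}} grid.

    Hypothetical illustration only; empty KPI cells (e.g. lag periods) are skipped.
    """
    grid: dict[str, dict[str, float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[kpi_col].strip()
        if value == "":
            continue  # e.g. an initial lag period with no KPI
        grid.setdefault(row[geo_col], {})[row[time_col]] = float(value)
    return grid


csv_text = """dmas,dates,revenue
geo_a,2024-01-01,100
geo_a,2024-01-08,120
geo_b,2024-01-01,80
"""
grid = pivot_kpi(csv_text, geo_col="dmas", time_col="dates", kpi_col="revenue")
print(grid["geo_a"]["2024-01-08"])  # 120.0
```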