meridian.data.load.DataFrameDataLoader

Reads data from a pandas DataFrame.

Inherits From: InputDataLoader

This class reads input data from a pandas DataFrame. The coord_to_columns attribute maps the target InputData coordinates and array names to the corresponding DataFrame column names, which may differ from them. The fields are:

  • geo, time, kpi, revenue_per_kpi, population (single column)
  • controls (multiple columns, optional)
  • (1) media, media_spend (multiple columns)
  • (2) reach, frequency, rf_spend (multiple columns)
  • non_media_treatments (multiple columns, optional)
  • organic_media (multiple columns, optional)
  • organic_reach, organic_frequency (multiple columns, optional)

The DataFrame must include (1) or (2), but doesn't need to include both. Also, each media channel must appear in (1) or (2), but not both.
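
The channel-assignment rule above can be checked before the loader is constructed. This is a minimal sketch: check_channel_assignment is a hypothetical helper (not part of Meridian), and it assumes the channel names are taken from the media_to_channel and reach_to_channel dictionaries described under the constructor arguments.

```python
from __future__ import annotations


def check_channel_assignment(
    media_to_channel: dict[str, str] | None,
    reach_to_channel: dict[str, str] | None,
) -> None:
    """Hypothetical pre-check, not a Meridian API.

    Raises ValueError if neither (1) media nor (2) reach and frequency
    channels are given, or if any channel appears in both groups.
    """
    media_channels = set((media_to_channel or {}).values())
    rf_channels = set((reach_to_channel or {}).values())
    if not media_channels and not rf_channels:
        raise ValueError('Provide media (1) or reach and frequency (2) data.')
    overlap = media_channels & rf_channels
    if overlap:
        raise ValueError(f'Channels in both media and RF data: {sorted(overlap)}')


# Passes: 'tv' is a media channel and 'yt' is a reach and frequency channel.
check_channel_assignment({'impressions_tv': 'tv'}, {'reach_yt': 'yt'})
```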

Note the following:

  • Time column values must be formatted in yyyy-mm-dd date format.
  • In a national model, geo and population are optional. If population is provided, it is reset to a default value of 1.0.
  • If media data is provided, media_to_channel and media_spend_to_channel are required. If reach and frequency data is provided, reach_to_channel, frequency_to_channel, and rf_spend_to_channel are required.
  • If organic_reach and organic_frequency data is provided, then organic_reach_to_channel and organic_frequency_to_channel are required.
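
If the time column uses a different date layout, it can be converted with pandas before loading. A minimal sketch: the helper name to_yyyy_mm_dd and the month/day/year source format are assumptions for illustration; only the yyyy-mm-dd target format comes from the notes above.

```python
import pandas as pd


def to_yyyy_mm_dd(dates: pd.Series, source_format: str) -> pd.Series:
    """Hypothetical helper: parse dates with an explicit format and
    re-emit them as yyyy-mm-dd strings."""
    return pd.to_datetime(dates, format=source_format).dt.strftime('%Y-%m-%d')


# Assumed input layout: month/day/year strings in a 'dates' column.
df = pd.DataFrame({'dates': ['01/07/2024', '01/14/2024'], 'conversions': [10, 12]})
df['dates'] = to_yyyy_mm_dd(df['dates'], '%m/%d/%Y')
print(df['dates'].tolist())  # ['2024-01-07', '2024-01-14']
```

Passing an explicit format makes parsing strict, so an unexpected layout raises an error instead of being silently misread.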

Example:

    from meridian.data.load import CoordToColumns, DataFrameDataLoader

    # df = [...]
    coord_to_columns = CoordToColumns(
        geo='dmas',
        time='dates',
        kpi='conversions',
        revenue_per_kpi='revenue_per_conversions',
        controls=['control_income'],
        population='populations',
        media=['impressions_tv', 'impressions_fb', 'impressions_search'],
        media_spend=['spend_tv', 'spend_fb', 'spend_search'],
        reach=['reach_yt'],
        frequency=['frequency_yt'],
        rf_spend=['rf_spend_yt'],
        non_media_treatments=['price', 'discount'],
        organic_media=['organic_impressions_blog'],
        organic_reach=['organic_reach_newsletter'],
        organic_frequency=['organic_frequency_newsletter'],
    )

    # Map each DataFrame column to its channel name.
    media_to_channel = {
        'impressions_tv': 'tv',
        'impressions_fb': 'fb',
        'impressions_search': 'search',
    }
    media_spend_to_channel = {
        'spend_tv': 'tv',
        'spend_fb': 'fb',
        'spend_search': 'search',
    }
    reach_to_channel = {'reach_yt': 'yt'}
    frequency_to_channel = {'frequency_yt': 'yt'}
    rf_spend_to_channel = {'rf_spend_yt': 'yt'}
    organic_reach_to_channel = {'organic_reach_newsletter': 'newsletter'}
    organic_frequency_to_channel = {'organic_frequency_newsletter': 'newsletter'}

    data_loader = DataFrameDataLoader(
        df=df,
        coord_to_columns=coord_to_columns,
        kpi_type='non-revenue',
        media_to_channel=media_to_channel,
        media_spend_to_channel=media_spend_to_channel,
        reach_to_channel=reach_to_channel,
        frequency_to_channel=frequency_to_channel,
        rf_spend_to_channel=rf_spend_to_channel,
        organic_reach_to_channel=organic_reach_to_channel,
        organic_frequency_to_channel=organic_frequency_to_channel,
    )
    data = data_loader.load()

df
The pd.DataFrame object to read from. One of the following conditions is required:

  • There are no NAs in the DataFrame.
  • For some number of initial time periods, only media data is present, and all of the non-media data columns (kpi, revenue_per_kpi, media_spend, controls, and population) are NA.
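
The two NA conditions can be expressed as a pandas check. This is a sketch under stated assumptions: satisfies_na_rule is hypothetical, the caller passes the non-media column names explicitly, and every other column (geo, time, media, and so on) is treated as data that must be present during the initial media-only periods.

```python
from __future__ import annotations

import pandas as pd


def satisfies_na_rule(df: pd.DataFrame, time_col: str, non_media_cols: list[str]) -> bool:
    """Hypothetical check of the documented NA conditions for `df`."""
    # Condition 1: no NAs anywhere.
    if not df.isna().to_numpy().any():
        return True
    other_cols = [c for c in df.columns if c not in non_media_cols]
    # A row is media-only if all non-media columns are NA and all others are present.
    media_only_row = df[non_media_cols].isna().all(axis=1) & df[other_cols].notna().all(axis=1)
    # A period qualifies only if every row in it (for example, every geo) is media-only.
    media_only_period = media_only_row.groupby(df[time_col]).all().sort_index()
    # Keep only the leading run of qualifying periods (True until the first False).
    initial = media_only_period.astype(int).cummin().astype(bool)
    initial_periods = set(initial[initial].index)
    # Condition 2: outside that initial run, there must be no NAs at all.
    remaining = df[~df[time_col].isin(initial_periods)]
    return not remaining.isna().to_numpy().any()


df_ok = pd.DataFrame({
    'dates': ['2024-01-01', '2024-01-08', '2024-01-15'],
    'impressions_tv': [1.0, 2.0, 3.0],
    'conversions': [None, 5.0, 6.0],
})
print(satisfies_na_rule(df_ok, 'dates', ['conversions']))  # True
```

Grouping by time means that, in a geo-level DataFrame, a period counts as media-only only when every geo's row for that period is media-only.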

coord_to_columns
A CoordToColumns object whose fields are the desired coordinates of the InputData and whose values are the current names of columns (or lists of columns) in the DataFrame. Example:

 coord_to_columns = CoordToColumns(
    geo='dmas',
    time='dates',
    kpi='conversions',
    revenue_per_kpi='revenue_per_conversions',
    media=['impressions_tv', 'impressions_yt', 'impressions_search'],
    media_spend=['spend_tv', 'spend_yt', 'spend_search'],
    controls=['control_income'],
    population='populations',
)
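
Because coord_to_columns refers to DataFrame columns by name, it can help to confirm that every referenced name exists before calling the loader. A minimal sketch: a plain dict stands in for a CoordToColumns object, and missing_columns is a hypothetical helper, not a Meridian API.

```python
from __future__ import annotations

import pandas as pd


def missing_columns(df: pd.DataFrame, coord_to_columns: dict[str, str | list[str] | None]) -> list[str]:
    """Hypothetical helper: list referenced column names absent from df.

    Values may be a single column name, a list of names, or None (unused field).
    """
    referenced: list[str] = []
    for value in coord_to_columns.values():
        if value is None:
            continue
        referenced.extend([value] if isinstance(value, str) else value)
    return [name for name in referenced if name not in df.columns]


df = pd.DataFrame(columns=['dmas', 'dates', 'conversions'])
mapping = {
    'geo': 'dmas',
    'time': 'dates',
    'kpi': 'conversions',
    'controls': ['control_income'],
    'population': None,
}
print(missing_columns(df, mapping))  # ['control_income']
```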

kpi_type
A string denoting whether the KPI is of a 'revenue' or 'non-revenue' type. When kpi_type is 'non-revenue' and revenue_per_kpi is provided, ROI calibration is used and the analysis is run on revenue. When kpi_type is 'non-revenue' and revenue_per_kpi is not provided, custom ROI calibration is used and the analysis is run on the KPI.
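
The kpi_type rules can be summarized as a small decision function. This is only a sketch: calibration_and_target is hypothetical, and its 'revenue' branch is an assumption, because the description above spells out only the 'non-revenue' cases.

```python
from __future__ import annotations


def calibration_and_target(kpi_type: str, has_revenue_per_kpi: bool) -> tuple[str, str]:
    """Hypothetical summary: returns (calibration, analysis target)."""
    if kpi_type == 'non-revenue':
        if has_revenue_per_kpi:
            return ('ROI', 'revenue')
        return ('custom ROI', 'KPI')
    if kpi_type == 'revenue':
        # Assumption (not stated in the text): a revenue KPI is already
        # revenue, so ROI calibration on revenue applies.
        return ('ROI', 'revenue')
    raise ValueError(f"kpi_type must be 'revenue' or 'non-revenue', got {kpi_type!r}")


print(calibration_and_target('non-revenue', False))  # ('custom ROI', 'KPI')
```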
media_to_channel
A dictionary whose keys are the actual column names for media data in the dataframe, and the values are the desired channel names. These are the same as for the media_spend data. Example:

 media_to_channel = {'media_tv': 'tv', 'media_yt': 'yt', 'media_fb': 'fb'} 

media_spend_to_channel
A dictionary whose keys are the actual column names for media_spend data in the dataframe, and the values are the desired channel names. These are the same as for the media data. Example:

 media_spend_to_channel = {
    'spend_tv': 'tv', 'spend_yt': 'yt', 'spend_fb': 'fb'
} 
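
Because media_to_channel and media_spend_to_channel must name the same channels, a consistency check is straightforward. The helper same_channels is hypothetical; the same call applies to the reach, frequency, and rf_spend dictionaries and to the organic reach and frequency pair.

```python
from __future__ import annotations


def same_channels(*mappings: dict[str, str]) -> bool:
    """Hypothetical check: True if every column-to-channel mapping
    names exactly the same set of channels."""
    channel_sets = [set(mapping.values()) for mapping in mappings]
    return all(channels == channel_sets[0] for channels in channel_sets)


media_to_channel = {'media_tv': 'tv', 'media_yt': 'yt', 'media_fb': 'fb'}
media_spend_to_channel = {'spend_tv': 'tv', 'spend_yt': 'yt', 'spend_fb': 'fb'}
print(same_channels(media_to_channel, media_spend_to_channel))  # True
```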

reach_to_channel
A dictionary whose keys are the actual column names for reach data in the dataframe, and the values are the desired channel names. These are the same as for the rf_spend data. Example:

 reach_to_channel = {'reach_tv': 'tv', 'reach_yt': 'yt', 'reach_fb': 'fb'} 

frequency_to_channel
A dictionary whose keys are the actual column names for frequency data in the dataframe, and the values are the desired channel names. These are the same as for the rf_spend data. Example:

 frequency_to_channel = {
    'frequency_tv': 'tv', 'frequency_yt': 'yt', 'frequency_fb': 'fb'
} 

rf_spend_to_channel
A dictionary whose keys are the actual column names for rf_spend data in the dataframe, and the values are the desired channel names. These are the same as for the reach and frequency data. Example:

 rf_spend_to_channel = {
    'rf_spend_tv': 'tv', 'rf_spend_yt': 'yt', 'rf_spend_fb': 'fb'
} 
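
Since reach, frequency, and rf_spend share channel names, the three dictionaries can be inverted and joined per channel, which makes a missing or misspelled entry easy to spot. columns_by_channel is a hypothetical helper for illustration only; it does not describe how Meridian handles these arguments internally.

```python
from __future__ import annotations


def columns_by_channel(
    reach_to_channel: dict[str, str],
    frequency_to_channel: dict[str, str],
    rf_spend_to_channel: dict[str, str],
) -> dict[str, tuple[str, str, str]]:
    """Hypothetical helper: map each channel to its (reach, frequency, rf_spend) columns."""
    # Invert column -> channel into channel -> column for each mapping.
    reach, frequency, spend = (
        {channel: column for column, channel in mapping.items()}
        for mapping in (reach_to_channel, frequency_to_channel, rf_spend_to_channel)
    )
    if not reach.keys() == frequency.keys() == spend.keys():
        raise ValueError('reach, frequency, and rf_spend must use the same channel names')
    return {ch: (reach[ch], frequency[ch], spend[ch]) for ch in sorted(reach)}


grouped = columns_by_channel(
    {'reach_tv': 'tv', 'reach_yt': 'yt'},
    {'frequency_tv': 'tv', 'frequency_yt': 'yt'},
    {'rf_spend_tv': 'tv', 'rf_spend_yt': 'yt'},
)
print(grouped['yt'])  # ('reach_yt', 'frequency_yt', 'rf_spend_yt')
```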

organic_reach_to_channel
A dictionary whose keys are the actual column names for organic_reach data in the dataframe, and the values are the desired channel names. These are the same as for the organic_frequency data. Example:

 organic_reach_to_channel = {
    'organic_reach_newsletter': 'newsletter',
} 

organic_frequency_to_channel
A dictionary whose keys are the actual column names for organic_frequency data in the dataframe, and the values are the desired channel names. These are the same as for the organic_reach data. Example:

 organic_frequency_to_channel = {
    'organic_frequency_newsletter': 'newsletter',
} 

Methods

load

Reads data from a dataframe and returns an InputData object.

__eq__

Return self==value.

Class Variables

frequency_to_channel = None
media_spend_to_channel = None
media_to_channel = None
organic_frequency_to_channel = None
organic_reach_to_channel = None
reach_to_channel = None
rf_spend_to_channel = None
