Reads data from a Pandas DataFrame.
Inherits From: InputDataLoader
meridian.data.load.DataFrameDataLoader(
    df: pd.DataFrame,
    coord_to_columns: CoordToColumns,
    kpi_type: str,
    media_to_channel: (Mapping[str, str] | None) = None,
    media_spend_to_channel: (Mapping[str, str] | None) = None,
    reach_to_channel: (Mapping[str, str] | None) = None,
    frequency_to_channel: (Mapping[str, str] | None) = None,
    rf_spend_to_channel: (Mapping[str, str] | None) = None,
    organic_reach_to_channel: (Mapping[str, str] | None) = None,
    organic_frequency_to_channel: (Mapping[str, str] | None) = None
)
This class reads input data from a Pandas DataFrame. The coord_to_columns attribute stores a mapping from target InputData coordinates and array names to the DataFrame column names if they are different. The fields are:
- geo, time, kpi, revenue_per_kpi, population (single column)
- controls (multiple columns, optional)
- (1) media, media_spend (multiple columns)
- (2) reach, frequency, rf_spend (multiple columns)
- non_media_treatments (multiple columns, optional)
- organic_media (multiple columns, optional)
- organic_reach, organic_frequency (multiple columns, optional)
The DataFrame must include (1) or (2), but doesn't need to include both. Also, each media channel must appear in (1) or (2), but not both.
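The rule that a channel belongs to group (1) or group (2), but not both, can be checked before constructing the loader. The following is a minimal sketch that works on the plain mapping dictionaries only. The helper names `overlapping_channels` and `check_paid_media_groups` are hypothetical and are not part of Meridian, and the loader's own validation may differ.

```python
from typing import Mapping, Optional, Set


def overlapping_channels(
    media_to_channel: Optional[Mapping[str, str]],
    reach_to_channel: Optional[Mapping[str, str]],
) -> Set[str]:
    """Returns channel names that appear in both group (1) and group (2)."""
    media_channels = set((media_to_channel or {}).values())
    rf_channels = set((reach_to_channel or {}).values())
    return media_channels & rf_channels


def check_paid_media_groups(
    media_to_channel: Optional[Mapping[str, str]] = None,
    reach_to_channel: Optional[Mapping[str, str]] = None,
) -> None:
    """Raises ValueError if neither group is given or if a channel is in both."""
    if not media_to_channel and not reach_to_channel:
        raise ValueError('Provide media columns (1) or reach/frequency columns (2).')
    overlap = overlapping_channels(media_to_channel, reach_to_channel)
    if overlap:
        raise ValueError(f'Channels listed in both (1) and (2): {sorted(overlap)}')


# Mappings that follow the naming used in the example on this page.
media_to_channel = {
    'impressions_tv': 'tv',
    'impressions_fb': 'fb',
    'impressions_search': 'search',
}
reach_to_channel = {'reach_yt': 'yt'}

# Passes silently: 'yt' is only in group (2), the others only in group (1).
check_paid_media_groups(media_to_channel, reach_to_channel)
```

Comparing the channel names (the dictionary values) rather than the column names is a deliberate choice here, since the same channel could be spelled with different column prefixes in each group.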
Note the following:
- Time column values must be formatted in yyyy-mm-dd date format.
- In a national model, geo and population are optional. If population is provided, it is reset to a default value of 1.0.
- If media data is provided, then media_to_channel and media_spend_to_channel are required. If reach and frequency data is provided, then reach_to_channel, frequency_to_channel, and rf_spend_to_channel are required.
- If organic_reach and organic_frequency data is provided, then organic_reach_to_channel and organic_frequency_to_channel are required.
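To illustrate the yyyy-mm-dd requirement for the time column, here is a small sketch using only the Python standard library. The helpers `is_yyyy_mm_dd` and `to_yyyy_mm_dd` are hypothetical names introduced for illustration. They show one way to check or normalize date strings before building the DataFrame, not how the loader itself parses dates.

```python
from datetime import datetime


def is_yyyy_mm_dd(value: str) -> bool:
    """Returns True if value is a valid, zero-padded yyyy-mm-dd date string."""
    try:
        parsed = datetime.strptime(value, '%Y-%m-%d')
    except ValueError:
        return False
    # strptime also accepts unpadded fields such as '2024-1-5', so require
    # that formatting the parsed date reproduces the input exactly.
    return parsed.strftime('%Y-%m-%d') == value


def to_yyyy_mm_dd(value: str, source_format: str) -> str:
    """Converts a date string in source_format to yyyy-mm-dd."""
    return datetime.strptime(value, source_format).strftime('%Y-%m-%d')


print(is_yyyy_mm_dd('2024-01-15'))               # True
print(is_yyyy_mm_dd('01/15/2024'))               # False
print(to_yyyy_mm_dd('01/15/2024', '%m/%d/%Y'))   # 2024-01-15
```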
Example:
    from meridian.data.load import CoordToColumns, DataFrameDataLoader

    # df = [...]
    coord_to_columns = CoordToColumns(
        geo='dmas',
        time='dates',
        kpi='conversions',
        revenue_per_kpi='revenue_per_conversions',
        controls=['control_income'],
        population='populations',
        media=['impressions_tv', 'impressions_fb', 'impressions_search'],
        media_spend=['spend_tv', 'spend_fb', 'spend_search'],
        reach=['reach_yt'],
        frequency=['frequency_yt'],
        rf_spend=['rf_spend_yt'],
        non_media_treatments=['price', 'discount'],
        organic_media=['organic_impressions_blog'],
        organic_reach=['organic_reach_newsletter'],
        organic_frequency=['organic_frequency_newsletter'],
    )

    # Each mapping goes from a DataFrame column name to a channel name.
    media_to_channel = {
        'impressions_tv': 'tv',
        'impressions_fb': 'fb',
        'impressions_search': 'search',
    }
    media_spend_to_channel = {
        'spend_tv': 'tv',
        'spend_fb': 'fb',
        'spend_search': 'search',
    }
    reach_to_channel = {'reach_yt': 'yt'}
    frequency_to_channel = {'frequency_yt': 'yt'}
    rf_spend_to_channel = {'rf_spend_yt': 'yt'}
    organic_reach_to_channel = {'organic_reach_newsletter': 'newsletter'}
    organic_frequency_to_channel = {'organic_frequency_newsletter': 'newsletter'}

    data_loader = DataFrameDataLoader(
        df=df,
        coord_to_columns=coord_to_columns,
        kpi_type='non-revenue',
        media_to_channel=media_to_channel,
        media_spend_to_channel=media_spend_to_channel,
        reach_to_channel=reach_to_channel,
        frequency_to_channel=frequency_to_channel,
        rf_spend_to_channel=rf_spend_to_channel,
        organic_reach_to_channel=organic_reach_to_channel,
        organic_frequency_to_channel=organic_frequency_to_channel,
    )
    data = data_loader.load()
Attributes
df: The pd.DataFrame object to read from. One of the following conditions must hold:
- There are no NAs in the dataframe.
- For any number of initial periods there is only media data and NAs in all of the non-media data columns (kpi, revenue_per_kpi, media_spend, controls, and population).
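The two NA conditions above can be checked on a DataFrame before it is passed to the loader. The sketch below is one reading of those conditions, written with pandas. The function `na_pattern_is_supported` and its arguments are hypothetical, it assumes the time column holds yyyy-mm-dd strings with no missing values, and it is not Meridian's own validation.

```python
from typing import Sequence

import pandas as pd


def na_pattern_is_supported(
    df: pd.DataFrame,
    time_col: str,
    media_cols: Sequence[str],
    non_media_cols: Sequence[str],
) -> bool:
    """Returns True if df has no NAs, or only an initial media-only block."""
    if not df.isna().any().any():
        return True  # Condition 1: no NAs anywhere.
    # Media columns must be present in every row.
    if df[list(media_cols)].isna().any().any():
        return False
    non_media_na = df[list(non_media_cols)].isna()
    all_na = non_media_na.all(axis=1)
    any_na = non_media_na.any(axis=1)
    # A row must have either all or none of its non-media values missing.
    if (any_na & ~all_na).any():
        return False
    na_times = set(df.loc[all_na, time_col])
    ok_times = set(df.loc[~all_na, time_col])
    # A period cannot be media-only for some rows and complete for others.
    if not ok_times or (na_times & ok_times):
        return False
    # Condition 2: every media-only period precedes every complete period.
    # yyyy-mm-dd strings sort chronologically, so string comparison works.
    return max(na_times) < min(ok_times)


df_ok = pd.DataFrame({
    'dates': ['2024-01-01', '2024-01-08', '2024-01-15'],
    'impressions_tv': [100.0, 120.0, 90.0],
    'conversions': [None, 10.0, 12.0],
    'control_income': [None, 5.0, 6.0],
})
print(na_pattern_is_supported(
    df_ok, 'dates', ['impressions_tv'], ['conversions', 'control_income']
))  # True
```

A media-only leading block like the one in `df_ok` is the pattern the second condition describes: media values exist for early periods in which the other columns are not yet observed.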
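coord_to_columns: A CoordToColumns object that maps the target InputData coordinates and array names to the names of the columns (or lists of columns) in the DataFrame. Example:
    coord_to_columns = CoordToColumns(
        geo='dmas',
        time='dates',
        kpi='conversions',
        revenue_per_kpi='revenue_per_conversions',
        media=['impressions_tv', 'impressions_yt', 'impressions_search'],
        media_spend=['spend_tv', 'spend_yt', 'spend_search'],
        controls=['control_income'],
        population='populations',
    )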
kpi_type: A string denoting whether the KPI is of a 'revenue' or 'non-revenue' type. When the kpi_type is 'non-revenue' and there exists a revenue_per_kpi, ROI calibration is used and the analysis is run on revenue. When the revenue_per_kpi doesn't exist for the same kpi_type, custom ROI calibration is used and the analysis is run on KPI.
media_to_channel: A dictionary whose keys are the column names for the media data in the dataframe, and the values are the desired channel names. These are the same as for the media_spend data. Example:
    media_to_channel = {'media_tv': 'tv', 'media_yt': 'yt', 'media_fb': 'fb'}
media_spend_to_channel: A dictionary whose keys are the column names for the media_spend data in the dataframe, and the values are the desired channel names. These are the same as for the media data. Example:
    media_spend_to_channel = {
        'spend_tv': 'tv', 'spend_yt': 'yt', 'spend_fb': 'fb'
    }
reach_to_channel: A dictionary whose keys are the column names for the reach data in the dataframe, and the values are the desired channel names. These are the same as for the rf_spend data. Example:
    reach_to_channel = {'reach_tv': 'tv', 'reach_yt': 'yt', 'reach_fb': 'fb'}
frequency_to_channel: A dictionary whose keys are the column names for the frequency data in the dataframe, and the values are the desired channel names. These are the same as for the rf_spend data. Example:
    frequency_to_channel = {
        'frequency_tv': 'tv', 'frequency_yt': 'yt', 'frequency_fb': 'fb'
    }
rf_spend_to_channel: A dictionary whose keys are the column names for the rf_spend data in the dataframe, and the values are the desired channel names. These are the same as for the reach and frequency data. Example:
    rf_spend_to_channel = {
        'rf_spend_tv': 'tv', 'rf_spend_yt': 'yt', 'rf_spend_fb': 'fb'
    }
organic_reach_to_channel: A dictionary whose keys are the column names for the organic_reach data in the dataframe, and the values are the desired channel names. These are the same as for the organic_frequency data. Example:
    organic_reach_to_channel = {
        'organic_reach_newsletter': 'newsletter',
    }
organic_frequency_to_channel: A dictionary whose keys are the column names for the organic_frequency data in the dataframe, and the values are the desired channel names. These are the same as for the organic_reach data. Example:
    organic_frequency_to_channel = {
        'organic_frequency_newsletter': 'newsletter',
    }
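Several of the attributes above state that the channel names in one mapping are the same as in its paired mapping (for example, media and media_spend, or reach, frequency, and rf_spend). A quick way to confirm this before constructing the loader is to compare the sets of dictionary values. The helper `channel_names_agree` below is a hypothetical sketch and is not part of Meridian.

```python
from typing import Mapping, Optional


def channel_names_agree(*mappings: Optional[Mapping[str, str]]) -> bool:
    """Returns True if all given mappings (ignoring None) share one channel set."""
    channel_sets = [set(m.values()) for m in mappings if m is not None]
    return all(channels == channel_sets[0] for channels in channel_sets)


media_to_channel = {'media_tv': 'tv', 'media_yt': 'yt', 'media_fb': 'fb'}
media_spend_to_channel = {'spend_tv': 'tv', 'spend_yt': 'yt', 'spend_fb': 'fb'}
print(channel_names_agree(media_to_channel, media_spend_to_channel))  # True

# A spend mapping that is missing the 'fb' channel does not agree.
incomplete_spend = {'spend_tv': 'tv', 'spend_yt': 'yt'}
print(channel_names_agree(media_to_channel, incomplete_spend))  # False
```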
Methods
load
load() -> meridian.data.input_data.InputData
Reads data from a dataframe and returns an InputData object.
__eq__
__eq__(other)
Return self==value.