A data container for advertising data in a format supported by Meridian.
meridian.data.input_data.InputData(
    kpi: xr.DataArray,
    kpi_type: str,
    population: xr.DataArray,
    controls: (xr.DataArray | None) = None,
    revenue_per_kpi: (xr.DataArray | None) = None,
    media: (xr.DataArray | None) = None,
    media_spend: (xr.DataArray | None) = None,
    reach: (xr.DataArray | None) = None,
    frequency: (xr.DataArray | None) = None,
    rf_spend: (xr.DataArray | None) = None,
    organic_media: (xr.DataArray | None) = None,
    organic_reach: (xr.DataArray | None) = None,
    organic_frequency: (xr.DataArray | None) = None,
    non_media_treatments: (xr.DataArray | None) = None
)
Attributes
kpi
A DataArray of dimensions (n_geos, n_times) containing the non-negative dependent variable. Typically this is the number of units sold, but it can be any metric, such as revenue or conversions.
kpi_type
The KPI type, either 'revenue' or 'non-revenue'. When the kpi_type is 'non-revenue' and revenue_per_kpi exists, ROI calibration is used and the analysis is run on revenue. When revenue_per_kpi doesn't exist for the same kpi_type, custom ROI calibration is used and the analysis is run on KPI.
population
A DataArray of dimensions (n_geos,) containing the population of each group. This variable is used to scale the KPI and media for modeling.
controls
An optional DataArray of dimensions (n_geos, n_times, n_controls) containing control variable values.
revenue_per_kpi
An optional DataArray of dimensions (n_geos, n_times) containing the average revenue amount per KPI unit. Although modeling is done on kpi, model analysis and optimization are done on KPI * revenue_per_kpi (revenue), if this value is available. If kpi corresponds to revenue, then an array of ones is passed automatically.
media
An optional DataArray of dimensions (n_geos, n_media_times, n_media_channels) containing non-negative media execution values. Typically these are impressions, but it can be any metric, such as cost or clicks. n_media_times ≥ n_times is required, and the final n_times time periods must align with the time window of kpi and controls. Due to lagged effects, we recommend that the time window for media includes up to max_lag additional periods prior to this window. If n_media_times < n_times + max_lag, the model effectively imputes media history as zero (no media execution). If n_media_times > n_times + max_lag, then only the final n_times + max_lag periods are used to fit the model. media and media_spend must contain the same number of media channels in the same order. If either of these arguments is passed, then the other is also required.
media_spend
An optional DataArray containing the cost of each media channel. This is used as the denominator for ROI calculations. It is also used to calculate an assumed cost per media unit for post-modeling analysis such as response curves and budget optimization. Only the aggregate spend (across geos and time periods) is required for these calculations. However, a spend breakdown by geo and time period is required if roi_calibration_period is specified or if conducting post-modeling analysis on a specific subset of geos and/or time periods. The DataArray shape can be (n_geos, n_times, n_media_channels), or (n_media_channels,) if the data is aggregated over the geo and time dimensions. We recommend that the spend total aligns with the time window of the kpi and controls data, which is the time window over which incremental outcome of the ROI numerator is calculated. However, note that incremental outcome is influenced by media execution prior to this time window, through lagged effects, and excludes lagged effects beyond the time window of media executed during the time window. media and media_spend must contain the same number of media channels in the same order. If either of these arguments is passed, then the other is also required. If a tensor of shape (n_media_channels,) is passed as media_spend, then it will be automatically allocated across geos and times proportionally to media.
reach
An optional DataArray of dimensions (n_geos, n_media_times, n_rf_channels) containing non-negative reach values. It is required that n_media_times ≥ n_times, and the final n_times time periods must align with the time window of kpi and controls. The time window must include the time window of the kpi and controls data, but it is optional to include lagged time periods prior to the time window of the kpi and controls data. If lagged reach is not included, or if the lagged reach includes fewer than max_lag time periods, then the model calculates Adstock assuming that reach execution is zero prior to the first observed time period. We recommend including n_times + max_lag time periods, unless the value of max_lag is prohibitively large. If only media data is used, then reach will be None. reach, frequency, and rf_spend must contain the same number of media channels in the same order. If any of these arguments is passed, then the others are also required.
frequency
An optional DataArray of dimensions (n_geos, n_media_times, n_rf_channels) containing non-negative frequency values. It is required that n_media_times ≥ n_times, and the final n_times time periods must align with the time window of kpi and controls. The time window must include the time window of the kpi and controls data, but it is optional to include lagged time periods prior to the time window of the kpi and controls data. If lagged frequency is not included, or if the lagged frequency includes fewer than max_lag time periods, then the model calculates Adstock assuming that frequency execution is zero prior to the first observed time period. We recommend including n_times + max_lag time periods, unless the value of max_lag is prohibitively large. If only media data is used, then frequency will be None. reach, frequency, and rf_spend must contain the same number of media channels in the same order. If any of these arguments is passed, then the others are also required.
rf_spend
An optional DataArray containing the cost of each reach and frequency channel. This is used as the denominator for ROI calculations. It is also used to calculate an assumed cost per media unit for post-modeling analysis such as response curves and budget optimization. Only the aggregate spend (across geos and time periods) is required for these calculations. However, a spend breakdown by geo and time period is required if rf_roi_calibration_period is specified or if conducting post-modeling analysis on a specific subset of geos and/or time periods. The DataArray shape can be (n_rf_channels,) or (n_geos, n_times, n_rf_channels). The spend should be aggregated over geo and/or time dimensions that are not represented. We recommend that the spend total aligns with the time window of the kpi and controls data, which is the time window over which incremental outcome of the ROI numerator is calculated. However, note that incremental outcome is influenced by media execution prior to this time window, through lagged effects, and excludes lagged effects beyond the time window of media executed during the time window. If only media data is used, rf_spend will be None. reach, frequency, and rf_spend must contain the same number of media channels in the same order. If any of these arguments is passed, then the others are also required. If a tensor of shape (n_rf_channels,) is passed as rf_spend, then it will be automatically allocated across geos and times proportionally to (reach * frequency).
organic_media
An optional DataArray of dimensions (n_geos, n_media_times, n_organic_media_channels) containing non-negative organic media values. Organic media variables are media activities that have no direct cost. These may include impressions from newsletters, blog posts, social media activity, or email campaigns, but any metric, such as clicks, can be used. n_media_times ≥ n_times is required, and the final n_times time periods must align with the time window of kpi and controls. Due to lagged effects, we recommend that the time window for organic media includes up to max_lag additional periods prior to this window. If n_organic_media_times < n_times + max_lag, the model effectively imputes organic media history. If n_organic_media_times > n_times + max_lag, then only the final n_times + max_lag periods are used to fit the model.
organic_reach
An optional DataArray of dimensions (n_geos, n_media_times, n_organic_rf_channels) containing non-negative organic reach values. It is required that n_media_times ≥ n_times, and the final n_times time periods must align with the time window of kpi and controls. The time window must include the time window of the kpi and controls data, but it is optional to include lagged time periods prior to the time window of the kpi and controls data. If lagged reach is not included, or if the lagged reach includes fewer than max_lag time periods, then the model calculates Adstock assuming that reach execution is zero prior to the first observed time period. We recommend including n_times + max_lag time periods, unless the value of max_lag is prohibitively large. If no organic reach and frequency data is used, then organic_reach and organic_frequency will be None. organic_reach and organic_frequency must contain the same number of channels in the same order. If either of these arguments is passed, then the other is also required.
organic_frequency
An optional DataArray of dimensions (n_geos, n_media_times, n_organic_rf_channels) containing non-negative organic frequency values. It is required that n_media_times ≥ n_times, and the final n_times time periods must align with the time window of kpi and controls. The time window must include the time window of the kpi and controls data, but it is optional to include lagged time periods prior to the time window of the kpi and controls data. If lagged frequency is not included, or if the lagged frequency includes fewer than max_lag time periods, then the model calculates Adstock assuming that frequency execution is zero prior to the first observed time period. We recommend including n_times + max_lag time periods, unless the value of max_lag is prohibitively large. If no organic reach and frequency data is used, then organic_frequency will be None. organic_reach and organic_frequency must contain the same number of channels in the same order. If either of these arguments is passed, then the other is also required.
non_media_treatments
An optional DataArray of dimensions (n_geos, n_times, n_non_media_channels) containing non-media treatment variable values. Non-media treatment variables are marketing activities taken by the advertiser that are not directly related to media. They have no direct marketing cost associated with them, but unlike organic media variables, they have no Adstock and Hill effects. They differ from control variables in that they are considered intervenable and are therefore treatment variables under the causal model. Examples include running a promotion, the price of a product, and a change in a product's packaging and/or design.
mean-centered by geo.
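The time-window rules stated for media (and, analogously, for reach and frequency) can be sketched in plain Python. This is a minimal illustration under the rules described above, not Meridian's implementation; the helper name media_window and the keys of its result are hypothetical.

```python
def media_window(n_times: int, max_lag: int, n_media_times: int) -> dict:
    """Summarize how a media time axis relates to the KPI time window.

    Hypothetical helper for illustration; not part of Meridian.
    """
    if n_media_times < n_times:
        raise ValueError("n_media_times must be >= n_times")
    wanted = n_times + max_lag            # KPI window plus full lag history
    used = min(n_media_times, wanted)     # only the final periods are used
    return {
        "periods_used": used,
        "periods_dropped": n_media_times - used,
        "lag_periods_treated_as_zero": wanted - used,
    }


# No lag history supplied: all 8 lag periods are treated as zero media.
print(media_window(n_times=52, max_lag=8, n_media_times=52))
# Exactly n_times + max_lag periods: nothing dropped, nothing imputed.
print(media_window(n_times=52, max_lag=8, n_media_times=60))
# Extra history: the earliest 10 periods are not used for fitting.
print(media_window(n_times=52, max_lag=8, n_media_times=70))
```

Supplying exactly n_times + max_lag periods is the case the recommendation above aims for: no history is imputed and none is discarded.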
Methods
aggregate_media_spend
aggregate_media_spend(calibration_period: (np.ndarray | None) = None) -> (np.ndarray | None)
Aggregates media spend by channel over the calibration period.
aggregate_rf_spend
aggregate_rf_spend(calibration_period: (np.ndarray | None) = None) -> (np.ndarray | None)
Aggregates RF spend by channel over the calibration period.
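Aggregating spend by channel is the reverse of the proportional allocation described for media_spend and rf_spend. The NumPy sketch below shows both directions on a tiny array. It is an illustration only: the function names are hypothetical, and the one-dimensional boolean mask over time periods is a simplifying assumption, since this section does not state the shape Meridian expects for calibration_period.

```python
from typing import Optional

import numpy as np


def allocate_spend(channel_spend, media: np.ndarray) -> np.ndarray:
    """Spread per-channel spend over (geo, time) proportionally to media.

    channel_spend has shape (n_channels,); media has shape
    (n_geos, n_times, n_channels). Hypothetical helper, not Meridian code.
    """
    channel_spend = np.asarray(channel_spend, dtype=float)
    totals = media.sum(axis=(0, 1))  # total media units per channel
    # Cost per media unit; channels with zero media receive zero spend.
    unit_cost = np.divide(
        channel_spend, totals,
        out=np.zeros_like(channel_spend), where=totals > 0,
    )
    return media * unit_cost


def aggregate_spend(spend: np.ndarray,
                    period_mask: Optional[np.ndarray] = None) -> np.ndarray:
    """Sum (geo, time, channel) spend by channel, optionally over a period.

    period_mask is an assumed boolean array of shape (n_times,).
    """
    if period_mask is not None:
        spend = spend[:, period_mask, :]
    return spend.sum(axis=(0, 1))


# 2 geos, 2 time periods, 2 channels.
media = np.array([[[10.0, 0.0], [30.0, 5.0]],
                  [[20.0, 0.0], [40.0, 15.0]]])
spend = allocate_spend([200.0, 40.0], media)

print(aggregate_spend(spend))                            # per-channel totals
print(aggregate_spend(spend, np.array([False, True])))   # second period only
```

Summing the allocated spend over all geos and times recovers the original per-channel totals, which is the property the allocation is meant to preserve.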
as_dataset
as_dataset() -> xr.Dataset
Returns data as a single xarray.Dataset object.
copy
copy(deep: bool = True) -> 'InputData'
Returns a copy of the InputData instance.
deep
get_all_adstock_hill_channels
get_all_adstock_hill_channels() -> np.ndarray
Returns all channel dimensions to which Adstock and Hill transformations are applied.
RF, organic media and organic RF channels are concatenated to the end of the media channels if they are present.
get_all_channels
get_all_channels() -> np.ndarray
Returns all the channel dimensions.
This method returns media, RF, organic media, organic RF and non-media channel names, concatenated into a single array in that order.
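The concatenation order described for get_all_adstock_hill_channels and get_all_channels can be pictured with plain lists. The sketch below only mirrors the ordering stated above; the function names and channel names are made up for illustration, and the real methods return np.ndarray values rather than lists.

```python
from typing import Optional, Sequence


def _concat(*groups: Optional[Sequence[str]]) -> list:
    """Join channel-name groups in order, skipping groups that are absent."""
    names = []
    for group in groups:
        if group is not None:
            names.extend(group)
    return names


def all_channels(media=None, rf=None, organic_media=None,
                 organic_rf=None, non_media=None) -> list:
    # Order: media, RF, organic media, organic RF, non-media.
    return _concat(media, rf, organic_media, organic_rf, non_media)


def adstock_hill_channels(media=None, rf=None, organic_media=None,
                          organic_rf=None) -> list:
    # Same order, without the non-media channels.
    return _concat(media, rf, organic_media, organic_rf)


names = dict(media=["tv", "search"], rf=["video"], organic_media=["newsletter"])
print(all_channels(**names, non_media=["promo"]))
print(adstock_hill_channels(**names))
```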
get_all_media_and_rf
get_all_media_and_rf() -> np.ndarray
Returns all of the media execution values, including both media and RF.
If media, reach, and frequency were used for modeling, reach * frequency is concatenated to the end of media.
The result is an np.ndarray with dimensions (n_geos, n_media_times, n_channels) containing media or reach * frequency for each media_channel or rf_channel.
get_all_paid_channels
get_all_paid_channels() -> np.ndarray
Returns all the paid channel dimensions, including both media and RF.
If both media and RF channels are present, then the RF channels are concatenated to the end of the media channels.
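Both get_all_media_and_rf and get_all_paid_channels place RF after media along the channel axis. A rough NumPy sketch of the execution-value case follows, assuming arrays shaped as described above; the function name media_and_rf is hypothetical and this is not Meridian's code.

```python
import numpy as np


def media_and_rf(media=None, reach=None, frequency=None) -> np.ndarray:
    """Concatenate media with reach * frequency along the channel axis."""
    parts = []
    if media is not None:
        parts.append(media)
    if reach is not None and frequency is not None:
        # Element-wise product gives one execution value per RF channel.
        parts.append(reach * frequency)
    if not parts:
        raise ValueError("Provide media, or both reach and frequency.")
    return np.concatenate(parts, axis=-1)


# 1 geo, 2 media time periods, 2 media channels and 1 RF channel.
media = np.array([[[1.0, 2.0], [3.0, 4.0]]])
reach = np.array([[[10.0], [20.0]]])
frequency = np.array([[[1.5], [2.0]]])

combined = media_and_rf(media, reach, frequency)
print(combined.shape)  # (n_geos, n_media_times, n_channels)
print(combined)
```

The channel names line up the same way: media channel names first, then RF channel names.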
get_n_top_largest_geos
get_n_top_largest_geos(num_geos: int) -> list[str]
Finds the specified number of the largest geos by population.
num_geos
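Selecting the largest geos by population amounts to a sort followed by a slice. The snippet below is a sketch with made-up geo names and populations; returning the names in descending population order is an assumption, since the ordering of the result is not stated here.

```python
def n_largest_geos(population: dict, num_geos: int) -> list:
    """Return the names of the num_geos most populous geos.

    Hypothetical helper; population maps geo name to population.
    """
    ranked = sorted(population, key=population.get, reverse=True)
    return ranked[:num_geos]


population = {"geo_a": 1200.0, "geo_b": 5400.0, "geo_c": 300.0, "geo_d": 2500.0}
print(n_largest_geos(population, 2))
```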
get_organic_media_channels_argument_builder
get_organic_media_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for organic media channels only.
get_organic_rf_channels_argument_builder
get_organic_rf_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for organic RF channels only.
get_paid_channels_argument_builder
get_paid_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for all paid channels.
get_paid_media_channels_argument_builder
get_paid_media_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for paid media channels only.
get_paid_rf_channels_argument_builder
get_paid_rf_channels_argument_builder() -> meridian.data.arg_builder.OrderedListArgumentBuilder
Returns an argument builder for paid RF channels only.
get_total_outcome
get_total_outcome() -> np.ndarray
Returns total outcome, aggregated over geos and times.
get_total_spend
get_total_spend() -> np.ndarray
Returns total spend, including media_spend and rf_spend.
__eq__
__eq__(other)
Return self==value.