Runs calculations to analyze the raw data after fitting the model.
meridian.analysis.analyzer.Analyzer(meridian: meridian.model.model.Meridian)
Methods
adstock_decay
adstock_decay(confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL) -> pd.DataFrame
Calculates adstock decay for paid media, RF, and organic media channels.
Args
confidence_level
Returns
A pd.DataFrame containing the channel, time_units, distribution, ci_hi, ci_lo, and mean for the Adstock function.
baseline_summary_metrics
baseline_summary_metrics(
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | None) = None,
    aggregate_geos: bool = True,
    aggregate_times: bool = True,
    non_media_baseline_values: (Sequence[float] | None) = None,
    confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> xr.Dataset
Returns baseline summary metrics.
Args
selected_geos
selected_times
aggregate_geos: If True, the expected outcome is summed over all of the regions.
aggregate_times: If True, the expected outcome is summed over all of the time periods.
non_media_baseline_values: Optional sequence of floats with shape (n_non_media_channels,). Each element is a fixed value that is used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
confidence_level
batch_size: ... batch_size. The calculation will generally be faster with larger batch_size values.
Returns
An xr.Dataset with coordinates metric (mean, median, ci_low, ci_high) and distribution (prior, posterior), containing the following data variables: baseline_outcome, pct_of_contribution.
compute_incremental_outcome_aggregate
compute_incremental_outcome_aggregate(
    use_posterior: bool,
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    use_kpi: (bool | None) = None,
    include_non_paid_channels: bool = True,
    non_media_baseline_values: (Sequence[float] | None) = None,
    **kwargs
) -> meridian.backend.Tensor
Aggregates the incremental outcome of the media channels.
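As a rough illustration of this aggregation, the sketch below appends an all-channel total to a list of per-channel incremental outcomes, so that the channel dimension grows by one. It uses plain Python lists rather than Meridian tensors, and the helper name append_channel_total is hypothetical; it is not part of the Meridian API.

```python
def append_channel_total(per_channel):
    """Return the per-channel values with their sum appended as a final entry.

    `per_channel` stands in for the channel dimension of a single parameter
    draw. This helper is illustrative only and is not a Meridian function.
    """
    values = list(per_channel)
    return values + [sum(values)]


# Incremental outcome of three hypothetical channels for one draw.
draw = [120.0, 80.0, 50.0]
with_total = append_channel_total(draw)
print(with_total)  # [120.0, 80.0, 50.0, 250.0]
```

The last entry plays the role of the extra "all channels" component at the end of the channel dimension.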
Args
use_posterior: If True, then the incremental outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data: DataTensors container with optional tensors: media, reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi. If None, the incremental outcome is calculated using the InputData provided to the Meridian object. If new_data is provided, the incremental outcome is calculated using the new tensors in new_data and the original values of the remaining tensors. For example, compute_incremental_outcome_aggregate(new_data=DataTensors(media=new_media)) computes the incremental outcome using new_media and the original values of reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
use_kpi: If True, the summary metrics are calculated using KPI. If False, the metrics are calculated using revenue.
include_non_paid_channels: If True, then non-media treatments and organic effects are included in the calculation. If False, then only the paid media and RF effects are included.
non_media_baseline_values: Optional sequence of floats with shape (n_non_media_channels,). Each element is a fixed value that is used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
**kwargs: Keyword arguments for incremental_outcome, which could contain selected_geos, selected_times, aggregate_geos, aggregate_times, batch_size.
Returns
A tensor with the same dimensions as the result of incremental_outcome, except that the size of the channel dimension is incremented by one, with the new component at the end containing the total incremental outcome of all channels.
cpik
cpik(
    use_posterior: bool = True,
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates the cost per incremental KPI distribution for each channel.
The CPIK numerator is the total spend on the channel. The CPIK denominator is the change in expected KPI when one channel's spend is set to zero, leaving all other channels' spend unchanged.
If new_data=None, this method calculates CPIK conditional on the values of the paid media variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example,
new_data = DataTensors(media=new_media, frequency=new_frequency)
If selected_geos or selected_times is specified, then the CPIK numerator is the total spend during the selected geos and time periods. An exception will be thrown if the spend of the InputData used to train the model does not have geo and time dimensions. (If the new_data.media_spend and new_data.rf_spend arguments are used with different dimensions than the InputData spend, then an exception will be thrown since this is a likely user error.)
Note that CPIK is simply 1/ROI, where ROI is obtained from a call to the roi method with use_kpi=True.
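The relationship between CPIK and ROI can be sketched with plain numbers. The example below is a minimal illustration that assumes a single channel, a fixed total spend, and a handful of made-up incremental KPI draws. The function names are hypothetical and do not call Meridian.

```python
import math


def roi_from_kpi(incremental_kpi, spend):
    """ROI on the KPI scale: incremental KPI per unit of spend."""
    return incremental_kpi / spend


def cpik_from_kpi(incremental_kpi, spend):
    """Cost per incremental KPI: spend per unit of incremental KPI."""
    return spend / incremental_kpi


# Hypothetical values: total spend and three draws of incremental KPI.
spend = 1000.0
incremental_kpi_draws = [40.0, 50.0, 80.0]

cpik_draws = [cpik_from_kpi(k, spend) for k in incremental_kpi_draws]
roi_draws = [roi_from_kpi(k, spend) for k in incremental_kpi_draws]

# For every draw, CPIK is the reciprocal of the KPI-based ROI.
reciprocal_holds = all(
    math.isclose(c, 1.0 / r) for c, r in zip(cpik_draws, roi_draws)
)
print(cpik_draws)        # [25.0, 20.0, 12.5]
print(reciprocal_holds)  # True
```

Because the reciprocal is taken draw by draw, the mean of the CPIK draws is generally not the reciprocal of the mean ROI.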
Args
use_posterior: If True then the posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data: DataTensors containing media, media_spend, reach, frequency, rf_spend and revenue_per_kpi data. If provided, the CPIK is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None, the CPIK is calculated using the original values of all the tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
selected_geos
selected_times: ... new_data args, if provided. By default, all time periods are included.
aggregate_geos: If True, the expected KPI is summed over all of the regions.
batch_size: ... batch_size. The calculation will generally be faster with larger batch_size values.
Returns
A tensor with dimensions (n_chains, n_draws, n_geos, (n_media_channels + n_rf_channels)). The n_geos dimension is dropped if aggregate_geos=True.
expected_outcome
expected_outcome(
    use_posterior: bool = True,
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | None) = None,
    aggregate_geos: bool = True,
    aggregate_times: bool = True,
    inverse_transform_outcome: bool = True,
    use_kpi: bool = False,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates either prior or posterior expected outcome.
This calculates E(Outcome|Media, RF, Organic media, Organic RF, Non-media treatments, Controls) for each posterior (or prior) parameter draw, where Outcome refers to either revenue if use_kpi=False, or kpi if use_kpi=True. When revenue_per_kpi is not defined, use_kpi cannot be False.
If new_data=None, this method calculates expected outcome conditional on the values of the independent variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument, as long as the new tensors' dimensions match. For example,
new_data = DataTensors(reach=new_reach, frequency=new_frequency)
In principle, expected outcome could be calculated with other time dimensions (for future predictions, for instance). However, this is not allowed with this method because of the additional complexities this introduces:
- Corresponding price (revenue per KPI) data would also be needed.
- If the model contains weekly effect parameters, then some method is needed to estimate or predict these effects for time periods outside of the training data window.
Args
use_posterior: If True, then the expected outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data: DataTensors container with optional new tensors: media, reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments, controls. If None, expected outcome is calculated conditional on the original values of the data tensors that the Meridian object was initialized with. If the new_data argument is used, expected outcome is calculated conditional on the values of the tensors passed in new_data and on the original values of the remaining unset tensors. For example, expected_outcome(new_data=DataTensors(reach=new_reach, frequency=new_frequency)) calculates expected outcome conditional on the original media, organic_media, organic_reach, organic_frequency, non_media_treatments and controls tensors and on the new given values for the reach and frequency tensors. The new tensors' dimensions must match the dimensions of the corresponding original tensors from input_data.
selected_geos
selected_times: ... InputData.time. By default, all time periods are included.
aggregate_geos: If True, the expected outcome is summed over all regions.
aggregate_times: If True, the expected outcome is summed over all time periods.
inverse_transform_outcome: If True, returns the expected outcome in the original KPI or revenue (depending on what is passed to use_kpi), as it was passed to InputData. If False, returns the outcome after transformation by KpiTransformer, reflecting how it is represented within the model.
use_kpi: If use_kpi = True, the expected KPI is calculated; otherwise the expected revenue (kpi * revenue_per_kpi) is calculated. It is required that use_kpi = True if revenue_per_kpi is not defined or if inverse_transform_outcome = False.
batch_size: ... batch_size. The calculation will generally be faster with larger batch_size values.
Returns
Tensor of expected outcome (KPI or revenue, depending on the use_kpi argument) with dimensions (n_chains, n_draws, n_geos, n_times). The n_geos and n_times dimensions are dropped if aggregate_geos=True or aggregate_times=True, respectively.
Raises
NotFittedModelError: If sample_posterior() (for use_posterior=True) or sample_prior() (for use_posterior=False) has not been called prior to calling this method.
expected_vs_actual_data
expected_vs_actual_data(
    aggregate_geos: bool = False,
    aggregate_times: bool = False,
    split_by_holdout_id: bool = False,
    non_media_baseline_values: (Sequence[float] | None) = None,
    confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL
) -> xr.Dataset
Calculates the data for the expected versus actual outcome over time.
Args
aggregate_geos: If True, the expected, baseline, and actual are summed over all of the regions.
aggregate_times: If True, the expected, baseline, and actual are summed over all of the time periods.
split_by_holdout_id: If True and holdout_id exists, the data is split into 'Train', 'Test', and 'All Data' subsections.
non_media_baseline_values: Optional sequence of floats with shape (n_non_media_channels,). Each element is a fixed value that is used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
confidence_level: ... Default: 0.9.
filter_and_aggregate_geos_and_times
filter_and_aggregate_geos_and_times(
    tensor: meridian.backend.Tensor,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    aggregate_times: bool = True,
    flexible_time_dim: bool = False,
    has_media_dim: bool = True
) -> meridian.backend.Tensor
Filters and/or aggregates geo and time dimensions of a tensor.
Args
tensor: Tensor with dimensions [..., n_geos, n_times] or [..., n_geos, n_times, n_channels], where n_channels is the number of either media channels, RF channels, all paid channels (media and RF), or all channels (media, RF, non-media, organic media, organic RF).
selected_geos: ... InputData.geo.
selected_times: ... InputData.time or a boolean list with length equal to the time dimension of the tensor. By default, all time periods are included.
aggregate_geos: If True, the tensor is summed over all geos.
aggregate_times: If True, the tensor is summed over all time periods.
flexible_time_dim: If True, the time dimension of the tensor is not required to match the number of time periods in InputData.time. In this case, if using selected_times, it must be a boolean list with length equal to the time dimension of the tensor.
has_media_dim: Only used if flexible_time_dim=True. Otherwise, this is assumed based on the tensor dimensions. If True, the tensor is assumed to have a media dimension following the time dimension. If False, the last dimension of the tensor is assumed to be the time dimension.
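The filter-then-sum behavior can be sketched on a small [n_geos, n_times] grid. This is a simplified illustration with nested Python lists in place of a backend tensor, with no leading dimensions, no channel dimension, and no boolean time mask. The function name and the sample geo and date labels are hypothetical.

```python
def filter_and_aggregate(values, geos, times, selected_geos=None,
                         selected_times=None, aggregate_geos=True,
                         aggregate_times=True):
    """Filter a values[geo][time] grid by label, then optionally sum each axis."""
    geo_idx = [i for i, g in enumerate(geos)
               if selected_geos is None or g in selected_geos]
    time_idx = [j for j, t in enumerate(times)
                if selected_times is None or t in selected_times]
    # Keep only the selected rows (geos) and columns (time periods).
    subset = [[values[i][j] for j in time_idx] for i in geo_idx]
    if aggregate_times:
        subset = [sum(row) for row in subset]  # one value per geo
    if aggregate_geos:
        if aggregate_times:
            return sum(subset)  # scalar
        return [sum(col) for col in zip(*subset)]  # one value per time period
    return subset


geos = ["geo_a", "geo_b"]
times = ["2024-01-01", "2024-01-08", "2024-01-15"]
values = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
last_two = ["2024-01-08", "2024-01-15"]

total = filter_and_aggregate(values, geos, times, selected_times=last_two)
per_time = filter_and_aggregate(values, geos, times, selected_times=last_two,
                                aggregate_times=False)
per_geo = filter_and_aggregate(values, geos, times, selected_times=last_two,
                               aggregate_geos=False)
print(total, per_time, per_geo)  # 55.0 [22.0, 33.0] [5.0, 50.0]
```

Setting both aggregate flags to False returns the filtered grid unchanged, which mirrors the idea that each aggregated dimension is dropped from the result.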
get_aggregated_impressions
get_aggregated_impressions(
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    aggregate_times: bool = True,
    optimal_frequency: (Sequence[float] | None) = None,
    include_non_paid_channels: bool = True
) -> meridian.backend.Tensor
Computes aggregated impressions values in the data across all channels.
Args
new_data: DataTensors object containing the new media, reach, frequency, organic_media, organic_reach, organic_frequency, and non_media_treatments tensors. If the new_data argument is used, then the aggregated impressions are computed using the values of the tensors passed in the new_data argument and the original values of all the remaining tensors. If None, the existing tensors from the Meridian object are used.
selected_geos
selected_times: ... new_data argument, if provided. By default, all time periods are included.
aggregate_geos: If True, the expected outcome is summed over all of the regions.
aggregate_times: If True, the expected outcome is summed over all of the time periods.
optimal_frequency: Optional sequence of length n_rf_channels, containing the optimal frequency per channel that maximizes posterior mean ROI. The default value is None, in which case historical frequency is used for the metrics calculation.
include_non_paid_channels: If True, the organic media, organic RF, and non-media channels are included in the aggregation.
Returns
A tensor with shape (n_selected_geos, n_selected_times, n_channels) (or (n_channels,) if geos and times are aggregated) with aggregate impression values per channel.
get_aggregated_spend
get_aggregated_spend(
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    include_media: bool = True,
    include_rf: bool = True
) -> xr.DataArray
Gets the aggregated spend based on the selected time.
Args
new_data: DataTensors object containing the new media, media_spend, reach, frequency, rf_spend tensors. If None, the existing tensors from the Meridian object are used. If the new_data argument is used, then the aggregated spend is computed using the values of the tensors passed in the new_data argument and the original values of all the remaining tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
selected_times
include_media
include_rf
Returns
An xr.DataArray with the coordinate channel, containing the data variable spend.
Raises
ValueError: If include_media and include_rf are both False.
get_historical_spend
get_historical_spend(
    selected_times: (Sequence[str] | None) = None,
    include_media: bool = True,
    include_rf: bool = True
) -> xr.DataArray
Deprecated. Gets the aggregated historical spend based on the time.
Args
selected_times
include_media
include_rf
Returns
An xr.DataArray with the coordinate channel, containing the data variable spend.
Raises
ValueError: If include_media and include_rf are both False.
get_rhat
get_rhat() -> Mapping[str, meridian.backend.Tensor]
Computes the R-hat values for each parameter in the model.
Raises
NotFittedModelError
hill_curves
hill_curves(confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL, n_bins: int = 25) -> pd.DataFrame
Estimates Hill curve tables used for plotting each channel's curves.
Args
confidence_level: ... Default: 0.9.
n_bins: ... Default: 25.
Returns
A pd.DataFrame with columns:
- channel: media or rf channel name.
- media_units: Media (for media channels) or average frequency (for rf channels) units.
- distribution: Indication of posterior or prior draw.
- ci_hi: Upper bound of the credible interval of the value of the Hill function.
- ci_lo: Lower bound of the credible interval of the value of the Hill function.
- mean: Point-wise mean of the value of the Hill function per draw.
- channel_type: Indication of a media or rf channel.
- scaled_count_histogram: Scaled count of media units or average frequencies within the bin.
- count_histogram: Count value of media units or average frequencies within the bin.
- start_interval_histogram: Media unit or average frequency starting point for a histogram bin.
- end_interval_histogram: Media unit or average frequency ending point for a histogram bin.
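To make the mean, ci_lo, and ci_hi columns concrete, the sketch below summarizes a set of draws at one point of a curve. It assumes an equal-tailed interval, so a confidence level of 0.9 uses the 5th and 95th percentiles. That choice, the linear-interpolation percentile, and the function names are assumptions for illustration and are not taken from the Meridian source.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile of pre-sorted values, for q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def summarize_draws(draws, confidence_level=0.9):
    """Return the mean and an equal-tailed credible interval of the draws."""
    ordered = sorted(draws)
    tail = (1.0 - confidence_level) / 2.0
    return {
        "mean": statistics.fmean(ordered),
        "ci_lo": percentile(ordered, tail),
        "ci_hi": percentile(ordered, 1.0 - tail),
    }


# Hypothetical draws: the values 0, 1, ..., 100.
draws = [float(v) for v in range(101)]
summary = summarize_draws(draws, confidence_level=0.9)
print(summary)  # mean 50.0, interval bounds close to 5 and 95
```

A higher confidence_level widens the interval, because less probability mass is left in each tail.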
incremental_outcome
incremental_outcome(
    use_posterior: bool = True,
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    non_media_baseline_values: (Sequence[float] | None) = None,
    scaling_factor0: float = 0.0,
    scaling_factor1: float = 1.0,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    media_selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    aggregate_times: bool = True,
    inverse_transform_outcome: bool = True,
    use_kpi: bool = False,
    by_reach: bool = True,
    include_non_paid_channels: bool = True,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates either the posterior or prior incremental outcome.
This calculates the media outcome of each media channel for each posterior or prior parameter draw. Incremental outcome is defined as:
E(Outcome|Treatment_1, Controls) minus E(Outcome|Treatment_0, Controls)
For paid & organic channels (without reach and frequency data), Treatment_1 means that media execution for a given channel is multiplied by scaling_factor1 (1.0 by default) for the set of time periods specified by media_selected_times. Similarly, Treatment_0 means that media execution is multiplied by scaling_factor0 (0.0 by default) for these time periods.
For paid & organic channels with reach and frequency data, either reach or frequency is held fixed while the other is scaled, depending on the by_reach argument.
For non-media treatments, Treatment_1 means that the variable is set to historical values. Treatment_0 means that the variable is set to its baseline value for all geos and time periods. Note that the scaling factors (scaling_factor0 and scaling_factor1) are not applicable to non-media treatments.
"Outcome" refers to either revenue if use_kpi=False, or kpi if use_kpi=True. When revenue_per_kpi is not defined, use_kpi cannot be False.
If new_data=None, this method computes incremental outcome using the media, reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi tensors that the Meridian object was initialized with. This behavior can be overridden with the new_data argument. For example, new_data=DataTensors(media=new_media) calculates incremental outcome using the new_media tensor and the original values of the reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi tensors.
The calculation in this method depends on two key assumptions made in the Meridian implementation:
- Additivity of media effects (no interactions).
- Additive changes on the model KPI scale correspond to additive changes on the original KPI scale. In other words, the intercept and control effects do not influence the media effects. This assumption currently holds because the outcome transformation only involves centering and scaling, for example, no log transformations.
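The definition above can be sketched for a single channel with a toy response curve. The example assumes a simple saturating response with no adstock, no geo dimension, and one parameter draw; toy_response, its parameters, and the media values are hypothetical stand-ins rather than Meridian's model.

```python
import math


def toy_response(media, beta=100.0, half_saturation=50.0):
    """Hypothetical saturating response of the outcome to media execution."""
    return beta * media / (media + half_saturation)


def toy_incremental_outcome(media_by_time, scaling_factor0=0.0,
                            scaling_factor1=1.0):
    """Expected outcome under Treatment_1 minus expected outcome under Treatment_0."""
    if not 0.0 <= scaling_factor0 < scaling_factor1:
        raise ValueError("Require 0 <= scaling_factor0 < scaling_factor1.")
    treatment_1 = sum(toy_response(scaling_factor1 * m) for m in media_by_time)
    treatment_0 = sum(toy_response(scaling_factor0 * m) for m in media_by_time)
    return treatment_1 - treatment_0


media = [50.0, 150.0]  # media execution in two time periods

# Defaults: historical execution versus zero execution.
full_effect = toy_incremental_outcome(media)
# Historical execution versus half of historical execution.
half_effect = toy_incremental_outcome(media, scaling_factor0=0.5)
print(full_effect)  # 125.0
print(round(half_effect, 3))  # 31.667
```

Because the intercept and controls would appear in both terms of the difference, they cancel under the additivity assumptions, which is why the sketch can omit them.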
Args
use_posterior: If True, then the incremental outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data: DataTensors container with optional tensors: media, reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi. If None, the incremental outcome is calculated using the InputData provided to the Meridian object. If new_data is provided, the incremental outcome is calculated using the new tensors in new_data and the original values of the remaining tensors. For example, incremental_outcome(new_data=DataTensors(media=new_media)) computes the incremental outcome using new_media and the original values of reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
non_media_baseline_values: Optional sequence of floats with shape (n_non_media_channels,). Each element is a fixed value that is used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
scaling_factor0: ... media_selected_times. Must be non-negative and less than scaling_factor1.
scaling_factor1: ... media_selected_times. Must be non-negative and greater than scaling_factor0.
selected_geos
selected_times: ... new_data if time is modified in new_data, or input_data.n_times otherwise. The incremental outcome corresponds to incremental KPI generated during the selected_times arg by media executed during the media_selected_times arg. Note that if use_kpi=False, then selected_times can only include the time periods that have revenue_per_kpi input data. By default, all time periods are included where revenue_per_kpi data is available.
media_selected_times: ... new_data args, if provided. If new_data is provided, media_selected_times can select any subset of time periods in new_data. If new_data is not provided, media_selected_times selects from InputData.time. The incremental outcome corresponds to incremental KPI generated during the selected_times arg by treatment variables executed during the media_selected_times arg. For each channel, the incremental outcome is defined as the difference between expected KPI when treatment variables execution is scaled by scaling_factor1 and scaling_factor0 during these specified time periods. By default, the difference is between treatment variables at historical execution levels, or as provided in new_data, versus zero execution. Defaults to include all time periods.
aggregate_geos: If True, then incremental outcome is summed over all regions.
aggregate_times: If True, then incremental outcome is summed over all time periods.
inverse_transform_outcome: If True, returns the expected outcome in the original KPI or revenue (depending on what is passed to use_kpi), as it was passed to InputData. If False, returns the outcome after transformation by KpiTransformer, reflecting how it is represented within the model.
use_kpi: If use_kpi = True, the expected KPI is calculated; otherwise the expected revenue (kpi * revenue_per_kpi) is calculated. It is required that use_kpi = True if revenue_per_kpi data is not available or if inverse_transform_outcome = False.
by_reach: If True, then the incremental outcome is calculated by scaling the reach and holding the frequency constant. If False, then the incremental outcome is calculated by scaling the frequency and holding the reach constant. Only used for channels with RF data.
include_non_paid_channels: If True, then non-media treatments and organic effects are included in the calculation. If False, then only the paid media and RF effects are included.
batch_size: ... batch_size. The calculation will generally be faster with larger batch_size values.
Returns
Tensor of incremental outcome (KPI or revenue, depending on the use_kpi argument) with dimensions (n_chains, n_draws, n_geos, n_times, n_channels). If include_non_paid_channels=True, then n_channels is the total number of media, RF, organic media, organic RF, and non-media channels. If include_non_paid_channels=False, then n_channels is the total number of media and RF channels. The n_geos and n_times dimensions are dropped if aggregate_geos=True or aggregate_times=True, respectively.
Raises
NotFittedModelError: If sample_posterior() (for use_posterior=True) or sample_prior() (for use_posterior=False) has not been called prior to calling this method.
ValueError: If the new_data argument contains tensors with modified time dimension and not all treatment variables are provided in new_data with matching time dimensions.
marginal_roi
marginal_roi(
    incremental_increase: float = 0.01,
    use_posterior: bool = True,
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    by_reach: bool = True,
    use_kpi: bool = False,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates the marginal ROI prior or posterior distribution.
The marginal ROI (mROI) numerator is the change in expected outcome (kpi or kpi * revenue_per_kpi) when one channel's spend is increased by a small fraction. The mROI denominator is the corresponding small fraction of the channel's total spend.
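The ratio described above can be sketched with a toy response curve. The example assumes that media execution scales in proportion to spend, so a 1% spend increase becomes a 1% media increase, and it uses a single channel and one parameter draw. toy_response and all numeric values are hypothetical.

```python
import math


def toy_response(media, beta=100.0, half_saturation=50.0):
    """Hypothetical saturating response of the outcome to media execution."""
    return beta * media / (media + half_saturation)


def toy_marginal_roi(media, spend, incremental_increase=0.01):
    """Change in outcome from a small spend increase, per unit of added spend."""
    base = toy_response(media)
    bumped = toy_response(media * (1.0 + incremental_increase))
    return (bumped - base) / (incremental_increase * spend)


media, spend = 50.0, 1000.0
mroi = toy_marginal_roi(media, spend)
average_roi = toy_response(media) / spend
print(round(mroi, 4), average_roi)  # 0.0249 0.05
```

With a saturating response the marginal ROI falls below the average ROI, since the last unit of spend buys less outcome than the first.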
If new_data=None, this method calculates marginal ROI conditional on the values of the paid media variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example,
new_data = DataTensors(media=new_media, frequency=new_frequency)
If selected_geos or selected_times is specified, then the mROI denominator is based on the total spend during the selected geos and time periods. An exception will be thrown if the spend of the InputData used to train the model does not have geo and time dimensions. (If the new_data.media_spend and new_data.rf_spend arguments are used with different dimensions than the InputData spend, then an exception will be thrown since this is a likely user error.)
Args
incremental_increase: ... True.
use_posterior: If True then the posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data: DataTensors containing media, media_spend, reach, frequency, rf_spend and revenue_per_kpi data. If provided, the marginal ROI is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None, the marginal ROI is calculated using the original values of all the tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
selected_geos
selected_times: ... new_data args, if provided. By default, all time periods are included.
aggregate_geos: If True, the expected revenue is summed over all of the regions.
by_reach: If True, returns the mROI by reach for a given fixed frequency. If False, returns the mROI by frequency for a given fixed reach.
use_kpi: If False, then revenue is used to calculate the mROI numerator. Otherwise, uses KPI to calculate the mROI numerator.
batch_size: ... batch_size. The calculation will generally be faster with larger batch_size values.
Returns
A tensor with dimensions (n_chains, n_draws, n_geos, (n_media_channels + n_rf_channels)). The n_geos dimension is dropped if aggregate_geos=True.
negative_baseline_probability
negative_baseline_probability(
    non_media_baseline_values: (Sequence[float] | None) = None,
    use_posterior: bool = True,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | None) = None,
    use_kpi: bool = False,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> np.floating
Calculates either prior or posterior negative baseline probability.
This calculates either the prior or posterior probability that the baseline, aggregated over the supplied time window, is negative.
The baseline is calculated by computing expected_outcome with the following assumptions:
1) media is set to all zeros,
2) reach is set to all zeros,
3) organic_media is set to all zeros,
4) organic_reach is set to all zeros,
5) non_media_treatments is set to the counterfactual values according to the non_media_baseline_values argument,
6) controls are set to historical values.
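Given baseline values that have already been aggregated over the time window, one value per parameter draw, the probability can be estimated as the share of draws below zero. The sketch below assumes exactly that estimator. The function name and the draw values are hypothetical, and the example does not call Meridian.

```python
def share_of_negative_draws(baseline_draws):
    """Fraction of aggregated baseline draws that fall below zero."""
    draws = list(baseline_draws)
    if not draws:
        raise ValueError("At least one draw is required.")
    return sum(1 for value in draws if value < 0.0) / len(draws)


# Hypothetical aggregated baseline, one value per draw.
baseline_draws = [120.0, -5.0, 30.0, -1.5, 60.0]
probability = share_of_negative_draws(baseline_draws)
print(probability)  # 0.4
```

A value close to zero suggests that the treatment effects rarely absorb more outcome than was actually observed in the selected window.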
Args
non_media_baseline_values: Optional sequence of floats with shape (n_non_media_channels,). Each element is a float denoting a fixed value that will be used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
use_posterior: If True, then the expected outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
selected_geos
selected_times: ... InputData.time. By default, all time periods are included.
use_kpi: If use_kpi = True, the expected KPI is calculated; otherwise the expected revenue (kpi * revenue_per_kpi) is calculated. It is required that use_kpi = True if revenue_per_kpi is not defined or if inverse_transform_outcome = False.
batch_size: ... batch_size. The calculation will generally be faster with larger batch_size values.
Raises
NotFittedModelError: If sample_posterior() (for use_posterior=True) or sample_prior() (for use_posterior=False) has not been called prior to calling this method.
optimal_freq
optimal_freq(
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    max_frequency: (float | None) = None,
    freq_grid: (Sequence[float] | None) = None,
    use_posterior: bool = True,
    use_kpi: bool = False,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL
) -> xr.Dataset
Calculates the optimal frequency that maximizes posterior mean ROI.
For this optimization, historical spend is used and fixed, and frequency is restricted to be constant across all geographic regions and time periods. Reach is calculated for each geographic area and time period such that the number of impressions remains unchanged as frequency varies. Meridian solves for the frequency at which posterior mean ROI is optimized.
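The search described above can be sketched as a grid search in which impressions and spend stay fixed, reach is recomputed as impressions divided by frequency, and the frequency with the highest ROI wins. The sketch assumes one RF channel, one geo and time period, a single parameter draw, and a toy Hill-type frequency effect. The names toy_hill and value_per_reach and all parameter values are hypothetical.

```python
import math


def toy_hill(frequency, ec=3.0, slope=2.0):
    """Hypothetical Hill-type effect of average frequency, between 0 and 1."""
    return frequency**slope / (frequency**slope + ec**slope)


def toy_optimal_frequency(impressions, spend, max_frequency,
                          value_per_reach=1.0):
    """Grid-search the frequency that maximizes ROI at fixed impressions."""
    n_steps = int(round((max_frequency - 1.0) / 0.1))
    # Candidate frequencies from 1.0 to max_frequency in steps of 0.1.
    grid = [round(1.0 + 0.1 * i, 1) for i in range(n_steps + 1)]

    def roi(frequency):
        reach = impressions / frequency  # keeps impressions unchanged
        return value_per_reach * reach * toy_hill(frequency) / spend

    best = max(grid, key=roi)
    return best, roi(best)


best_frequency, best_roi = toy_optimal_frequency(
    impressions=9000.0, spend=1000.0, max_frequency=6.0)
print(best_frequency, best_roi)  # 3.0 1.5
```

In this toy setup the optimum sits where the gain from a stronger per-person effect is balanced by the loss of reach, which is the trade-off the frequency grid explores.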
If new_data=None, this method calculates the optimal frequency on the values of the paid RF variables that the Meridian object was initialized with. The user can override this historical data through the new_data argument. For example,
new_data = DataTensors(reach=new_reach, frequency=new_frequency)
new_data
DataTensors object containing rf_impressions, rf_spend, and revenue_per_kpi. If provided, the optimal frequency is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None, the historical data used to initialize the Meridian object is used. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
max_frequency
If None, the maximum frequency value is calculated from the historic frequency (maximum value of Meridian.input_data, not new_data). If freq_grid is provided, this argument has no effect.
freq_grid
By default, the grid covers 1.0 to the maximum frequency in increments of 0.1.
use_posterior
If True, posterior optimal frequencies are generated. If False, prior optimal frequencies are generated.
use_kpi
If True, the counterfactual metrics are calculated using KPI. If False, the counterfactual metrics are calculated using revenue.
selected_geos
selected_times
new_data if time is modified in new_data, or input_data.n_times otherwise. By default, all time periods are included.
confidence_level
- Coordinates: frequency, rf_channel, metric (mean, median, ci_lo, ci_hi).
- Data variables:
  - optimal_frequency: The frequency that optimizes the posterior mean of ROI.
  - roi: The ROI for each frequency value in freq_grid.
  - optimized_incremental_outcome: The incremental outcome based on the optimal frequency.
  - optimized_effectiveness: The effectiveness based on the optimal frequency.
  - optimized_roi: The ROI based on the optimal frequency.
  - optimized_mroi_by_reach: The marginal ROI with a small change in reach and fixed frequency at the optimal frequency.
  - optimized_mroi_by_frequency: The marginal ROI with a small change around the optimal frequency and fixed reach.
  - optimized_cpik: The CPIK based on the optimal frequency.
NotFittedModelError
If sample_posterior() (for use_posterior=True) or sample_prior() (for use_posterior=False) has not been called prior to calling this method.
ValueError
predictive_accuracy
predictive_accuracy(
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | None) = None,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> xr.Dataset
Calculates R-Squared, MAPE, and wMAPE goodness of fit metrics.
R-Squared, MAPE (mean absolute percentage error), and wMAPE (weighted absolute percentage error) are calculated on the revenue scale (KPI * revenue_per_kpi) when revenue_per_kpi is specified, or the KPI scale when revenue_per_kpi = None. This is the same scale as what is used in the ROI numerator (incremental outcome).
Prediction errors in wMAPE are weighted by the actual revenue (KPI * revenue_per_kpi) when revenue_per_kpi is specified, or weighted by the KPI scale when revenue_per_kpi = None. This means that percentage errors when revenue is high are weighted more heavily than errors when revenue is low.
R-Squared, MAPE, and wMAPE are calculated both at the model level (one observation per geo and time period) and at the national level (aggregating KPI or revenue outcome across geos so there is one observation per time period).
R-Squared, MAPE, and wMAPE are calculated for the full sample. If the model object has any holdout observations, then R-Squared, MAPE, and wMAPE are also calculated for the Train and Test subsets.
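The three metrics can be illustrated on toy numbers. The sketch below uses the common textbook definitions (R-Squared as one minus the ratio of residual to total sum of squares); whether Meridian uses exactly these formulas internally is an assumption, and the data values are made up.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """One minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual: list[float], predicted: list[float]) -> float:
    """Unweighted mean of absolute percentage errors."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def wmape(actual: list[float], predicted: list[float]) -> float:
    """Percentage errors weighted by the actual outcome.

    Weighting |a - p| / a by a and normalizing by sum(a) reduces to
    sum(|a - p|) / sum(a).
    """
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(actual)


# Hypothetical outcomes: the largest observation is predicted exactly.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
print(r_squared(actual, predicted), mape(actual, predicted), wmape(actual, predicted))
```

In this toy case wMAPE is lower than MAPE because the error-free observation carries the most weight.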
selected_geos
selected_times
batch_size
By default, batch_size is 100. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size. The calculation will generally be faster with larger batch_size values.
R_Squared, MAPE, and wMAPE values, with coordinates metric, geo_granularity, evaluation_set, and accompanying data variable value. If holdout_id exists, the data is split into 'Train', 'Test', and 'All Data' subsections, and the three metrics are computed for each.
response_curves
response_curves(
    spend_multipliers: (list[float] | None) = None,
    use_posterior: bool = True,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | None) = None,
    by_reach: bool = True,
    use_optimal_frequency: bool = False,
    use_kpi: bool = False,
    confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> xr.Dataset
Generates a response curves xarray.Dataset.
Response curves are calculated in aggregate across geos and time periods, assuming the historical flighting pattern across geos and time periods for each media channel.
A list of multipliers is applied to each media channel's total historical spend within selected_geos and selected_times to obtain the x-axis values. The y-axis values are the incremental outcome generated by each channel within selected_geos and selected_times under the counterfactual where media units in each geo and time period are scaled by the corresponding multiplier. (Media units for time periods prior to selected_times are also scaled by the multiplier.)
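The construction of the curve points can be sketched as follows. This is an illustration only: `toy_incremental_outcome` is a hypothetical square-root saturation function standing in for the fitted model, and the spend and media unit values are invented.

```python
import math
from typing import Callable


def toy_incremental_outcome(media_units: list[float]) -> float:
    """Hypothetical saturating response; NOT the Meridian model."""
    return sum(math.sqrt(u) for u in media_units)


def response_curve_points(
    spend: list[float],
    media_units: list[float],
    multipliers: list[float],
    outcome_fn: Callable[[list[float]], float],
) -> list[tuple[float, float]]:
    """Returns (x, y) pairs: scaled total spend vs. outcome of scaled media units."""
    total_spend = sum(spend)
    points = []
    for m in multipliers:
        # Every period is scaled by the same multiplier, which preserves
        # the historical flighting pattern.
        scaled_units = [m * u for u in media_units]
        points.append((m * total_spend, outcome_fn(scaled_units)))
    return points


# Hypothetical channel with two time periods.
points = response_curve_points(
    spend=[100.0, 400.0],
    media_units=[100.0, 400.0],
    multipliers=[0.0, 1.0, 2.0],
    outcome_fn=toy_incremental_outcome,
)
for x, y in points:
    print(f"spend={x:.0f} incremental_outcome={y:.2f}")
```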
spend_multipliers
use_posterior
If True, posterior response curves are generated. If False, prior response curves are generated.
selected_geos
selected_times
by_reach
If True, plots the response curve by reach. If False, plots the response curve by frequency.
use_optimal_frequency
If True, uses the optimal frequency to plot the response curves. Defaults to False.
use_kpi
Defaults to False.
confidence_level
batch_size
The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size. The calculation will generally be faster with larger batch_size values.
xarray.Dataset containing the data needed to visualize response curves.
rhat_summary
rhat_summary(
    bad_rhat_threshold: float = 1.2
) -> pd.DataFrame
Computes a summary of the R-hat values for each parameter in the model.
Summarizes the Gelman & Rubin (1992) potential scale reduction for chain convergence, commonly referred to as R-hat. This convergence diagnostic measures the degree to which the variance (of the means) between chains exceeds what you would expect if the chains were identically distributed. Values close to 1.0 indicate convergence. R-hat < 1.2 indicates approximate convergence and is a reasonable threshold for many problems (Brooks & Gelman, 1998).
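For intuition, the classic potential scale reduction can be computed by hand for a single scalar parameter. This is a sketch of the textbook Gelman and Rubin formula using only the standard library; it is not necessarily the exact estimator Meridian calls, and the chains are made-up values.

```python
from statistics import mean, variance


def rhat(chains: list[list[float]]) -> float:
    """Classic potential scale reduction for equal-length chains."""
    n = len(chains[0])  # draws per chain
    chain_means = [mean(c) for c in chains]
    # W: average within-chain sample variance.
    within = mean([variance(c) for c in chains])
    # B: between-chain variance of the chain means, scaled by n.
    between = n * variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Two chains exploring the same region vs. two chains stuck apart.
mixed = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(rhat(mixed) < 1.2, rhat(stuck) < 1.2)
```

Chains whose means disagree inflate the between-chain term, which pushes the ratio well above the 1.2 threshold.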
bad_rhat_threshold
- n_params: The number of respective parameters in the model.
- avg_rhat: The average R-hat value for the respective parameter.
- max_rhat: The maximum R-hat value for the respective parameter.
- percent_bad_rhat: The percentage of R-hat values for the respective parameter that are greater than bad_rhat_threshold.
- row_idx_bad_rhat: The row indices of the R-hat values that are greater than bad_rhat_threshold.
- col_idx_bad_rhat: The column indices of the R-hat values that are greater than bad_rhat_threshold.
NotFittedModelError
If self.sample_posterior() is not called before calling this method.
ValueError
1 or 2.
roi
roi(
    use_posterior: bool = True,
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    use_kpi: bool = False,
    batch_size: int = constants.DEFAULT_BATCH_SIZE
) -> meridian.backend.Tensor
Calculates ROI prior or posterior distribution for each media channel.
The ROI numerator is the change in expected outcome (kpi or kpi * revenue_per_kpi) when one channel's spend is set to zero, leaving all other channels' spend unchanged. The ROI denominator is the total spend of the channel.
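The numerator and denominator defined above can be traced through a toy calculation. The sketch assumes a hypothetical linear `toy_expected_outcome` function and invented spend figures; the real model maps media units to outcome through adstock and saturation, which this example deliberately leaves out.

```python
from typing import Callable


def toy_expected_outcome(spend: dict[str, float]) -> float:
    """Hypothetical linear outcome model; NOT the Meridian model."""
    return 1000.0 + 2.0 * spend["tv"] + 0.5 * spend["search"]


def channel_roi(
    outcome_fn: Callable[[dict[str, float]], float],
    spend: dict[str, float],
    channel: str,
) -> float:
    """Outcome lost when one channel's spend is zeroed, per unit of its spend."""
    counterfactual = dict(spend)
    counterfactual[channel] = 0.0  # all other channels stay unchanged
    incremental = outcome_fn(spend) - outcome_fn(counterfactual)
    return incremental / spend[channel]


spend = {"tv": 100.0, "search": 200.0}
for name in spend:
    print(name, channel_roi(toy_expected_outcome, spend, name))
```

In the actual method this calculation is repeated for every chain and draw, which is why the result is a distribution rather than a single number.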
If new_data=None, this method calculates ROI conditional on the values of the paid media variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example,
new_data = DataTensors(media=new_media, frequency=new_frequency)
If selected_geos or selected_times is specified, then the ROI denominator is the total spend during the selected geos and time periods. An exception will be thrown if the spend of the InputData used to train the model does not have geo and time dimensions. (If the new_data.media_spend and new_data.rf_spend arguments are used with different dimensions than the InputData spend, then an exception will be thrown since this is a likely user error.)
use_posterior
If True, then the posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
media, media_spend, reach, frequency, rf_spend, and revenue_per_kpi data. If provided, the ROI is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None, the ROI is calculated using the original values of all the tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
selected_geos
selected_times
new_data args, if provided. By default, all time periods are included.
aggregate_geos
If True, the expected revenue is summed over all of the regions.
use_kpi
If False, then revenue is used to calculate the ROI numerator. Otherwise, uses KPI to calculate the ROI numerator.
batch_size
The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size. The calculation will generally be faster with larger batch_size values.
(n_chains, n_draws, n_geos, (n_media_channels + n_rf_channels)). The n_geos dimension is dropped if aggregate_geos=True.
summary_metrics
summary_metrics(
    new_data: (meridian.analysis.analyzer.DataTensors | None) = None,
    marginal_roi_by_reach: bool = True,
    marginal_roi_incremental_increase: float = 0.01,
    selected_geos: (Sequence[str] | None) = None,
    selected_times: (Sequence[str] | Sequence[bool] | None) = None,
    aggregate_geos: bool = True,
    aggregate_times: bool = True,
    optimal_frequency: (Sequence[float] | None) = None,
    use_kpi: bool = False,
    confidence_level: float = constants.DEFAULT_CONFIDENCE_LEVEL,
    batch_size: int = constants.DEFAULT_BATCH_SIZE,
    include_non_paid_channels: bool = False,
    non_media_baseline_values: (Sequence[float] | None) = None
) -> xr.Dataset
Returns summary metrics.
If new_data=None, this method calculates all the metrics conditional on the values of the data variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example, to override the media, frequency, and non-media treatments data variables, the user can pass the following new_data argument:
new_data = DataTensors(media=new_media, frequency=new_frequency, non_media_treatments=new_non_media_treatments)
Note that if new_data is provided with a different number of time periods than in InputData, pct_of_contribution is not defined because expected_outcome() is not defined for new time periods.
Note that mroi and effectiveness metrics are not defined (math.nan) for the aggregate "All Paid Channels" channel dimension.
new_data
DataTensors object with optional new tensors: media, media_spend, reach, frequency, rf_spend, organic_media, organic_reach, organic_frequency, non_media_treatments, controls, revenue_per_kpi. If provided, the summary metrics are calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None, the summary metrics are calculated using the original values of all the tensors. If new_data is provided with a different number of time periods than in InputData, then all tensors, except controls, must have the same number of time periods.
marginal_roi_by_reach
If this argument is True, the assumption is that the next dollar spent only impacts reach, holding frequency constant. If this argument is False, the assumption is that the next dollar spent only impacts frequency, holding reach constant. Used only when include_non_paid_channels is False.
marginal_roi_incremental_increase
Used only when include_non_paid_channels is False.
selected_geos
selected_times
new_data argument, if provided. By default, all time periods are included.
aggregate_geos
If True, the expected outcome is summed over all of the regions.
aggregate_times
If True, the expected outcome is summed over all of the time periods. Note that if False, ROI, mROI, Effectiveness, and CPIK are not reported because they do not have a clear interpretation by time period.
optimal_frequency
n_rf_channels, containing the optimal frequency per channel, that maximizes posterior mean ROI. Default value is None, and historical frequency is used for the metrics calculation.
use_kpi
If True, the summary metrics are calculated using KPI. If False, the metrics are calculated using revenue.
confidence_level
batch_size
The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size. The calculation will generally be faster with larger batch_size values.
include_non_paid_channels
If True, non-paid channels (organic media, organic reach and frequency, and non-media treatments) are included in the summary but only the metrics independent of spend are reported. If False, only the paid channels (media, reach and frequency) are included but the summary also contains the metrics dependent on spend. Default: False.
non_media_baseline_values
(n_non_media_channels,). Each element is a float which means that the fixed value will be used as baseline for the given channel. It is expected that they are scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, the model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non_media treatment channel.
xr.Dataset with coordinates: channel, metric (mean, median, ci_low, ci_high), distribution (prior, posterior) and contains the following non-paid data variables: incremental_outcome, pct_of_contribution, effectiveness, and the following paid data variables: impressions, pct_of_impressions, spend, pct_of_spend, CPM, roi, mroi, cpik. The paid data variables are only included when include_non_paid_channels is False. Note that roi, mroi, cpik, and effectiveness metrics are not reported when aggregate_times=False because they do not have a clear interpretation by time period.
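The metric coordinate (mean, median, ci_low, ci_high) summarizes a set of draws for each channel. A rough sketch of that reduction is shown below, assuming a central interval with equal tails and linear-interpolation percentiles; the `summarize` and `percentile` helpers and the draws are hypothetical, and Meridian's own interval computation may differ.

```python
import math
from statistics import mean, median


def percentile(sorted_values: list[float], q: float) -> float:
    """Linear-interpolation percentile for q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])


def summarize(draws: list[float], confidence_level: float) -> dict[str, float]:
    """Reduces draws to mean, median, and a central credible interval."""
    ordered = sorted(draws)
    tail = (1.0 - confidence_level) / 2.0  # probability left in each tail
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "ci_low": percentile(ordered, tail),
        "ci_high": percentile(ordered, 1.0 - tail),
    }


# Hypothetical ROI draws for one channel: 0.0, 1.0, ..., 100.0.
roi_draws = [float(i) for i in range(101)]
summary = summarize(roi_draws, confidence_level=0.9)
print(summary)
```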