meridian.analysis.analyzer.Analyzer

Runs calculations to analyze the raw data after fitting the model.

Methods

adstock_decay

Calculates adstock decay for paid media, RF, and organic media channels.

Args

confidence_level
Confidence level for prior and posterior credible intervals, represented as a value between zero and one.

Returns
Pandas DataFrame containing the channel, time_units, distribution, ci_hi, ci_lo, and mean for the Adstock function.

baseline_summary_metrics

Returns baseline summary metrics.

Args

selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing a subset of times to include. By default, all time periods are included.
aggregate_geos
Boolean. If True , the expected outcome is summed over all of the regions.
aggregate_times
Boolean. If True , the expected outcome is summed over all of the time periods.
non_media_baseline_values
Optional list of shape (n_non_media_channels,). Each element is a float denoting a fixed value that will be used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
confidence_level
Confidence level for media summary metrics credible intervals, represented as a value between zero and one.
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
An xr.Dataset with coordinates metric (mean, median, ci_low, ci_high) and distribution (prior, posterior), containing the data variables baseline_outcome and pct_of_contribution.
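
To make the metric coordinate concrete, here is a minimal pure-Python sketch of how mean, median, ci_low, and ci_high could be derived from a set of draws at a given confidence_level. It is not Meridian's implementation: the function name summarize_draws, the equal-tailed interval, and the linear-interpolation quantile are all assumptions chosen for illustration.

```python
import statistics


def summarize_draws(draws, confidence_level=0.9):
    """Summarize draws with mean, median, and an equal-tailed interval.

    Hypothetical helper; the equal-tailed interval is an assumption.
    """
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be between zero and one")
    ordered = sorted(draws)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(ordered) - 1)
        lower = int(pos)
        upper = min(lower + 1, len(ordered) - 1)
        return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

    tail = (1.0 - confidence_level) / 2.0
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "ci_low": quantile(tail),
        "ci_high": quantile(1.0 - tail),
    }


# Hypothetical baseline outcome draws: 0.0, 1.0, ..., 100.0.
baseline_draws = [float(v) for v in range(101)]
summary = summarize_draws(baseline_draws, confidence_level=0.9)
```

With confidence_level=0.9, the interval bounds sit at the 5th and 95th percentiles of the draws.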

compute_incremental_outcome_aggregate

Aggregates the incremental outcome of the media channels.

Args

use_posterior
Boolean. If True , then the incremental outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
Optional DataTensors container with optional tensors: media , reach , frequency , organic_media , organic_reach , organic_frequency , non_media_treatments and revenue_per_kpi . If None , the incremental outcome is calculated using the InputData provided to the Meridian object. If new_data is provided, the incremental outcome is calculated using the new tensors in new_data and the original values of the remaining tensors. For example, compute_incremental_outcome_aggregate(new_data=DataTensors(media=new_media)) computes the incremental outcome using new_media and the original values of reach , frequency , organic_media , organic_reach , organic_frequency , non_media_treatments and revenue_per_kpi . If any of the tensors in new_data is provided with a different number of time periods than in InputData , then all tensors must be provided with the same number of time periods.
use_kpi
Boolean. If True , the summary metrics are calculated using KPI. If False , the metrics are calculated using revenue.
include_non_paid_channels
Boolean. If True , then non-media treatments and organic effects are included in the calculation. If False , then only the paid media and RF effects are included.
non_media_baseline_values
Optional list of shape (n_non_media_channels,). Each element is a float denoting a fixed value that will be used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
**kwargs
Keyword arguments passed to incremental_outcome, which may include selected_geos, selected_times, aggregate_geos, aggregate_times, and batch_size.

Returns
A Tensor with the same dimensions as incremental_outcome except the size of the channel dimension is incremented by one, with the new component at the end containing the total incremental outcome of all channels.
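
The shape of the return value can be illustrated with a small pure-Python sketch. Nested lists stand in for the tensor, and append_total_channel is a hypothetical helper, not part of Meridian; it only shows the idea of extending the channel dimension by one entry that holds the total.

```python
def append_total_channel(incremental_outcome):
    """Append the across-channel total as a final entry of each row.

    Each row holds per-channel incremental outcome for one draw.
    """
    return [row + [sum(row)] for row in incremental_outcome]


# Hypothetical values: 2 draws x 3 channels.
per_channel = [[10.0, 5.0, 2.5], [12.0, 4.0, 3.0]]
with_total = append_total_channel(per_channel)
```

Each row now has four entries: the three channels followed by their sum.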

cpik

Calculates the cost per incremental KPI distribution for each channel.

The CPIK numerator is the total spend on the channel. The CPIK denominator is the change in expected KPI when one channel's spend is set to zero, leaving all other channels' spend unchanged.

If new_data=None , this method calculates CPIK conditional on the values of the paid media variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example,

  new_data = DataTensors(media=new_media, frequency=new_frequency)

If selected_geos or selected_times is specified, then the CPIK numerator is the total spend during the selected geos and time periods. An exception will be thrown if the spend of the InputData used to train the model does not have geo and time dimensions. (If the new_data.media_spend and new_data.rf_spend arguments are used with different dimensions than the InputData spend, then an exception will be thrown since this is a likely user error.)

Note that CPIK is simply 1/ROI, where ROI is obtained from a call to the roi method with use_kpi=True .
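
The reciprocal relationship can be checked with a short sketch. The functions and numbers below are hypothetical stand-ins, not Meridian calls; they only illustrate that, for each draw, spend divided by incremental KPI equals one over the KPI-based ROI.

```python
def toy_roi(spend, incremental_kpi):
    """KPI-based ROI for one draw: incremental KPI per unit of spend."""
    return incremental_kpi / spend


def toy_cpik(spend, incremental_kpi):
    """Cost per incremental KPI for one draw: spend per incremental KPI unit."""
    if incremental_kpi == 0:
        raise ZeroDivisionError("CPIK is undefined when incremental KPI is zero")
    return spend / incremental_kpi


# Hypothetical channel spend and incremental KPI draws.
spend = 2000.0
incremental_kpi_draws = [400.0, 500.0, 800.0]

cpik_draws = [toy_cpik(spend, k) for k in incremental_kpi_draws]
roi_draws = [toy_roi(spend, k) for k in incremental_kpi_draws]
```

The identity holds draw by draw. A summary statistic such as the mean of the CPIK draws is generally not the reciprocal of the mean of the ROI draws.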

Args

use_posterior
Boolean. If True then the posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
Optional. DataTensors containing media, media_spend, reach, frequency, rf_spend and revenue_per_kpi data. If provided, the CPIK is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None, the CPIK is calculated using the original values of all the tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
selected_geos
Optional. Contains a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in the new_data args, if provided. By default, all time periods are included.
aggregate_geos
Boolean. If True , the expected KPI is summed over all of the regions.
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
Tensor of CPIK values with dimensions (n_chains, n_draws, n_geos, (n_media_channels + n_rf_channels)) . The n_geos dimension is dropped if aggregate_geos=True .

expected_outcome

Calculates either prior or posterior expected outcome.

This calculates E(Outcome|Media, RF, Organic media, Organic RF, Non-media treatments, Controls) for each posterior (or prior) parameter draw, where Outcome refers to either revenue if use_kpi=False , or kpi if use_kpi=True . When revenue_per_kpi is not defined, use_kpi cannot be False .

If new_data=None , this method calculates expected outcome conditional on the values of the independent variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument, as long as the new tensors' dimensions match. For example,

  new_data = DataTensors(reach=new_reach, frequency=new_frequency)

In principle, expected outcome could be calculated with other time dimensions (for future predictions, for instance). However, this is not allowed with this method because of the additional complexities this introduces:

  1. Corresponding price (revenue per KPI) data would also be needed.
  2. If the model contains weekly effect parameters, then some method is needed to estimate or predict these effects for time periods outside of the training data window.

Args

use_posterior
Boolean. If True , then the expected outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
An optional DataTensors container with optional new tensors: media , reach , frequency , organic_media , organic_reach , organic_frequency , non_media_treatments , controls . If None , expected outcome is calculated conditional on the original values of the data tensors that the Meridian object was initialized with. If new_data argument is used, expected outcome is calculated conditional on the values of the tensors passed in new_data and on the original values of the remaining unset tensors. For example, expected_outcome(new_data=DataTensors(reach=new_reach, frequency=new_frequency)) calculates expected outcome conditional on the original media , organic_media , organic_reach , organic_frequency , non_media_treatments and controls tensors and on the new given values for reach and frequency tensors. The new tensors' dimensions must match the dimensions of the corresponding original tensors from input_data .
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing a subset of dates to include. The values accepted here must match time dimension coordinates from InputData.time. By default, all time periods are included.
aggregate_geos
Boolean. If True , the expected outcome is summed over all regions.
aggregate_times
Boolean. If True , the expected outcome is summed over all time periods.
inverse_transform_outcome
Boolean. If True, returns the expected outcome in the original KPI or revenue (depending on what is passed to use_kpi), as it was passed to InputData. If False, returns the outcome after transformation by KpiTransformer, reflecting how it is represented within the model.
use_kpi
Boolean. If use_kpi = True , the expected KPI is calculated; otherwise the expected revenue (kpi * revenue_per_kpi) is calculated. It is required that use_kpi = True if revenue_per_kpi is not defined or if inverse_transform_outcome = False .
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
Tensor of expected outcome (either KPI or revenue, depending on the use_kpi argument) with dimensions (n_chains, n_draws, n_geos, n_times). The n_geos and n_times dimensions are dropped if aggregate_geos=True or aggregate_times=True, respectively.

Raises

NotFittedModelError
If sample_posterior() (for use_posterior=True) or sample_prior() (for use_posterior=False) has not been called prior to calling this method.
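
The use_kpi rule for expected_outcome can be summarized in a few lines of plain Python. This is a sketch under stated assumptions, not Meridian code: outcome_from_kpi is a hypothetical helper, lists stand in for tensors, and the error type is illustrative.

```python
def outcome_from_kpi(expected_kpi, revenue_per_kpi=None, use_kpi=False):
    """Return expected KPI, or expected revenue as kpi * revenue_per_kpi."""
    if use_kpi:
        return list(expected_kpi)
    if revenue_per_kpi is None:
        # Mirrors the documented rule: revenue needs revenue_per_kpi.
        raise ValueError("use_kpi must be True when revenue_per_kpi is not defined")
    return [k * r for k, r in zip(expected_kpi, revenue_per_kpi)]


# Hypothetical expected KPI and revenue-per-KPI values for two time periods.
expected_kpi = [100.0, 150.0]
revenue_per_kpi = [2.0, 2.5]

expected_revenue = outcome_from_kpi(expected_kpi, revenue_per_kpi, use_kpi=False)
kpi_only = outcome_from_kpi(expected_kpi, use_kpi=True)
```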

expected_vs_actual_data

Calculates the data for the expected versus actual outcome over time.

Args

aggregate_geos
Boolean. If True, the expected, baseline, and actual outcomes are summed over all of the regions.
aggregate_times
Boolean. If True, the expected, baseline, and actual outcomes are summed over all of the time periods.
split_by_holdout_id
Boolean. If True and holdout_id exists, the data is split into 'Train' , 'Test' , and 'All Data' subsections.
non_media_baseline_values
Optional list of shape (n_non_media_channels,). Each element is a float denoting a fixed value that will be used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
confidence_level
Confidence level for expected outcome credible intervals, represented as a value between zero and one. Default: 0.9 .

Returns
A dataset with the expected, baseline, and actual outcome metrics.

filter_and_aggregate_geos_and_times

Filters and/or aggregates geo and time dimensions of a tensor.

Args

tensor
Tensor with dimensions [..., n_geos, n_times] or [..., n_geos, n_times, n_channels] , where n_channels is the number of either media channels, RF channels, all paid channels (media and RF), or all channels (media, RF, non-media, organic media, organic RF).
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included. The selected geos should match those in InputData.geo .
selected_times
Optional list of times to include. This can either be a string list containing a subset of time dimension coordinates from InputData.time or a boolean list with length equal to the time dimension of the tensor. By default, all time periods are included.
aggregate_geos
Boolean. If True , the tensor is summed over all geos.
aggregate_times
Boolean. If True , the tensor is summed over all time periods.
flexible_time_dim
Boolean. If True , the time dimension of the tensor is not required to match the number of time periods in InputData.time . In this case, if using selected_times , it must be a boolean list with length equal to the time dimension of the tensor.
has_media_dim
Boolean. Only used if flexible_time_dim=True . Otherwise, this is assumed based on the tensor dimensions. If True , the tensor is assumed to have a media dimension following the time dimension. If False , the last dimension of the tensor is assumed to be the time dimension.

Returns
A tensor with filtered and/or aggregated geo and time dimensions.
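
The filtering and aggregation semantics can be sketched in pure Python for a two-dimensional [n_geos, n_times] input. This is an illustration only: nested lists replace tensors, the function name filter_and_aggregate and its defaults are assumptions, and leading batch dimensions and the optional channel dimension are omitted.

```python
def filter_and_aggregate(values, geos, times, selected_geos=None,
                         selected_times=None, aggregate_geos=True,
                         aggregate_times=True):
    """Filter values[geo][time] by geo and time, then optionally sum."""
    geo_idx = [i for i, g in enumerate(geos)
               if selected_geos is None or g in selected_geos]
    if selected_times is None:
        time_idx = list(range(len(times)))
    elif all(isinstance(t, bool) for t in selected_times):
        # Boolean mask: must cover the full time dimension.
        if len(selected_times) != len(times):
            raise ValueError("boolean selected_times must match the time dimension")
        time_idx = [i for i, keep in enumerate(selected_times) if keep]
    else:
        # List of time coordinates.
        time_idx = [i for i, t in enumerate(times) if t in selected_times]

    filtered = [[values[g][t] for t in time_idx] for g in geo_idx]
    if aggregate_times:
        filtered = [sum(row) for row in filtered]
    if aggregate_geos:
        if aggregate_times:
            return sum(filtered)
        return [sum(col) for col in zip(*filtered)]
    return filtered


# Hypothetical data: 2 geos x 3 weekly time periods.
geos = ["north", "south"]
times = ["2024-01-01", "2024-01-08", "2024-01-15"]
values = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]

total_masked = filter_and_aggregate(values, geos, times,
                                    selected_times=[True, False, True])
south_week2 = filter_and_aggregate(values, geos, times,
                                   selected_geos=["south"],
                                   selected_times=["2024-01-08"],
                                   aggregate_geos=False, aggregate_times=False)
per_time = filter_and_aggregate(values, geos, times, aggregate_times=False)
```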

get_aggregated_impressions

Computes aggregated impressions values in the data across all channels.

Args

new_data
An optional DataTensors object containing the new media , reach , frequency , organic_media , organic_reach , organic_frequency , and non_media_treatments tensors. If new_data argument is used, then the aggregated impressions are computed using the values of the tensors passed in the new_data argument and the original values of all the remaining tensors. If None , the existing tensors from the Meridian object are used.
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in the tensors in the new_data argument, if provided. By default, all time periods are included.
aggregate_geos
Boolean. If True, the impressions are summed over all of the regions.
aggregate_times
Boolean. If True, the impressions are summed over all of the time periods.
optimal_frequency
An optional list with dimension n_rf_channels containing the optimal frequency per channel, that is, the frequency that maximizes posterior mean ROI. Defaults to None, in which case historical frequency is used for the metrics calculation.
include_non_paid_channels
Boolean. If True , the organic media, organic RF, and non-media channels are included in the aggregation.

Returns
A tensor with the shape (n_selected_geos, n_selected_times, n_channels) (or (n_channels,) if geos and times are aggregated) with aggregate impression values per channel.

get_aggregated_spend

Gets the aggregated spend based on the selected time.

Args

new_data
An optional DataTensors object containing the new media , media_spend , reach , frequency , rf_spend tensors. If None , the existing tensors from the Meridian object are used. If new_data argument is used, then the aggregated spend is computed using the values of the tensors passed in the new_data argument and the original values of all the remaining tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData , then all tensors must be provided with the same number of time periods.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in KPI data. By default, all time periods are included.
include_media
Whether to include spends for paid media channels that do not have R&F data.
include_rf
Whether to include spends for paid media channels with R&F data.

Returns
An xr.DataArray with the coordinate channel, containing the data variable spend.

Raises

ValueError
If include_media and include_rf are both False.

get_historical_spend

Deprecated. Gets the aggregated historical spend based on the time.

Args

selected_times
The time periods for which to get the historical spend. If None, the historical spend is aggregated over all time points.
include_media
Whether to include spends for paid media channels that do not have R&F data.
include_rf
Whether to include spends for paid media channels with R&F data.

Returns
An xr.DataArray with the coordinate channel, containing the data variable spend.

Raises

ValueError
If include_media and include_rf are both False.

get_rhat

Computes the R-hat values for each parameter in the model.

Returns
A dictionary of R-hat values where each key is a parameter name and each value holds the R-hat values for that parameter.

Raises

NotFittedModelError
If self.sample_posterior() has not been called before calling this method.
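
For readers unfamiliar with the diagnostic, the following sketch computes the textbook Gelman-Rubin potential scale reduction for one scalar parameter. The exact estimator Meridian uses is not stated here, so treat this as an assumption-laden illustration: toy_r_hat is a hypothetical name, and refinements such as split chains or rank normalization are not included.

```python
import math
import statistics


def toy_r_hat(chains):
    """Textbook R-hat for one scalar parameter.

    chains is a list of equal-length lists, one list of draws per chain.
    """
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain sample variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-chain variance is zero")
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


# Hypothetical draws: two chains exploring the same region...
rhat_mixed = toy_r_hat([[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]])
# ...and two chains stuck in different regions.
rhat_stuck = toy_r_hat([[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]])
```

Values well above 1 indicate that the chains disagree, which is the situation the diagnostic is meant to flag.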

hill_curves

Estimates Hill curve tables used for plotting each channel's curves.

Args

confidence_level
Confidence level for prior and posterior credible intervals, represented as a value between zero and one. Default is 0.9 .
n_bins
Number of equal-width bins to include in the histogram for the plotting. Default is 25 .

Returns
Hill curves pd.DataFrame with columns:
  • channel : media or rf channel name.
  • media_units : Media (for media channels) or average frequency (for rf channels) units.
  • distribution : Indication of posterior or prior draw.
  • ci_hi : Upper bound of the credible interval of the value of the Hill function.
  • ci_lo : Lower bound of the credible interval of the value of the Hill function.
  • mean : Point-wise mean of the value of the Hill function per draw.
  • channel_type : Indication of a media or rf channel.
  • scaled_count_histogram : Scaled count of media units or average frequencies within the bin.
  • count_histogram : Count value of media units or average frequencies within the bin.
  • start_interval_histogram : Media unit or average frequency starting point for a histogram bin.
  • end_interval_histogram : Media unit or average frequency ending point for a histogram bin.
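
The histogram columns can be pictured with a small equal-width binning sketch. It is not Meridian's implementation: equal_width_histogram is a hypothetical helper, and scaling counts by the largest bin count is an assumption, since the scaling rule is not specified above.

```python
def equal_width_histogram(values, n_bins=25):
    """Bin values into n_bins equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must span a non-zero range")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls into the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    return [
        {
            "start_interval_histogram": lo + i * width,
            "end_interval_histogram": lo + (i + 1) * width,
            "count_histogram": counts[i],
            # Assumption: scale so that the fullest bin equals 1.0.
            "scaled_count_histogram": counts[i] / peak,
        }
        for i in range(n_bins)
    ]


# Hypothetical media units for one channel.
media_units = [float(v) for v in range(11)]
bins = equal_width_histogram(media_units, n_bins=5)
```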

incremental_outcome

Calculates either the posterior or prior incremental outcome.

This calculates the media outcome of each media channel for each posterior or prior parameter draw. Incremental outcome is defined as:

E(Outcome|Treatment_1, Controls) minus E(Outcome|Treatment_0, Controls)

For paid & organic channels (without reach and frequency data), Treatment_1 means that media execution for a given channel is multiplied by scaling_factor1 (1.0 by default) for the set of time periods specified by media_selected_times . Similarly, Treatment_0 means that media execution is multiplied by scaling_factor0 (0.0 by default) for these time periods.

For paid & organic channels with reach and frequency data, either reach or frequency is held fixed while the other is scaled, depending on the by_reach argument.

For non-media treatments, Treatment_1 means that the variable is set to historical values. Treatment_0 means that the variable is set to its baseline value for all geos and time periods. Note that the scaling factors ( scaling_factor0 and scaling_factor1 ) are not applicable to non-media treatments.

"Outcome" refers to either revenue if use_kpi=False , or kpi if use_kpi=True . When revenue_per_kpi is not defined, use_kpi cannot be False.

If new_data=None , this method computes incremental outcome using media , reach , frequency , organic_media , organic_reach , organic_frequency , non_media_treatments and revenue_per_kpi tensors that the Meridian object was initialized with. This behavior can be overridden with the new_data argument. For example, new_data=DataTensors(media=new_media) calculates incremental outcome using the new_media tensor and the original values of reach , frequency , organic_media , organic_reach , organic_frequency , non_media_treatments and revenue_per_kpi tensors.

The calculation in this method depends on two key assumptions made in the Meridian implementation:

  1. Additivity of media effects (no interactions).
  2. Additive changes on the model KPI scale correspond to additive changes on the original KPI scale. In other words, the intercept and control effects do not influence the media effects. This assumption currently holds because the outcome transformation only involves centering and scaling, for example, no log transformations.
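
The definition above can be made concrete with a toy additive model. Everything in this sketch is hypothetical: the response function, its coefficients, and the function names are invented for illustration and do not reflect Meridian's model. It shows only the difference between the expected outcome under Treatment_1 and Treatment_0 for one channel, with other channels unchanged.

```python
def toy_expected_outcome(media, intercept=100.0, betas=(50.0, 30.0),
                         half_saturation=(10.0, 20.0)):
    """Toy additive model: intercept plus a saturating effect per channel."""
    total = intercept
    for x, beta, k in zip(media, betas, half_saturation):
        total += beta * x / (x + k)
    return total


def toy_incremental_outcome(media, channel, scaling_factor0=0.0,
                            scaling_factor1=1.0):
    """E(outcome | Treatment_1) minus E(outcome | Treatment_0) for one channel."""
    if not 0.0 <= scaling_factor0 < scaling_factor1:
        raise ValueError("require 0 <= scaling_factor0 < scaling_factor1")
    treatment_1 = list(media)
    treatment_1[channel] *= scaling_factor1
    treatment_0 = list(media)
    treatment_0[channel] *= scaling_factor0
    return toy_expected_outcome(treatment_1) - toy_expected_outcome(treatment_0)


# Hypothetical media execution for two channels.
media = [10.0, 20.0]
inc_channel_0 = toy_incremental_outcome(media, channel=0)
inc_channel_1 = toy_incremental_outcome(media, channel=1)
```

Because the toy model is additive, the intercept and the other channel cancel in the difference, which mirrors the first assumption listed above.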

Args

use_posterior
Boolean. If True , then the incremental outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
Optional DataTensors container with optional tensors: media, reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi. If None, the incremental outcome is calculated using the InputData provided to the Meridian object. If new_data is provided, the incremental outcome is calculated using the new tensors in new_data and the original values of the remaining tensors. For example, incremental_outcome(new_data=DataTensors(media=new_media)) computes the incremental outcome using new_media and the original values of reach, frequency, organic_media, organic_reach, organic_frequency, non_media_treatments and revenue_per_kpi. If any of the tensors in new_data is provided with a different number of time periods than in InputData, then all tensors must be provided with the same number of time periods.
non_media_baseline_values
Optional list of shape (n_non_media_channels,). Each element is a float denoting a fixed value that will be used as the baseline for the given channel. The values are expected to be scaled by population for the channels where model_spec.non_media_population_scaling_id is True. If None, model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non-media treatment channel.
scaling_factor0
Float. The factor by which to scale the counterfactual scenario "Media_0" during the time periods specified in media_selected_times . Must be non-negative and less than scaling_factor1 .
scaling_factor1
Float. The factor by which to scale "Media_1" during the selected time periods specified in media_selected_times . Must be non-negative and greater than scaling_factor0 .
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in new_data if time is modified in new_data , or input_data.n_times otherwise. The incremental outcome corresponds to incremental KPI generated during the selected_times arg by media executed during the media_selected_times arg. Note that if use_kpi=False , then selected_times can only include the time periods that have revenue_per_kpi input data. By default, all time periods are included where revenue_per_kpi data is available.
media_selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in KPI data or number of time periods in the new_data args, if provided. If new_data is provided, media_selected_times can select any subset of time periods in new_data . If new_data is not provided, media_selected_times selects from InputData.time . The incremental outcome corresponds to incremental KPI generated during the selected_times arg by treatment variables executed during the media_selected_times arg. For each channel, the incremental outcome is defined as the difference between expected KPI when treatment variables execution is scaled by scaling_factor1 and scaling_factor0 during these specified time periods. By default, the difference is between treatment variables at historical execution levels, or as provided in new_data , versus zero execution. Defaults to include all time periods.
aggregate_geos
Boolean. If True , then incremental outcome is summed over all regions.
aggregate_times
Boolean. If True , then incremental outcome is summed over all time periods.
inverse_transform_outcome
Boolean. If True, returns the incremental outcome in the original KPI or revenue (depending on what is passed to use_kpi), as it was passed to InputData. If False, returns the outcome after transformation by KpiTransformer, reflecting how it is represented within the model.
use_kpi
Boolean. If use_kpi = True , the expected KPI is calculated; otherwise the expected revenue (kpi * revenue_per_kpi) is calculated. It is required that use_kpi = True if revenue_per_kpi data is not available or if inverse_transform_outcome = False .
by_reach
Boolean. If True , then the incremental outcome is calculated by scaling the reach and holding the frequency constant. If False , then the incremental outcome is calculated by scaling the frequency and holding the reach constant. Only used for channels with RF data.
include_non_paid_channels
Boolean. If True , then non-media treatments and organic effects are included in the calculation. If False , then only the paid media and RF effects are included.
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
Tensor of incremental outcome (either KPI or revenue, depending on the use_kpi argument) with dimensions (n_chains, n_draws, n_geos, n_times, n_channels). If include_non_paid_channels=True, then n_channels is the total number of media, RF, organic media, organic RF, and non-media channels. If include_non_paid_channels=False, then n_channels is the total number of media and RF channels. The n_geos and n_times dimensions are dropped if aggregate_geos=True or aggregate_times=True, respectively.

Raises

NotFittedModelError
If sample_posterior() (for use_posterior=True ) or sample_prior() (for use_posterior=False ) has not been called prior to calling this method.
ValueError
If new_data argument contains tensors with modified time dimension and not all treatment variables are provided in new_data with matching time dimensions.

marginal_roi

Calculates the marginal ROI prior or posterior distribution.

The marginal ROI (mROI) numerator is the change in expected outcome ( kpi or kpi * revenue_per_kpi ) when one channel's spend is increased by a small fraction. The mROI denominator is the corresponding small fraction of the channel's total spend.
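
As an illustration of this ratio, the sketch below evaluates mROI on a toy concave response curve. The curve, the function names, and the default incremental_increase value used here are hypothetical; only the numerator and denominator structure follows the description above.

```python
def toy_outcome(spend):
    """Hypothetical concave response of expected outcome to channel spend."""
    return 200.0 * spend / (spend + 1000.0)


def toy_marginal_roi(spend, incremental_increase=0.01):
    """Change in outcome from a small spend increase, per unit of added spend."""
    numerator = toy_outcome(spend * (1.0 + incremental_increase)) - toy_outcome(spend)
    denominator = incremental_increase * spend
    return numerator / denominator


# Hypothetical total spend for one channel.
spend = 1000.0
mroi = toy_marginal_roi(spend)
average_roi = toy_outcome(spend) / spend
```

On a concave curve like this one, the marginal ROI is lower than the average ROI, which is why the two metrics can lead to different budget decisions.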

If new_data=None , this method calculates marginal ROI conditional on the values of the paid media variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example,

  new_data = DataTensors(media=new_media, frequency=new_frequency)

If selected_geos or selected_times is specified, then the mROI denominator is based on the total spend during the selected geos and time periods. An exception will be thrown if the spend of the InputData used to train the model does not have geo and time dimensions. (If the new_data.media_spend and new_data.rf_spend arguments are used with different dimensions than the InputData spend, then an exception will be thrown since this is a likely user error.)

Args

incremental_increase
Small fraction by which each channel's spend is increased when calculating its mROI numerator. The mROI denominator is this fraction of the channel's total spend. Only used if marginal is True .
use_posterior
If True then the posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
Optional. DataTensors containing media , media_spend , reach , frequency , rf_spend and revenue_per_kpi data. If provided, the marginal ROI is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None , the marginal ROI is calculated using the original values of all the tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData , then all tensors must be provided with the same number of time periods.
selected_geos
Optional. Contains a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in the new_data args, if provided. By default, all time periods are included.
aggregate_geos
If True , the expected revenue is summed over all of the regions.
by_reach
Used for a channel with reach and frequency. If True , returns the mROI by reach for a given fixed frequency. If False , returns the mROI by frequency for a given fixed reach.
use_kpi
If False , then revenue is used to calculate the mROI numerator. Otherwise, uses KPI to calculate the mROI numerator.
batch_size
Maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
Tensor of mROI values with dimensions (n_chains, n_draws, n_geos, (n_media_channels + n_rf_channels)) . The n_geos dimension is dropped if aggregate_geos=True .

negative_baseline_probability

Calculates either prior or posterior negative baseline probability.

This calculates either the prior or posterior probability that the baseline, aggregated over the supplied time window, is negative.

The baseline is calculated by computing expected_outcome with the following assumptions:

  1. media is set to all zeros.
  2. reach is set to all zeros.
  3. organic_media is set to all zeros.
  4. organic_reach is set to all zeros.
  5. non_media_treatments is set to the counterfactual values according to the non_media_baseline_values argument.
  6. controls are set to historical values.
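
Once baseline draws are available, the probability itself is a simple proportion. The sketch below assumes a hypothetical nested list baseline_draws[draw][time] and a hypothetical helper name; it is not Meridian code and ignores the chain and geo dimensions.

```python
def toy_negative_baseline_probability(baseline_draws):
    """Share of draws whose baseline, summed over the time window, is negative."""
    totals = [sum(per_time) for per_time in baseline_draws]
    return sum(1 for total in totals if total < 0) / len(totals)


# Hypothetical baseline values: 4 draws x 2 time periods.
baseline_draws = [
    [5.0, -1.0],   # total  4.0
    [-3.0, 1.0],   # total -2.0
    [2.0, 2.0],    # total  4.0
    [-4.0, -1.0],  # total -5.0
]
probability = toy_negative_baseline_probability(baseline_draws)
```

Note that a draw with a negative value in a single period does not count unless its sum over the window is negative.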

Args

non_media_baseline_values
Optional list of shape (n_non_media_channels,) . Each element is a float denoting a fixed value that will be used as the baseline for the given channel. It is expected that they are scaled by population for the channels where model_spec.non_media_population_scaling_id is True . If None , the model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non_media treatment channel.
use_posterior
Boolean. If True , then the expected outcome posterior distribution is calculated. Otherwise, the prior distribution is calculated.
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing a subset of dates to include. The values accepted here must match time dimension coordinates from InputData.time. By default, all time periods are included.
use_kpi
Boolean. If use_kpi = True , the expected KPI is calculated; otherwise the expected revenue (kpi * revenue_per_kpi) is calculated. It is required that use_kpi = True if revenue_per_kpi is not defined or if inverse_transform_outcome = False .
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
A float representing the prior or posterior negative baseline probability over the supplied time window.

Raises

NotFittedModelError
If sample_posterior() (for use_posterior=True ) or sample_prior() (for use_posterior=False ) has not been called prior to calling this method.
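
As a rough illustration of the returned quantity, the probability can be read as the fraction of draws whose baseline, summed over the selected time window, falls below zero. The sketch below is plain Python and does not call Meridian; `baseline_draws` and the helper function are hypothetical names with made-up values, and the library's own computation may differ in detail.

```python
# Hypothetical baseline draws: each row is one prior or posterior draw,
# each column is one time period in the selected window.
baseline_draws = [
    [120.0, -30.0, 80.0],
    [-50.0, -40.0, 10.0],
    [60.0, 20.0, -10.0],
    [-5.0, -5.0, 5.0],
]


def negative_baseline_probability(draws):
    """Fraction of draws whose baseline, summed over the window, is negative."""
    totals = [sum(row) for row in draws]
    return sum(1 for total in totals if total < 0) / len(totals)


prob = negative_baseline_probability(baseline_draws)
print(prob)
```

Note that the aggregation happens before the sign check: a draw with a few negative periods still counts as non-negative if its total over the window is positive.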

optimal_freq

View source

Calculates the optimal frequency that maximizes posterior mean ROI.

For this optimization, historical spend is used and fixed, and frequency is restricted to be constant across all geographic regions and time periods. Reach is calculated for each geographic area and time period such that the number of impressions remains unchanged as frequency varies. Meridian solves for the frequency at which posterior mean ROI is optimized.

If new_data=None , this method calculates the optimal frequency based on the values of the paid RF variables that the Meridian object was initialized with. The user can override this historical data through the new_data argument. For example,

  new_data = DataTensors(reach=new_reach, frequency=new_frequency)

Args

new_data
Optional DataTensors object containing rf_impressions , rf_spend , and revenue_per_kpi . If provided, the optimal frequency is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None , the historical data used to initialize the Meridian object is used. If any of the tensors in new_data is provided with a different number of time periods than in InputData , then all tensors must be provided with the same number of time periods.
max_frequency
Maximum frequency value used to calculate the frequency grid. If None , the maximum frequency value is calculated from the historic frequency (maximum value of Meridian.input_data, not new_data ). If freq_grid is provided, this argument has no effect.
freq_grid
List of frequency values. The ROI of each channel is calculated for each frequency value in the list. By default, the list includes numbers from 1.0 to the maximum frequency in increments of 0.1 .
use_posterior
Boolean. If True , posterior optimal frequencies are generated. If False , prior optimal frequencies are generated.
use_kpi
Boolean. If True , the counterfactual metrics are calculated using KPI. If False , the counterfactual metrics are calculated using revenue.
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in new_data if time is modified in new_data , or input_data.n_times otherwise. By default, all time periods are included.
confidence_level
Confidence level for prior and posterior credible intervals, represented as a value between zero and one.

Returns
An xarray Dataset which contains:
  • Coordinates: frequency , rf_channel , metric ( mean , median , ci_lo , ci_hi ).
  • Data variables:
    • optimal_frequency : The frequency that optimizes the posterior mean of ROI.
    • roi : The ROI for each frequency value in freq_grid .
    • optimized_incremental_outcome : The incremental outcome based on the optimal frequency.
    • optimized_effectiveness : The effectiveness based on the optimal frequency.
    • optimized_roi : The ROI based on the optimal frequency.
    • optimized_mroi_by_reach : The marginal ROI with a small change in reach and fixed frequency at the optimal frequency.
    • optimized_mroi_by_frequency : The marginal ROI with a small change around the optimal frequency and fixed reach.
    • optimized_cpik : The CPIK based on the optimal frequency.

Raises

NotFittedModelError
If sample_posterior() (for use_posterior=True ) or sample_prior() (for use_posterior=False ) has not been called prior to calling this method.
ValueError
If there are no channels with reach and frequency data.
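
The optimization described for this method can be pictured as a grid search: for each candidate frequency, reach is rescaled so that impressions stay fixed, ROI is computed for every draw, and the frequency with the highest mean ROI is kept. The sketch below is a toy version in plain Python. The Hill-style response, the parameter draws, and all numeric values are assumptions made for illustration, not Meridian's model or API.

```python
# Toy inputs (all hypothetical): total impressions and spend for one RF channel,
# plus a few "draws" of a coefficient and Hill parameters.
impressions = 1_000_000.0
spend = 50_000.0
draws = [
    {"beta": 0.9, "ec": 2.0, "slope": 3.0},
    {"beta": 1.1, "ec": 2.5, "slope": 2.5},
    {"beta": 1.0, "ec": 3.0, "slope": 3.5},
]


def hill(frequency, ec, slope):
    """Stand-in saturation curve applied to average frequency."""
    return frequency**slope / (frequency**slope + ec**slope)


def roi_at(frequency, draw):
    # Reach is chosen so that impressions = reach * frequency stays unchanged.
    reach = impressions / frequency
    outcome = draw["beta"] * reach * hill(frequency, draw["ec"], draw["slope"])
    return outcome / spend


# Grid from 1.0 up to a maximum frequency in increments of 0.1.
max_frequency = 6.0
n_steps = round((max_frequency - 1.0) / 0.1)
freq_grid = [round(1.0 + 0.1 * i, 1) for i in range(n_steps + 1)]

# Mean ROI across draws at each grid point, then the maximizing grid point.
mean_roi = {f: sum(roi_at(f, d) for d in draws) / len(draws) for f in freq_grid}
optimal_frequency = max(mean_roi, key=mean_roi.get)
print(optimal_frequency, mean_roi[optimal_frequency])
```

Because the mean is taken across draws before maximizing, the result is a single frequency per channel rather than a distribution of per-draw optima.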

predictive_accuracy

View source

Calculates R-Squared , MAPE , and wMAPE goodness of fit metrics.

R-Squared , MAPE (mean absolute percentage error), and wMAPE (weighted mean absolute percentage error) are calculated on the revenue scale ( KPI * revenue_per_kpi ) when revenue_per_kpi is specified, or the KPI scale when revenue_per_kpi = None . This is the same scale used in the ROI numerator (incremental outcome).

Prediction errors in wMAPE are weighted by the actual revenue ( KPI * revenue_per_kpi ) when revenue_per_kpi is specified, or weighted by the KPI scale when revenue_per_kpi = None . This means that percentage errors when revenue is high are weighted more heavily than errors when revenue is low.

R-Squared , MAPE and wMAPE are calculated both at the model-level (one observation per geo and time period) and at the national-level (aggregating KPI or revenue outcome across geos so there is one observation per time period).

R-Squared , MAPE , and wMAPE are calculated for the full sample. If the model object has any holdout observations, then R-Squared , MAPE , and wMAPE are also calculated for the Train and Test subsets.
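
For orientation, the three metrics can be sketched in plain Python using their common textbook definitions. These definitions are an assumption here (the library's exact formulas, including any handling of missing values, may differ), and the `actual` and `expected` values are made up.

```python
def r_squared(actual, expected):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, expected))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, expected):
    """Unweighted mean of absolute percentage errors."""
    return sum(abs((a - e) / a) for a, e in zip(actual, expected)) / len(actual)


def wmape(actual, expected):
    """Absolute errors summed, then divided by the summed actual outcome.

    Equivalent to a mean of percentage errors weighted by the actual outcome.
    """
    return sum(abs(a - e) for a, e in zip(actual, expected)) / sum(actual)


# Hypothetical outcome (revenue or KPI) per observation.
actual = [100.0, 200.0, 400.0]
expected = [110.0, 190.0, 380.0]

print(r_squared(actual, expected), mape(actual, expected), wmape(actual, expected))
```

In this toy data the 10% miss on the smallest observation pulls MAPE above wMAPE, which shows how wMAPE gives more weight to high-outcome observations.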

Args

selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing a subset of dates to include. By default, all time periods are included.
batch_size
Integer representing the maximum draws per chain in each batch. By default, batch_size is 100 . The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
An xarray Dataset containing the computed R_Squared , MAPE , and wMAPE values, with coordinates metric , geo_granularity , evaluation_set , and accompanying data variable value . If holdout_id exists, the data is split into 'Train' , 'Test' , and 'All Data' subsections, and the three metrics are computed for each.

response_curves

View source

Generates a response curves xarray.Dataset.

Response curves are calculated in aggregate across geos and time periods, assuming the historical flighting pattern across geos and time periods for each media channel.

A list of multipliers is applied to each media channel's total historical spend within selected_geos and selected_times to obtain the x-axis values. The y-axis values are the incremental outcome generated by each channel within selected_geos and selected_times under the counterfactual where media units in each geo and time period are scaled by the corresponding multiplier. (Media units for time periods prior to selected_times are also scaled by the multiplier.)
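
The construction of the curve's axes can be sketched as follows. This is a toy example in plain Python: the per-period response function is a hypothetical stand-in rather than Meridian's media model, and the flighting values are made up. It only illustrates how one multiplier scales both the total spend (x-axis) and every period's media units (y-axis).

```python
# Hypothetical historical flighting for one channel: units and spend per period.
media_units = [0.0, 40.0, 80.0, 20.0]
spend = [0.0, 400.0, 800.0, 200.0]
spend_multipliers = [0.0, 0.5, 1.0, 1.5, 2.0]


def toy_incremental_outcome(units):
    """Stand-in saturating response for one period (not Meridian's model)."""
    return 1000.0 * units / (units + 50.0)


total_spend = sum(spend)
curve = []
for multiplier in spend_multipliers:
    # x-axis: the channel's total historical spend scaled by the multiplier.
    x = multiplier * total_spend
    # y-axis: outcome when every period's media units are scaled by the same
    # multiplier, which preserves the historical flighting pattern.
    y = sum(toy_incremental_outcome(multiplier * u) for u in media_units)
    curve.append((x, y))

for x, y in curve:
    print(x, round(y, 1))
```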

Args

spend_multipliers
List of multipliers. Each channel's total spend is multiplied by these factors to obtain the values at which the curve is calculated for that channel.
use_posterior
Boolean. If True , posterior response curves are generated. If False , prior response curves are generated.
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing a subset of dates to include. By default, all time periods are included.
by_reach
Boolean. Applies to channels with reach and frequency. If True , plots the response curve by reach. If False , plots the response curve by frequency.
use_optimal_frequency
If True , uses the optimal frequency to plot the response curves. Defaults to False .
use_kpi
A boolean flag indicating whether to use KPI instead of revenue to generate the response curves. Defaults to False .
confidence_level
Confidence level for prior and posterior credible intervals, represented as a value between zero and one.
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
An xarray.Dataset containing the data needed to visualize response curves.

rhat_summary

View source

Computes a summary of the R-hat values for each parameter in the model.

Summarizes the Gelman & Rubin (1992) potential scale reduction for chain convergence, commonly referred to as R-hat. It is a convergence diagnostic measure that measures the degree to which variance (of the means) between chains exceeds what you would expect if the chains were identically distributed. Values close to 1.0 indicate convergence. R-hat < 1.2 indicates approximate convergence and is a reasonable threshold for many problems (Brooks & Gelman, 1998).

References
Andrew Gelman and Donald B. Rubin. Inference from Iterative Simulation Using Multiple Sequences. Statistical Science, 7(4):457-472, 1992.
Stephen P. Brooks and Andrew Gelman. General Methods for Monitoring Convergence of Iterative Simulations. Journal of Computational and Graphical Statistics, 7(4), 1998.

Args

bad_rhat_threshold
The threshold for determining which R-hat values are considered bad.

Returns
A DataFrame with the following columns:
  • n_params : The number of respective parameters in the model.
  • avg_rhat : The average R-hat value for the respective parameter.
  • max_rhat : The maximum R-hat value for the respective parameter.
  • percent_bad_rhat : The percentage of R-hat values for the respective parameter that are greater than bad_rhat_threshold .
  • row_idx_bad_rhat : The row indices of the R-hat values that are greater than bad_rhat_threshold .
  • col_idx_bad_rhat : The column indices of the R-hat values that are greater than bad_rhat_threshold .

Raises

NotFittedModelError
If self.sample_posterior() is not called before calling this method.
ValueError
If the number of dimensions of the R-hat array for a parameter is not 1 or 2 .
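
To make the diagnostic concrete, here is a minimal sketch of the textbook Gelman and Rubin statistic for a single scalar parameter, in plain Python. It is an illustration under stated assumptions, not the library's implementation (which may use a different variant); the simulated chains and the threshold value of 1.2 are hypothetical.

```python
import math
import random
import statistics


def rhat(chains):
    """Textbook potential scale reduction for one scalar parameter."""
    n = len(chains[0])  # draws per chain
    chain_means = [statistics.fmean(chain) for chain in chains]
    # W: average of the within-chain sample variances.
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # B / n: sample variance of the chain means.
    between_over_n = statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between_over_n
    return math.sqrt(var_hat / within)


rng = random.Random(0)
# Four chains sampling the same distribution, and four chains stuck apart.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(4)]
stuck = [[rng.gauss(5.0 * k, 1.0) for _ in range(200)] for k in range(4)]

bad_rhat_threshold = 1.2  # hypothetical threshold for this example
for name, chains in [("mixed", mixed), ("stuck", stuck)]:
    value = rhat(chains)
    print(name, round(value, 3), "bad" if value > bad_rhat_threshold else "ok")
```

When the chain means disagree, the between-chain term dominates and the statistic rises well above 1.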

roi

View source

Calculates ROI prior or posterior distribution for each media channel.

The ROI numerator is the change in expected outcome ( kpi or kpi * revenue_per_kpi ) when one channel's spend is set to zero, leaving all other channels' spend unchanged. The ROI denominator is the total spend of the channel.
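
As a small worked example of this definition, the sketch below computes ROI per draw for one channel in plain Python. All values and variable names are hypothetical, and the counterfactual outcomes are simply given rather than produced by a model.

```python
# Hypothetical expected outcome per draw with all channels at historical spend.
outcome_all_channels = [1500.0, 1550.0, 1480.0]
# Same draws with one channel's spend set to zero, other channels unchanged.
outcome_without_channel = [1200.0, 1210.0, 1230.0]
# Total spend of that channel over the same geos and time periods.
channel_spend = 200.0

# ROI per draw: change in expected outcome divided by the channel's spend.
roi_draws = [
    (with_channel - without_channel) / channel_spend
    for with_channel, without_channel in zip(
        outcome_all_channels, outcome_without_channel
    )
]
print(roi_draws)
```

Keeping one ROI value per draw preserves the full prior or posterior distribution, which can then be summarized by a mean or a credible interval.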

If new_data=None , this method calculates ROI conditional on the values of the paid media variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example,

  new_data = DataTensors(media=new_media, frequency=new_frequency)

If selected_geos or selected_times is specified, then the ROI denominator is the total spend during the selected geos and time periods. An exception will be thrown if the spend of the InputData used to train the model does not have geo and time dimensions. (If the new_data.media_spend and new_data.rf_spend arguments are used with different dimensions than the InputData spend, then an exception will be thrown since this is a likely user error.)

Args

use_posterior
Boolean. If True , then the posterior distribution is calculated. Otherwise, the prior distribution is calculated.
new_data
Optional. DataTensors containing media , media_spend , reach , frequency , rf_spend , and revenue_per_kpi data. If provided, the ROI is calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None , the ROI is calculated using the original values of all the tensors. If any of the tensors in new_data is provided with a different number of time periods than in InputData , then all tensors must be provided with the same number of time periods.
selected_geos
Optional. Contains a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in the new_data args, if provided. By default, all time periods are included.
aggregate_geos
Boolean. If True , the expected revenue is summed over all of the regions.
use_kpi
If False , then revenue is used to calculate the ROI numerator. Otherwise, uses KPI to calculate the ROI numerator.
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.

Returns
Tensor of ROI values with dimensions (n_chains, n_draws, n_geos, (n_media_channels + n_rf_channels)) . The n_geos dimension is dropped if aggregate_geos=True .

summary_metrics

View source

Returns summary metrics.

If new_data=None , this method calculates all the metrics conditional on the values of the data variables that the Meridian object was initialized with. The user can also override this historical data through the new_data argument. For example, to override the media, frequency, and non-media treatments data variables, the user can pass the following new_data argument:

  new_data = DataTensors(media=new_media, frequency=new_frequency, non_media_treatments=new_non_media_treatments)

Note that if new_data is provided with a different number of time periods than in InputData , pct_of_contribution is not defined because expected_outcome() is not defined for new time periods.

Note that mroi and effectiveness metrics are not defined ( math.nan ) for the aggregate "All Paid Channels" channel dimension.

Args

new_data
Optional DataTensors object with optional new tensors: media , media_spend , reach , frequency , rf_spend , organic_media , organic_reach , organic_frequency , non_media_treatments , controls , revenue_per_kpi . If provided, the summary metrics are calculated using the values of the tensors passed in new_data and the original values of all the remaining tensors. If None , the summary metrics are calculated using the original values of all the tensors. If new_data is provided with a different number of time periods than in InputData , then all tensors, except controls , must have the same number of time periods.
marginal_roi_by_reach
Boolean. Marginal ROI (mROI) is defined as the return on the next dollar spent. If this argument is True , the assumption is that the next dollar spent only impacts reach, holding frequency constant. If this argument is False , the assumption is that the next dollar spent only impacts frequency, holding reach constant. Used only when include_non_paid_channels is False .
marginal_roi_incremental_increase
Small fraction by which each channel's spend is increased when calculating its mROI numerator. The mROI denominator is this fraction of the channel's total spend. Used only when include_non_paid_channels is False .
selected_geos
Optional list containing a subset of geos to include. By default, all geos are included.
selected_times
Optional list containing either a subset of dates to include or booleans with length equal to the number of time periods in the tensors in the new_data argument, if provided. By default, all time periods are included.
aggregate_geos
Boolean. If True , the expected outcome is summed over all of the regions.
aggregate_times
Boolean. If True , the expected outcome is summed over all of the time periods. Note that if False , ROI, mROI, Effectiveness, and CPIK are not reported because they do not have a clear interpretation by time period.
optimal_frequency
An optional list with dimension n_rf_channels , containing the optimal frequency per channel, that maximizes posterior mean ROI. Default value is None , and historical frequency is used for the metrics calculation.
use_kpi
Boolean. If True , the summary metrics are calculated using KPI. If False , the metrics are calculated using revenue.
confidence_level
Confidence level for summary metrics credible intervals, represented as a value between zero and one.
batch_size
Integer representing the maximum draws per chain in each batch. The calculation is run in batches to avoid memory exhaustion. If a memory error occurs, try reducing batch_size . The calculation will generally be faster with larger batch_size values.
include_non_paid_channels
Boolean. If True , non-paid channels (organic media, organic reach and frequency, and non-media treatments) are included in the summary but only the metrics independent of spend are reported. If False , only the paid channels (media, reach and frequency) are included but the summary also contains the metrics dependent on spend. Default: False .
non_media_baseline_values
Optional list of shape (n_non_media_channels,) . Each element is a float which means that the fixed value will be used as baseline for the given channel. It is expected that they are scaled by population for the channels where model_spec.non_media_population_scaling_id is True . If None , the model_spec.non_media_baseline_values is used, which defaults to the minimum value for each non_media treatment channel.

Returns
An xr.Dataset with coordinates: channel , metric ( mean , median , ci_low , ci_high ), distribution (prior, posterior) and contains the following non-paid data variables: incremental_outcome , pct_of_contribution , effectiveness , and the following paid data variables: impressions , pct_of_impressions , spend , pct_of_spend , CPM , roi , mroi , cpik . The paid data variables are only included when include_non_paid_channels is False . Note that roi , mroi , cpik , and effectiveness metrics are not reported when aggregate_times=False because they do not have a clear interpretation by time period.
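
The metric coordinate can be pictured as a per-draw summary. The sketch below, in plain Python, reduces a list of draws to mean , median , ci_low , and ci_high for a given confidence_level . It assumes an equal-tailed interval with linearly interpolated percentiles; that choice, the helper names, and the sample draws are assumptions for illustration rather than the library's confirmed behavior.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile of pre-sorted values, q in [0, 100]."""
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize(draws, confidence_level):
    """Mean, median, and equal-tailed credible interval bounds for the draws."""
    ordered = sorted(draws)
    tail = (1.0 - confidence_level) / 2.0 * 100.0
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "ci_low": percentile(ordered, tail),
        "ci_high": percentile(ordered, 100.0 - tail),
    }


# Hypothetical draws of one metric for one channel: 1.0, 2.0, ..., 101.0.
metric_draws = [float(v) for v in range(1, 102)]
summary = summarize(metric_draws, confidence_level=0.9)
print(summary)
```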
