This section includes an explanation of advanced statistical concepts. We provide them for informational purposes, but you do not need to understand these concepts to use Experimentation.
## Overview
This guide explains the statistical methodology LaunchDarkly uses to calculate Bayesian experiment variation means, and how these analytics formulas are useful for validating your results.
For a high-level overview of Bayesian and frequentist statistics, read Bayesian versus frequentist statistics.
The core formulas include the posterior mean, the data mean, and the data weight. We describe these in detail below.
## Posterior mean
In the Bayesian approach, the main quantity we report is the mean of the posterior distribution calculated by updating the prior distribution with data observed in your experiment.
At a high level, the posterior means for all experiment variations and for any metric type, including conversion metrics and numeric metrics, can be represented by a convenient formula:

$$
\begin{aligned}
PosteriorMean = Weight \cdot DataMean + \left(1 - Weight \right) \cdot PriorMean
\end{aligned}
$$
- **Data mean**: The mean estimated from the data
- **Prior mean**: The mean of the Bayesian prior distribution assumed for the experiment variation mean
- **Weight**: A number between 0 and 1 that broadly reflects the amount of precision in our data mean

In other words, the posterior mean is a weighted sum of the mean of the prior distribution and the mean calculated from the data. As more data arrives in the experiment, the weight increases, so the posterior mean is influenced relatively more by the observed data and relatively less by the prior distribution. The specific behavior differs slightly between the control variation and the treatment variations, but this general principle holds for both.

When you hover over the "Conversion rate" or "Posterior mean" heading in an experiment's results table, you can view the formula for the conversion rate or posterior mean. When you hover over an actual conversion rate or posterior mean value, you can view the actual numbers in the formulas instead of descriptions.
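As a minimal sketch in Python, with illustrative numbers (the function name is our own, not LaunchDarkly's), the weighted-sum formula reads:

```python
def posterior_mean(weight: float, data_mean: float, prior_mean: float) -> float:
    """Posterior mean as a weighted sum of the data mean and the prior mean."""
    assert 0.0 <= weight <= 1.0
    return weight * data_mean + (1.0 - weight) * prior_mean

# Early in an experiment the weight is small, so the prior dominates:
early = posterior_mean(weight=0.1, data_mean=0.30, prior_mean=0.20)    # ≈ 0.21
# With more data the weight grows and the observed data dominates:
late = posterior_mean(weight=0.95, data_mean=0.30, prior_mean=0.20)    # ≈ 0.295
```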
### Data mean
The formula for the data mean differs between conversion metrics and numeric metrics:
- **Conversion metrics**, including custom conversion binary, custom conversion count, page viewed, and clicked or tapped metrics, use the total number of conversions divided by the total number of exposures: $DataMean = SampleMean = Conversions / Exposures$
- **Numeric metrics** use the total value divided by the total number of exposures: $DataMean = SampleMean = TotalValue / Exposures$

CUPED may affect the exact computation of these results. To learn more, read [Covariate adjustment and CUPED methodology](/guides/statistical-methodology/cuped).
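The two data-mean formulas can be sketched directly (function names are ours, for illustration only):

```python
def conversion_data_mean(conversions: int, exposures: int) -> float:
    """Sample mean for conversion metrics: conversions / exposures."""
    return conversions / exposures

def numeric_data_mean(total_value: float, exposures: int) -> float:
    """Sample mean for numeric metrics: total value / exposures."""
    return total_value / exposures

rate = conversion_data_mean(conversions=120, exposures=1000)   # 0.12
avg = numeric_data_mean(total_value=5400.0, exposures=1000)    # 5.4
```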
### Data weight
The precision weight is given by:

$$
\begin{aligned}
Weight = \frac{DataMeanPrecision}{DataMeanPrecision + PriorPrecision}
\end{aligned}
$$
This represents the proportion of the total precision due to the data mean. However, the precision is defined differently depending on the statistical model used.
There are two statistical models for estimating the posterior mean of experiment metrics:
- **Normal-normal model**: This model has a normal prior and a normal likelihood, and is used for numeric metrics.
- **Beta-binomial model**: This model has a beta prior distribution and a binomial likelihood, and is used for binary metrics when [CUPED](/guides/statistical-methodology/cuped) is not applied.

For the normal-normal model, precision is defined as the inverse of the variance, so the precision weight is:

$$
\begin{aligned}
Weight = \frac{1 / DataMeanVariance}{1 / DataMeanVariance + 1 / PriorVariance}
\end{aligned}
$$

For the beta-binomial model, precision is defined as the number of units in the data sample and the number of pseudo-units in the beta prior distribution. You can think of the $\alpha_{prior}$ and $\beta_{prior}$ parameters of the beta prior distribution as, respectively, the number of converted pseudo-units and the number of non-converted pseudo-units, so that the number of pseudo-units in the prior distribution is $\alpha_{prior} + \beta_{prior}$. If we denote by $n$ the number of units in the data sample, then the precision weight is given by:

$$
\begin{aligned}
Weight = \frac{n}{n + \alpha_{prior} + \beta_{prior}}
\end{aligned}
$$
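Both weight definitions can be sketched in a few lines of Python (function names are ours, for illustration):

```python
def normal_normal_weight(data_mean_variance: float, prior_variance: float) -> float:
    """Precision weight for the normal-normal model, where precision = 1 / variance."""
    data_precision = 1.0 / data_mean_variance
    prior_precision = 1.0 / prior_variance
    return data_precision / (data_precision + prior_precision)

def beta_binomial_weight(n: int, alpha_prior: float, beta_prior: float) -> float:
    """Precision weight for the beta-binomial model: n data units vs. prior pseudo-units."""
    return n / (n + alpha_prior + beta_prior)

# The weight approaches 1 as data accumulates:
beta_binomial_weight(100, 1.0, 1.0)     # ≈ 0.980
beta_binomial_weight(10_000, 1.0, 1.0)  # ≈ 0.9998
```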
## Details of our Bayesian approach
The Bayesian approach to analysis involves two steps:
- Combining a subjective prior belief about the parameters of interest, usually means, with the objective data collected during the experiment to create a posterior distribution for each variation, representing our current knowledge about what values those parameters are likely to take.
- Using that posterior distribution to compute helpful statistical measures that aid in deciding what action to take: for example, ship the treatment, or don't ship the treatment.
The most complicated part of the setup is creating the posterior distribution, because it involves fine parameter tuning and different treatments for different types of metrics. After we compute these distributions, we summarize them for you on the results page using:
- Credible intervals that convey the spread of the posterior distribution, which represents the range of likely values for the true mean of the variation
- Posterior means that convey the center of the posterior distribution, which represents our current best estimate of the true mean

After the posterior distribution is created, it is a relatively simple procedure to compute the statistics we display on the results page to help you make a decision. To learn more about these results, read Results table data. Below, we describe how we accomplish these two steps in detail.
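For a normal posterior, both summaries can be computed directly from the posterior parameters. A sketch using Python's standard library; the 90% level and the example numbers are illustrative assumptions, not LaunchDarkly's settings:

```python
from statistics import NormalDist

def normal_credible_interval(post_mean: float, post_variance: float,
                             level: float = 0.90) -> tuple[float, float]:
    """Equal-tailed credible interval from a normal posterior distribution."""
    dist = NormalDist(mu=post_mean, sigma=post_variance ** 0.5)
    tail = (1.0 - level) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)

# The posterior mean is the center of the interval, our best single estimate:
lo, hi = normal_credible_interval(post_mean=0.12, post_variance=0.0001)
```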
### Calculating posterior distributions
At LaunchDarkly, we use different statistical models for binary data and numeric data. In both cases, we use conjugate distributions, meaning that the family of the prior distribution is the same as the family of the posterior distribution:
- For binary metrics, we start with a Beta distribution for the prior and update it into another Beta distribution for the posterior
- For numeric metrics, we start with a Normal distribution for the prior and update it into another Normal distribution for the posterior

We give some technical details on the exact specification of the priors below, as well as some closed-form expressions for the posterior distributions once data is incorporated.
#### Binary data
Binary metrics are also called "occurrence" metrics in LaunchDarkly: they record either a 0 or a 1 for each context in the experiment. For more information, read Custom conversion binary metrics.
The natural approach for binary data is to use a Binomial likelihood function with a Beta prior, which results in another Beta distribution for the posterior.
Suppose that $\bar{y}_v$ is the proportion of the $N_v$ units in variation $v$ that converted. Then a total of $N_v \bar{y}_v$ units converted, and $N_v (1 - \bar{y}_v)$ units did not convert.
#### Numeric data
Although numeric data can take a variety of forms and be modeled by many different kinds of probability distributions, we can use a simplified approach that leverages the central limit theorem. Because the quantity of interest is usually some unknown population mean which is estimated by the sample mean, we can have reasonably high confidence that the normal distribution will be a good fit for the likelihood of the sample mean as we collect more and more data:
$$
f_{\mathrm{like}}(\bar{y}_v \mid \mu_v) = \mathsf{Normal}(\mu_v, \sigma_v^2 / N_v)
$$
To further simplify the model, we treat the variance parameter as known and simply use the natural plug-in estimate, the sample variance computed from the data. As sample sizes increase, this plug-in estimate is guaranteed to converge to the true variance. To complete the model, we need to specify a prior distribution for μv.
For the control variation, we use an improper non-informative prior $f_{\mathrm{prior}}(\mu_0) \propto 1$. For the other variations, we use priors that shrink the results towards the control variation's mean. We generate this prior from the empirical distribution of relative differences between variations in all experiments on our platform that use metrics of the same type (numeric or conversion) and aggregation function (average or sum). The equation for this prior is:

$$
\begin{aligned}
f_{\mathrm{prior}}(\mu_v) &= \mathsf{Normal}(a_v, w_v^2), \\
a_v &= \bar{y}_0, \\
w_v^2 &= \bar{y}_0^2 \hat{\gamma}^2 + \hat{\sigma}_0^2 / N_0
\end{aligned}
$$

where $\hat{\gamma}^2$ is the variance of the distribution of observed relative differences ($(\bar{y}_v - \bar{y}_0) / \bar{y}_0$) across all experiments with numeric metrics on the platform. The first term, $\bar{y}_0^2 \hat{\gamma}^2$, scales the expected relative difference by the observed control mean. The second term, $\hat{\sigma}_0^2 / N_0$, accounts for the uncertainty in our estimate of the control mean. The value of $\hat{\gamma}^2$ is between 0.13 and 0.19, conditional on the type of the metric.
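A sketch of the treatment-variation prior in Python. The function name is ours, and the default `gamma_sq` is a stand-in value inside the 0.13-0.19 range stated above, not an exact platform constant:

```python
def treatment_prior(control_mean: float, control_variance: float, n_control: int,
                    gamma_sq: float = 0.16) -> tuple[float, float]:
    """Normal prior (a_v, w_v^2) for a treatment variation's mean.

    gamma_sq approximates the platform-wide variance of relative differences.
    """
    a_v = control_mean                                                 # prior mean = control mean
    w_sq = control_mean ** 2 * gamma_sq + control_variance / n_control  # prior variance
    return a_v, w_sq

a_v, w_sq = treatment_prior(control_mean=5.4, control_variance=4.0, n_control=1000)
```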
Combining the likelihood and prior provides the posterior distribution of $\mu_v$, which represents our beliefs about $\mu_v$ *after* observing the data from the experiment.
Given the normal likelihood and prior, the posterior distribution is also a normal distribution with the following parameters:

$$
\begin{aligned}
f_{\mathrm{post}}(\mu_v) &= \mathsf{Normal}(\alpha_v, \omega_v^2) , \\
\alpha_v &= \omega_v^2 \left(\frac{N_v}{\hat{\sigma}_v^2} \bar{y}_v + \frac{1}{w_v^2} a_v \right) , \\
\omega_v^2 &= \left(\frac{1}{w_v^2} + \frac{N_v}{\hat{\sigma}^2_v} \right)^{-1}
\end{aligned}
$$

The experiment results page displays the posterior distributions of each variation's mean ($f_{\mathrm{post}}(\mu_v)$) in the [probability charts](/home/experimentation/analyze).
We use the expected value of the posterior distribution as a point estimate for $\mu_v$:

$$
\hat{\mu}_v = \mathbb{E}[f_{\mathrm{post}}(\mu_v)] = \alpha_v
$$
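The conjugate normal update above translates directly into code. A sketch with a hypothetical function name and illustrative inputs:

```python
def normal_posterior(y_bar: float, sigma_sq_hat: float, n: int,
                     prior_mean: float, prior_var: float) -> tuple[float, float]:
    """Conjugate normal update: returns (alpha_v, omega_v^2).

    Combines data precision n / sigma_sq_hat with prior precision 1 / prior_var.
    """
    omega_sq = 1.0 / (1.0 / prior_var + n / sigma_sq_hat)
    alpha = omega_sq * ((n / sigma_sq_hat) * y_bar + prior_mean / prior_var)
    return alpha, omega_sq

# With an extremely diffuse prior, the posterior mean is close to the sample mean:
alpha, omega_sq = normal_posterior(y_bar=5.6, sigma_sq_hat=4.0, n=1000,
                                   prior_mean=5.4, prior_var=1e6)
```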
#### Conversion metrics
Conversion metrics use binary data. We use a Binomial likelihood function with a Beta prior, which results in another Beta distribution for the posterior. Suppose that $\bar{y}_v$ is the proportion of the $N_v$ units in variation $v$ that converted. Then a total of $N_v \bar{y}_v$ units converted, and $N_v (1 - \bar{y}_v)$ units did not convert. To model the total number of conversions ($N_v \bar{y}_v$), we use a binomial distribution with proportion parameter $\mu_v$ and size $N_v$ as the likelihood function:

$$
f_{\mathrm{like}}(N_v \bar{y}_v) = \mathsf{Binomial}(N_v, \mu_v)
$$

We use a Beta distribution as the prior for $\mu_v$:

$$
f_{\mathrm{prior}}(\mu_v) = \mathsf{Beta}(a_v, b_v)
$$
The values of the prior hyperparameters $a_v$ and $b_v$ differ between the control variation ($v = 0$) and the treatment variations ($v \neq 0$). For the control variation, we use a uniform distribution with $a_0 = 1$ and $b_0 = 1$. For the treatment variations, we use a prior similar to the one used for numeric metrics: a Beta distribution with hyperparameters $a_v$ and $b_v$ chosen so that its expected value and variance are:

$$
\begin{aligned}
\mathbb{E}[f_{\mathrm{prior}}(\mu_v)] &= \bar{y}_0 , \\
\mathrm{Var}(f_{\mathrm{prior}}(\mu_v)) &= \bar{y}_0^2 \hat{\gamma}^2 + \frac{\bar{y}_0 (1 - \bar{y}_0)}{N_0}
\end{aligned}
$$

The value of $\hat{\gamma}^2$ is the variance of the empirical distribution of relative differences across experiments using a binary metric, and is currently set to $\hat{\gamma}^2 \approx 0.04$. The posterior distribution of $\mu_v$ is also a Beta distribution:
$$
f_{\mathrm{post}}(\mu_v) = \mathsf{Beta}\left(a_v + N_v \bar{y}_v,\; b_v + N_v (1 - \bar{y}_v)\right)
$$
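As a sketch, the treatment prior's hyperparameters can be recovered from a target mean and variance by the method of moments, and the conjugate Beta update then just adds counts; the function names and numbers below are illustrative, not LaunchDarkly's implementation:

```python
def beta_from_moments(mean: float, variance: float) -> tuple[float, float]:
    """Solve Beta(a, b) hyperparameters matching a target mean and variance."""
    pseudo_n = mean * (1.0 - mean) / variance - 1.0   # a + b, the prior pseudo-units
    return mean * pseudo_n, (1.0 - mean) * pseudo_n

def beta_posterior(a_prior: float, b_prior: float,
                   conversions: int, exposures: int) -> tuple[float, float]:
    """Conjugate update: add conversions to a, non-conversions to b."""
    return a_prior + conversions, b_prior + (exposures - conversions)

a0, b0 = 1.0, 1.0                     # uniform Beta(1, 1) prior for the control
a_post, b_post = beta_posterior(a0, b0, conversions=120, exposures=1000)
control_estimate = a_post / (a_post + b_post)   # posterior mean, ≈ 0.1208
```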