Bayesian Analysis

Bayesian A/B testing answers a different question than frequentist testing. Instead of "Is the result statistically significant?", it asks "What is the probability that treatment is better than control?"

BayesianExperiment

from splita import BayesianExperiment
import numpy as np

rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)

result = BayesianExperiment(ctrl, trt).run()
print(result.prob_treatment_better)  # ~0.99
print(result.expected_loss)          # expected loss if you ship treatment
print(result.credible_interval)      # 95% credible interval
print(result.lift)                   # posterior mean of the difference

Understanding the results

P(B > A)

The posterior probability that the treatment mean (or conversion rate) is higher than the control's. This is usually the question business stakeholders actually want answered.

  • > 0.95: Strong evidence for treatment
  • 0.90 - 0.95: Moderate evidence
  • < 0.90: Insufficient evidence
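For conversion data, this probability has a simple Monte Carlo estimate under the Beta-Binomial model described later in this page. The helper below is a hand-rolled sketch for intuition, not splita's implementation:

```python
import numpy as np

def prob_b_beats_a(ctrl, trt, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(B > A) under independent Beta(1, 1) priors."""
    r = np.random.default_rng(seed)
    # Conjugate update: posterior is Beta(1 + successes, 1 + failures) per arm
    a = r.beta(1 + ctrl.sum(), 1 + len(ctrl) - ctrl.sum(), n_draws)
    b = r.beta(1 + trt.sum(), 1 + len(trt) - trt.sum(), n_draws)
    # Fraction of joint posterior draws where treatment beats control
    return (b > a).mean()

rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)
print(prob_b_beats_a(ctrl, trt))  # close to 1 for this simulated lift
```

With 100,000 draws the Monte Carlo error on the probability is on the order of 0.002, which is plenty for the decision thresholds above.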

Expected loss

The expected cost of choosing the wrong variant. If you ship treatment and it turns out to be worse, how much do you lose on average?

# Decision rule: ship if expected loss < threshold
if result.expected_loss < 0.001:  # less than 0.1% loss
    print("Safe to ship")
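To make the quantity concrete: the expected loss of shipping treatment is the posterior mean of max(p_ctrl - p_trt, 0). A hand-rolled Monte Carlo sketch under Beta(1, 1) priors (hypothetical helper, not splita's API):

```python
import numpy as np

def expected_loss_ship_treatment(ctrl, trt, n_draws=100_000, seed=0):
    """E[max(p_ctrl - p_trt, 0)]: average shortfall if treatment is worse."""
    r = np.random.default_rng(seed)
    a = r.beta(1 + ctrl.sum(), 1 + len(ctrl) - ctrl.sum(), n_draws)
    b = r.beta(1 + trt.sum(), 1 + len(trt) - trt.sum(), n_draws)
    # Loss is zero on draws where treatment is actually better
    return np.maximum(a - b, 0).mean()

rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)
print(expected_loss_ship_treatment(ctrl, trt))  # tiny for a clear winner
```

Note that the loss can be small even when P(B > A) is well below 1: if treatment is worse, it is usually only slightly worse.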

ROPE (Region of Practical Equivalence)

Define a range of effects you consider practically equivalent to zero:

result = BayesianExperiment(
    ctrl, trt,
    rope=(-0.005, 0.005),  # +/- 0.5pp is "practically zero"
).run()
print(result.prob_in_rope)  # probability the effect is negligible
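Under the same Beta-Binomial sketch, the ROPE probability is just the fraction of posterior lift draws that land inside the interval (again a hand-rolled illustration, not splita's implementation):

```python
import numpy as np

def prob_in_rope(ctrl, trt, rope=(-0.005, 0.005), n_draws=100_000, seed=0):
    """Fraction of posterior draws of the lift that fall inside the ROPE."""
    r = np.random.default_rng(seed)
    a = r.beta(1 + ctrl.sum(), 1 + len(ctrl) - ctrl.sum(), n_draws)
    b = r.beta(1 + trt.sum(), 1 + len(trt) - trt.sum(), n_draws)
    lift = b - a
    return ((lift > rope[0]) & (lift < rope[1])).mean()

rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)
print(prob_in_rope(ctrl, trt))  # small: the true lift sits outside the ROPE
```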

Conversion metrics

For binary (0/1) data, BayesianExperiment uses a Beta-Binomial model:

ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.115, 5000)

result = BayesianExperiment(ctrl, trt, metric='conversion').run()

The default prior is Beta(1, 1) -- a uniform prior. You can specify informative priors:

result = BayesianExperiment(
    ctrl, trt,
    metric='conversion',
    prior={"alpha": 10, "beta": 90},  # prior centered at 10%
).run()
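The conjugate update behind this model is simple enough to verify by hand: a Beta(alpha, beta) prior combined with s conversions out of n trials yields a Beta(alpha + s, beta + n - s) posterior. A worked example with the prior above (the counts are illustrative):

```python
# Beta-Binomial conjugate update: Beta(10, 90) prior, centered at 10%,
# plus observed conversion counts gives the posterior in closed form.
alpha0, beta0 = 10, 90
n, successes = 5000, 520

alpha_post = alpha0 + successes       # 530
beta_post = beta0 + n - successes     # 4570
posterior_mean = alpha_post / (alpha_post + beta_post)
print(round(posterior_mean, 4))  # -> 0.1039
```

With 5,000 observations the Beta(10, 90) prior (worth 100 pseudo-observations) barely moves the estimate away from the observed rate of 0.104.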

Continuous metrics

For continuous data, BayesianExperiment uses a Normal-Inverse-Gamma model:

ctrl = rng.normal(25, 8, 1000)
trt = rng.normal(26.5, 8, 1000)

result = BayesianExperiment(ctrl, trt, metric='continuous').run()
print(result.prob_treatment_better)
print(result.credible_interval)

The default prior is vague (non-informative). Specify informative priors with:

result = BayesianExperiment(
    ctrl, trt,
    metric='continuous',
    prior={"mu": 25, "kappa": 1, "alpha": 3, "beta": 100},
).run()
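The posterior for this model is also available in closed form. The sketch below uses the standard Normal-Inverse-Gamma conjugate-update formulas; the parameter names mirror the prior dict above, which is an assumption about how splita maps them:

```python
import numpy as np

def nig_posterior(x, mu0=0.0, kappa0=1e-6, alpha0=1e-3, beta0=1e-3):
    """Closed-form Normal-Inverse-Gamma posterior for Normal data.

    Returns (mu_n, kappa_n, alpha_n, beta_n); defaults approximate a vague prior.
    """
    n, xbar = len(x), x.mean()
    ss = ((x - xbar) ** 2).sum()
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

rng = np.random.default_rng(42)
x = rng.normal(25, 8, 1000)
mu_n, kappa_n, alpha_n, beta_n = nig_posterior(x)
print(mu_n)  # essentially the sample mean under a vague prior
```

To compare two arms, run the update per arm, sample (mean, variance) pairs from each posterior, and take differences of the sampled means.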

ObjectiveBayesianExperiment

Empirical Bayes: learn the prior from historical experiment data rather than specifying it manually.

from splita import ObjectiveBayesianExperiment
import numpy as np

rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)

# Historical lift data from past experiments
historical_lifts = rng.normal(0.01, 0.02, 50)

result = ObjectiveBayesianExperiment(
    ctrl, trt,
    historical_effects=historical_lifts,
).run()
print(result.prob_treatment_better)
print(result.empirical_prior)
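A minimal version of the empirical-Bayes idea is a method-of-moments fit of a Normal prior on the lift to past experiment results. This is a sketch of the concept, not splita's internal procedure:

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for lift estimates from 50 past experiments
historical_lifts = rng.normal(0.01, 0.02, 50)

# Method-of-moments fit of a Normal prior on the lift
prior_mu = historical_lifts.mean()
prior_sigma = historical_lifts.std(ddof=1)
print(prior_mu, prior_sigma)  # close to the generating 0.01 and 0.02
```

Because most historical lifts cluster near zero, the learned prior shrinks noisy estimates from small experiments toward zero, which guards against overclaiming early wins.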

Frequentist vs Bayesian: when to use which

Scenario                                   Recommendation
-----------------------------------------  --------------------------------
Regulatory/scientific context              Frequentist (Experiment)
Business decision-making                   Bayesian (BayesianExperiment)
Need to communicate to non-statisticians   Bayesian (P(B > A) is intuitive)
Want to use informative priors             Bayesian
Need always-valid monitoring               Sequential (mSPRT)
Standard hypothesis test                   Frequentist

Bayesian decision framework

from splita import BayesianExperiment
import numpy as np

rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.115, 5000)

result = BayesianExperiment(ctrl, trt, rope=(-0.005, 0.005)).run()

# Decision rules
if result.prob_treatment_better > 0.95 and result.expected_loss < 0.001:
    decision = "Ship treatment"
elif result.prob_in_rope > 0.90:
    decision = "No practical difference -- keep control"
else:
    decision = "Inconclusive -- collect more data"

print(decision)