Bayesian Analysis¶
Bayesian A/B testing answers a different question than frequentist testing. Instead of "Is the result statistically significant?", it asks "What is the probability that treatment is better than control?"
BayesianExperiment¶
from splita import BayesianExperiment
import numpy as np
rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)
result = BayesianExperiment(ctrl, trt).run()
print(result.prob_treatment_better) # ~0.99
print(result.expected_loss) # expected loss if you ship treatment
print(result.credible_interval) # 95% credible interval
print(result.lift) # posterior mean of the difference
Understanding the results¶
P(B > A)¶
The posterior probability that the treatment's mean (or conversion rate) is higher than the control's, given the data and the prior. This is what most business stakeholders actually want to know. As a rough guide:
- > 0.95: Strong evidence for treatment
- 0.90 - 0.95: Moderate evidence
- < 0.90: Insufficient evidence
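For conversion data, this probability can be approximated by drawing from each arm's posterior and counting how often the treatment draw is larger. The sketch below uses only NumPy and assumes a Beta(1, 1) prior and hypothetical counts (500/5000 vs 600/5000). The helper `prob_b_beats_a` is illustrative and not part of splita, whose internal computation may differ.

```python
import numpy as np


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = np.random.default_rng(seed)
    # Conjugate update: Beta(1 + conversions, 1 + non-conversions)
    draws_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=n_draws)
    draws_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=n_draws)
    return float(np.mean(draws_b > draws_a))


# Hypothetical counts: 10% vs 12% conversion on 5,000 users per arm
p_better = prob_b_beats_a(500, 5000, 600, 5000)
print(f"P(B > A) is about {p_better:.3f}")
```

With identical counts in both arms the same function returns a value close to 0.5, which is a quick sanity check on the estimate.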
Expected loss¶
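The expected cost of choosing the wrong variant. If you ship treatment and it turns out to be worse than control, how much do you lose on average? The loss is the shortfall relative to control, averaged over the posterior and counted as zero whenever treatment is in fact better.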
# Decision rule: ship if expected loss < threshold
if result.expected_loss < 0.001: # less than 0.1% loss
    print("Safe to ship")
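One common way to estimate expected loss is to average the shortfall over posterior draws. The sketch below does this for conversion data under Beta(1, 1) priors with hypothetical counts. The function `expected_losses` is illustrative only, and it is an assumption that splita's `expected_loss` follows the same definition and units (absolute difference in conversion rate).

```python
import numpy as np


def expected_losses(conv_c, n_c, conv_t, n_t, n_draws=200_000, seed=0):
    """Return (loss if shipping treatment, loss if keeping control)."""
    rng = np.random.default_rng(seed)
    p_c = rng.beta(1 + conv_c, 1 + n_c - conv_c, size=n_draws)
    p_t = rng.beta(1 + conv_t, 1 + n_t - conv_t, size=n_draws)
    # Shipping treatment only costs something in draws where control is better
    loss_ship_treatment = float(np.mean(np.maximum(p_c - p_t, 0.0)))
    # Keeping control only costs something in draws where treatment is better
    loss_keep_control = float(np.mean(np.maximum(p_t - p_c, 0.0)))
    return loss_ship_treatment, loss_keep_control


# Hypothetical counts: 500/5000 conversions in control, 600/5000 in treatment
loss_trt, loss_ctrl = expected_losses(500, 5000, 600, 5000)
print(f"ship treatment: {loss_trt:.6f}, keep control: {loss_ctrl:.6f}")
```

Comparing the two losses shows which decision carries less downside. With these counts, shipping treatment has a far smaller expected loss than keeping control.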
ROPE (Region of Practical Equivalence)¶
Define a range of effects you consider practically equivalent to zero:
result = BayesianExperiment(
    ctrl, trt,
    rope=(-0.005, 0.005),  # +/- 0.5pp is "practically zero"
).run()
print(result.prob_in_rope) # probability the effect is negligible
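The ROPE probability is the share of posterior mass for the difference that falls inside the chosen range. The sketch below estimates it from posterior draws, assuming Beta(1, 1) priors and hypothetical counts with nearly identical rates. It illustrates the idea and is not a description of how splita computes `prob_in_rope`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200_000

# Hypothetical counts with nearly identical rates: 500/5000 vs 505/5000
p_ctrl = rng.beta(1 + 500, 1 + 4500, size=n_draws)
p_trt = rng.beta(1 + 505, 1 + 4495, size=n_draws)
diff = p_trt - p_ctrl

rope = (-0.005, 0.005)
prob_in_rope = float(np.mean((diff > rope[0]) & (diff < rope[1])))

# 95% equal-tailed credible interval for the difference, from the same draws
ci_low, ci_high = np.percentile(diff, [2.5, 97.5])

print(f"P(effect in ROPE) is about {prob_in_rope:.2f}")
print(f"95% credible interval: ({ci_low:.4f}, {ci_high:.4f})")
```

With these counts the credible interval is wider than the ROPE, so the data cannot yet confirm that the effect is negligible, even though the observed rates are almost equal.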
Conversion metrics¶
For binary (0/1) data, BayesianExperiment uses a Beta-Binomial model:
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.115, 5000)
result = BayesianExperiment(ctrl, trt, metric='conversion').run()
The default prior is Beta(1, 1) -- a uniform prior. You can specify informative priors:
result = BayesianExperiment(
    ctrl, trt,
    metric='conversion',
    prior={"alpha": 10, "beta": 90},  # prior centered at 10%
).run()
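The effect of an informative prior is easiest to see through the conjugate Beta update, where observed conversions are added to alpha and non-conversions to beta. The sketch below uses hypothetical counts and two illustrative helpers, `beta_posterior` and `beta_mean`, neither of which is part of splita.

```python
def beta_posterior(alpha_prior, beta_prior, conversions, n):
    """Conjugate update: add successes to alpha and failures to beta."""
    return alpha_prior + conversions, beta_prior + (n - conversions)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 600 conversions out of 5,000 users (12% observed)
uniform = beta_posterior(1, 1, 600, 5000)        # Beta(601, 4401)
informative = beta_posterior(10, 90, 600, 5000)  # Beta(610, 4490)

print(beta_mean(*uniform))      # 601 / 5002, about 0.1202
print(beta_mean(*informative))  # 610 / 5100, about 0.1196
```

The Beta(10, 90) prior behaves like 100 earlier observations at a 10% rate, so against 5,000 real observations it pulls the posterior mean only slightly toward 10%.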
Continuous metrics¶
For continuous data, BayesianExperiment uses a Normal-Inverse-Gamma model:
ctrl = rng.normal(25, 8, 1000)
trt = rng.normal(26.5, 8, 1000)
result = BayesianExperiment(ctrl, trt, metric='continuous').run()
print(result.prob_treatment_better)
print(result.credible_interval)
The default prior is vague (non-informative). Specify informative priors with:
result = BayesianExperiment(
    ctrl, trt,
    metric='continuous',
    prior={"mu": 25, "kappa": 1, "alpha": 3, "beta": 100},
).run()
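To show what the four prior parameters do, the sketch below applies the textbook Normal-Inverse-Gamma conjugate update and then samples the posterior of each arm's mean. It assumes the standard parameterization (`mu` is the prior mean, `kappa` the prior pseudo-sample size, `alpha` and `beta` the shape and scale of the inverse-gamma on the variance). Whether splita uses exactly this parameterization is an assumption, and both helper functions are illustrative.

```python
import numpy as np


def nig_posterior(x, mu, kappa, alpha, beta):
    """Normal-Inverse-Gamma conjugate update for a sample x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xbar = x.mean()
    ss = np.sum((x - xbar) ** 2)  # sum of squared deviations
    kappa_n = kappa + n
    mu_n = (kappa * mu + n * xbar) / kappa_n
    alpha_n = alpha + n / 2
    beta_n = beta + 0.5 * ss + kappa * n * (xbar - mu) ** 2 / (2 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n


def sample_mean_posterior(params, rng, n_draws=100_000):
    """Draw the unknown mean: sigma^2 ~ InvGamma, then mu | sigma^2 ~ Normal."""
    mu_n, kappa_n, alpha_n, beta_n = params
    # 1 / Gamma(shape=alpha, rate=beta) is InvGamma(alpha, beta)
    sigma2 = 1.0 / rng.gamma(shape=alpha_n, scale=1.0 / beta_n, size=n_draws)
    return rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))


rng = np.random.default_rng(42)
ctrl = rng.normal(25, 8, 1000)
trt = rng.normal(26.5, 8, 1000)
prior = {"mu": 25, "kappa": 1, "alpha": 3, "beta": 100}

mu_ctrl = sample_mean_posterior(nig_posterior(ctrl, **prior), rng)
mu_trt = sample_mean_posterior(nig_posterior(trt, **prior), rng)
print(float(np.mean(mu_trt > mu_ctrl)))  # Monte Carlo estimate of P(B > A)
```

With `kappa=1` the prior mean counts as a single pseudo-observation, so 1,000 real observations per arm dominate the posterior mean almost entirely.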
ObjectiveBayesianExperiment¶
ObjectiveBayesianExperiment takes an empirical Bayes approach: it learns the prior from historical experiment data rather than requiring you to specify it manually.
from splita import ObjectiveBayesianExperiment
import numpy as np
rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.12, 5000)
# Lifts observed in 50 past experiments (simulated here for illustration)
historical_lifts = rng.normal(0.01, 0.02, 50)
result = ObjectiveBayesianExperiment(
    ctrl, trt,
    historical_effects=historical_lifts,
).run()
print(result.prob_treatment_better)
print(result.empirical_prior)
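The idea behind an empirical prior can be sketched with a normal approximation: fit a normal distribution to the historical lifts, then combine it with the current experiment's estimate by precision weighting. This is a simplified illustration and not necessarily what ObjectiveBayesianExperiment does internally. The helpers `fit_normal_prior` and `shrink` and the observed lift and standard error are hypothetical, and the sample variance of noisy historical lifts somewhat overstates the spread of true effects.

```python
import math
import numpy as np


def fit_normal_prior(historical_effects):
    """Method-of-moments prior: mean and variance of past lifts."""
    effects = np.asarray(historical_effects, dtype=float)
    return float(effects.mean()), float(effects.var(ddof=1))


def shrink(observed_lift, se, prior_mean, prior_var):
    """Normal-normal update: precision-weighted average of prior and data."""
    prec_data, prec_prior = 1.0 / se**2, 1.0 / prior_var
    post_var = 1.0 / (prec_data + prec_prior)
    post_mean = post_var * (prec_data * observed_lift + prec_prior * prior_mean)
    return post_mean, post_var


rng = np.random.default_rng(42)
historical_lifts = rng.normal(0.01, 0.02, 50)
prior_mean, prior_var = fit_normal_prior(historical_lifts)

# Hypothetical current experiment: observed lift 0.03, standard error 0.01
post_mean, post_var = shrink(0.03, 0.01, prior_mean, prior_var)
# P(lift > 0) from the normal CDF, written with math.erf
prob_positive = 0.5 * (1 + math.erf(post_mean / math.sqrt(2 * post_var)))
print(post_mean, prob_positive)
```

The posterior mean always lies between the prior mean and the observed lift, so an unusually large observed effect is pulled back toward what past experiments typically delivered.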
Frequentist vs Bayesian: when to use which¶
| Scenario | Recommendation |
|---|---|
| Regulatory/scientific context | Frequentist (Experiment) |
| Business decision-making | Bayesian (BayesianExperiment) |
| Need to communicate to non-statisticians | Bayesian (P(B>A) is intuitive) |
| Want to use informative priors | Bayesian |
| Need always-valid monitoring | Sequential (mSPRT) |
| Standard hypothesis test | Frequentist |
Bayesian decision framework¶
from splita import BayesianExperiment
import numpy as np
rng = np.random.default_rng(42)
ctrl = rng.binomial(1, 0.10, 5000)
trt = rng.binomial(1, 0.115, 5000)
result = BayesianExperiment(ctrl, trt, rope=(-0.005, 0.005)).run()
# Decision rules
if result.prob_treatment_better > 0.95 and result.expected_loss < 0.001:
    decision = "Ship treatment"
elif result.prob_in_rope > 0.90:
    decision = "No practical difference -- keep control"
else:
    decision = "Inconclusive -- collect more data"
print(decision)