Variance Reduction

Lower variance means smaller required sample sizes and faster experiments. splita provides 14 variance reduction methods.

CUPED

Controlled-experiment Using Pre-Experiment Data. The most widely used variance reduction technique at companies like Microsoft, Booking.com, and Netflix.

from splita.variance import CUPED
from splita import Experiment
import numpy as np

rng = np.random.default_rng(42)

# Pre-experiment data (e.g., last week's page views)
pre_ctrl = rng.normal(10, 2, size=1000)
pre_trt = rng.normal(10, 2, size=1000)

# Experiment data (correlated with pre-experiment)
ctrl = pre_ctrl + rng.normal(0, 1, 1000)
trt = pre_trt + 0.5 + rng.normal(0, 1, 1000)

# Apply CUPED
cuped = CUPED()
ctrl_adj, trt_adj = cuped.fit_transform(ctrl, trt, pre_ctrl, pre_trt)
print(f"Variance reduction: {cuped.variance_reduction_:.0%}")  # ~80% (rho^2 = 4/5 here)

# Run the test on adjusted data
result = Experiment(ctrl_adj, trt_adj).run()

Note

CUPED works best when the pre-experiment covariate is highly correlated with the outcome. Typical reductions range from 30-80%.
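Under the hood the adjustment is a single control-variate step: subtract theta * (x - mean(x)) from each outcome, where theta is the OLS slope of outcome on covariate. A minimal NumPy sketch of the idea (not splita's implementation; in practice theta is estimated on pooled data so both groups share the same value):

```python
import numpy as np

def cuped_adjust(y, x):
    """CUPED adjustment: y_adj = y - theta * (x - mean(x)),
    where theta = cov(x, y) / var(x) minimizes the adjusted variance."""
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
pre = rng.normal(10, 2, size=5000)      # pre-experiment covariate, var = 4
y = pre + rng.normal(0, 1, size=5000)   # outcome correlated with pre, var = 5
y_adj = cuped_adjust(y, pre)

# Reduction equals the squared correlation: here rho^2 = 4/5 = 0.8
print(1 - np.var(y_adj, ddof=1) / np.var(y, ddof=1))
```

Note that the adjustment leaves the mean untouched, so the treatment effect estimate is unchanged while its standard error shrinks.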

CUPAC

Control Using Predictions As Covariates. Fits cross-validated ML models to predict the outcome from the covariates, then applies a CUPED-style adjustment using the predictions as a single synthetic covariate. Requires pip install splita[ml].

from splita.variance import CUPAC
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Multiple covariates as a feature matrix
X_ctrl = rng.normal(0, 1, (n, 5))
X_trt = rng.normal(0, 1, (n, 5))

ctrl = X_ctrl @ rng.normal(1, 0.5, 5) + rng.normal(0, 1, n)
trt = X_trt @ rng.normal(1, 0.5, 5) + 0.5 + rng.normal(0, 1, n)

cupac = CUPAC()
ctrl_adj, trt_adj = cupac.fit_transform(ctrl, trt, X_ctrl, X_trt)
print(f"Variance reduction: {cupac.variance_reduction_:.0%}")
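The core trick is to collapse many covariates into one learned covariate: predict the outcome out-of-fold, then treat the prediction as a CUPED control variate. A NumPy-only sketch with a linear learner standing in for the ML model (splita's internals may differ):

```python
import numpy as np

def oof_predictions(X, y, k=5):
    """Out-of-fold linear predictions: fit on k-1 folds, predict the held-out
    fold. Cross-fitting keeps each point's own outcome out of its prediction."""
    idx = np.arange(len(y))
    yhat = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        yhat[fold] = np.column_stack([np.ones(len(fold)), X[fold]]) @ beta
    return yhat

def cupac_adjust(y, X):
    """CUPED adjustment with the cross-fitted prediction as the covariate."""
    yhat = oof_predictions(X, y)
    theta = np.cov(yhat, y)[0, 1] / np.var(yhat, ddof=1)
    return y - theta * (yhat - yhat.mean())

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (2000, 5))
y = X @ np.ones(5) + rng.normal(0, 1, 2000)
y_adj = cupac_adjust(y, X)
```

A real CUPAC run swaps the linear learner for gradient boosting or similar; the adjustment step is identical.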

OutlierHandler

Outliers inflate variance. Handle them before analysis.

from splita.variance import OutlierHandler
import numpy as np

rng = np.random.default_rng(42)
ctrl = rng.lognormal(3, 1, 1000)   # heavy-tailed, revenue-like metric
trt = rng.lognormal(3.05, 1, 1000)

handler = OutlierHandler(method='winsorize')
ctrl_clean, trt_clean = handler.fit_transform(ctrl, trt)

Available methods:

| Method        | Description                                 |
|---------------|---------------------------------------------|
| 'winsorize'   | Cap extreme values at percentile thresholds |
| 'trim'        | Remove extreme values entirely              |
| 'iqr'         | Cap based on IQR fences                     |

Warning

Always fit on pooled data (both groups together) to avoid introducing bias. fit_transform() handles this automatically.
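Each of the three strategies reduces to a few lines of NumPy; a sketch of what they do (the thresholds here are illustrative, not splita's defaults):

```python
import numpy as np

def winsorize(x, lower=1.0, upper=99.0):
    """Cap values outside the [lower, upper] percentiles."""
    lo, hi = np.percentile(x, [lower, upper])
    return np.clip(x, lo, hi)

def trim(x, lower=1.0, upper=99.0):
    """Drop values outside the [lower, upper] percentiles (changes n)."""
    lo, hi = np.percentile(x, [lower, upper])
    return x[(x >= lo) & (x <= hi)]

def iqr_cap(x, k=1.5):
    """Cap at Tukey fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75])
    return np.clip(x, q1 - k * (q3 - q1), q3 + k * (q3 - q1))

rng = np.random.default_rng(0)
revenue = np.concatenate([rng.lognormal(3, 1, 1000), [1e6]])  # one whale
```

Winsorizing and IQR capping keep the sample size fixed, which is why they are usually preferred over trimming for two-sample tests.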

MultivariateCUPED

Extension of CUPED for multiple covariates without requiring ML:

from splita.variance import MultivariateCUPED
import numpy as np

rng = np.random.default_rng(42)
n = 1000

ctrl = rng.normal(25, 8, n)
trt = rng.normal(26, 8, n)

# Multiple pre-experiment covariates
pre_ctrl = rng.normal(0, 1, (n, 3))
pre_trt = rng.normal(0, 1, (n, 3))

mcuped = MultivariateCUPED()
ctrl_adj, trt_adj = mcuped.fit_transform(ctrl, trt, pre_ctrl, pre_trt)
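With several covariates, the scalar theta becomes a vector solving the normal equations Cov(X) theta = Cov(X, y), i.e. the least-squares coefficients of the outcome on the centered covariates. A sketch of that math (assumed, not splita's exact code):

```python
import numpy as np

def mv_cuped_adjust(y, X):
    """Multivariate CUPED: y_adj = y - (X - mean(X)) @ theta, where theta
    is the OLS fit of centered y on centered X (variance-minimizing combo)."""
    Xc = X - X.mean(axis=0)
    theta = np.linalg.solve(Xc.T @ Xc, Xc.T @ (y - y.mean()))
    return y - Xc @ theta

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (2000, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1, 2000)
y_adj = mv_cuped_adjust(y, X)
```

As with single-covariate CUPED, the group means are preserved, so only the noise shrinks.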

AdaptiveWinsorizer

Automatically finds the optimal capping thresholds via grid search:

from splita.variance import AdaptiveWinsorizer
import numpy as np

rng = np.random.default_rng(42)
ctrl = rng.lognormal(3, 1, 1000)
trt = rng.lognormal(3.05, 1, 1000)

winsorizer = AdaptiveWinsorizer()
ctrl_clean, trt_clean = winsorizer.fit_transform(ctrl, trt)
print(f"Optimal lower: {winsorizer.lower_percentile_}")
print(f"Optimal upper: {winsorizer.upper_percentile_}")

RegressionAdjustment

Lin's regression adjustment with HC2 robust standard errors:

from splita.variance import RegressionAdjustment
import numpy as np

rng = np.random.default_rng(42)
n = 500

ctrl = rng.normal(25, 8, n)
trt = rng.normal(26, 8, n)
x_ctrl = rng.normal(0, 1, n)
x_trt = rng.normal(0, 1, n)

ra = RegressionAdjustment()
result = ra.fit(ctrl, trt, x_ctrl, x_trt)
print(result.ate)          # adjusted treatment effect
print(result.ci)           # confidence interval
print(result.pvalue)
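Lin's estimator is plain OLS of the pooled outcome on an intercept, the treatment dummy, the centered covariate, and their interaction; the ATE is the dummy's coefficient, and HC2 inflates each squared residual by 1 / (1 - h_ii) before forming the sandwich. A self-contained NumPy sketch (splita's internals may differ):

```python
import numpy as np

def lin_adjustment(ctrl, trt, x_ctrl, x_trt):
    """Lin (2013) regression adjustment with an HC2 sandwich variance."""
    y = np.concatenate([ctrl, trt])
    d = np.concatenate([np.zeros(len(ctrl)), np.ones(len(trt))])
    x = np.concatenate([x_ctrl, x_trt])
    xc = x - x.mean()                                   # center the covariate
    Z = np.column_stack([np.ones(len(y)), d, xc, d * xc])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ Z.T @ y
    resid = y - Z @ beta
    h = np.einsum('ij,jk,ik->i', Z, ZtZ_inv, Z)         # leverages h_ii
    meat = Z.T @ (Z * (resid**2 / (1.0 - h))[:, None])  # HC2 weighting
    V = ZtZ_inv @ meat @ ZtZ_inv
    return beta[1], np.sqrt(V[1, 1])                    # ATE and its SE

rng = np.random.default_rng(0)
n = 2000
x_c, x_t = rng.normal(0, 1, n), rng.normal(0, 1, n)
ctrl = 2 * x_c + rng.normal(0, 1, n)
trt = 2 * x_t + 0.5 + rng.normal(0, 1, n)
ate, se = lin_adjustment(ctrl, trt, x_c, x_t)
```

Including the interaction term is what distinguishes Lin's estimator from plain ANCOVA: it remains consistent even when the covariate slope differs between arms.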

DoubleML

Double/debiased machine learning for treatment effect estimation:

from splita.variance import DoubleML
import numpy as np

rng = np.random.default_rng(42)
n = 1000

X_ctrl = rng.normal(0, 1, (n, 5))
X_trt = rng.normal(0, 1, (n, 5))
ctrl = X_ctrl @ rng.normal(1, 0.5, 5) + rng.normal(0, 1, n)
trt = X_trt @ rng.normal(1, 0.5, 5) + 0.5 + rng.normal(0, 1, n)

dml = DoubleML()
result = dml.fit(ctrl, trt, X_ctrl, X_trt)
print(result.ate)
print(result.ci)
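The estimator residualizes both the outcome and the treatment indicator on the covariates using cross-fitted nuisance models, then regresses residual on residual (the partially linear model). A sketch with linear nuisance learners standing in for the usual flexible ML (not splita's implementation):

```python
import numpy as np

def oof_residuals(target, X, k=5):
    """Out-of-fold residuals from a linear fit of target on X (cross-fitting)."""
    idx = np.arange(len(target))
    res = np.empty(len(target))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        A = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A, target[train], rcond=None)
        res[fold] = target[fold] - np.column_stack([np.ones(len(fold)), X[fold]]) @ beta
    return res

def dml_ate(y, d, X):
    """Residual-on-residual (partialling-out) estimate of the effect."""
    y_res = oof_residuals(y, X)
    d_res = oof_residuals(d.astype(float), X)
    return float(d_res @ y_res / (d_res @ d_res))

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(0, 1, (n, 5))
d = rng.integers(0, 2, n)                  # randomized assignment
y = X @ np.ones(5) + 0.5 * d + rng.normal(0, 1, n)
theta = dml_ate(y, d, X)
```

Cross-fitting is what makes the "debiasing" work: each point's nuisance predictions come from models that never saw that point.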

The auto() function

The top-level auto() function automatically selects and applies the best variance reduction strategy:

from splita import auto
import numpy as np

rng = np.random.default_rng(42)
ctrl = rng.normal(25, 8, 1000)
trt = rng.normal(26, 8, 1000)
pre_ctrl = rng.normal(10, 2, 1000)
pre_trt = rng.normal(10, 2, 1000)

result = auto(ctrl, trt, pre_control=pre_ctrl, pre_treatment=pre_trt)
print(result)

All 14 methods

| Class                      | Description                        | When to use                                   |
|----------------------------|------------------------------------|-----------------------------------------------|
| CUPED                      | Pre-experiment covariate adjustment | Single covariate, most common                |
| CUPAC                      | ML-predicted adjustment            | Multiple covariates, non-linear relationships |
| OutlierHandler             | Winsorize/trim/IQR outliers        | Heavy-tailed metrics (revenue)                |
| MultivariateCUPED          | Multi-covariate CUPED              | Multiple covariates, no ML needed             |
| RegressionAdjustment       | Lin's OLS with HC2 SEs             | Linear covariate relationships                |
| AdaptiveWinsorizer         | Auto-tuned capping                 | When you don't know the right percentiles     |
| DoubleML                   | Double/debiased ML                 | High-dimensional confounders                  |
| ClusterBootstrap           | Cluster-level bootstrap            | Within-cluster correlation                    |
| InExperimentVR             | In-experiment control covariates   | No pre-experiment data available              |
| NonstationaryAdjustment    | Time-series decomposition          | Non-stationary treatment effects              |
| PostStratification         | Post-experiment stratification     | Known population strata                       |
| PredictionPoweredInference | ML predictions + small labels      | Limited labeled data                          |
| RobustMeanEstimator        | Huber/Catoni/MoM estimators        | Extremely heavy tails                         |
| TrimmedMeanEstimator       | Symmetric tail trimming            | Symmetric outlier distributions               |