Linear contrasts with contrasts¶
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
"age": rng.integers(20, 75, n),
"female": rng.binomial(1, 0.52, n),
"treated": rng.binomial(1, 0.40, n),
"region": rng.choice(["N", "S", "E", "W"], n),
"group": rng.choice(["A", "B"], n),
"preexist": rng.binomial(1, 0.4, n),
})
lp = (-1.5 + 0.04 * df["age"] - 0.3 * df["female"] + 0.8 * df["treated"]
+ 0.2 * (df["region"] == "S") + 0.4 * (df["region"] == "E")
- 0.1 * (df["region"] == "W")
+ 0.6 * (df["group"] == "B") + 0.4 * df["preexist"])
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))
fit = smf.glm("y ~ age + female + treated + C(region) + C(group) + preexist", data=df,
family=sm.families.Binomial()).fit()
m = Margins.log_scale(fit, at="overall")
Margins.contrasts forms a weighted sum of scenario predictions on the
inference scale. It is the workhorse for risk differences, risk
ratios, odds ratios, lift, reference-level comparisons, and
difference-in-differences — any estimand that is a linear combination
of predictions.
For nonlinear compositions (ratios on the raw scale, NNT, reciprocals,
custom utility functions) use evaluate. See
Contrasts vs evaluate — choosing the right tool for the decision rule.
How contrasts works¶
contrasts takes a list of scenarios and a weight vector (or
matrix). The engine:
Computes the inference-scale prediction for each scenario:
hᵢ = φ⁻¹( mean_predict(beta, scenario_i) ).Forms the weighted sum:
Σᵢ wᵢ · hᵢ.Runs delta-method inference (or simulation/bootstrap if κ is high).
Back-transforms CI endpoints with
phifor reporting.
Mathematically:
result = φ( Σᵢ wᵢ · φ⁻¹(pᵢ) )
where pᵢ is the aggregated response-scale prediction for scenario i.
Because the inference is on a linear combination of φ⁻¹(pᵢ), the
delta method is exact to the extent that the individual hᵢ are
locally linear. This is why simple contrasts usually have smaller κ
than evaluate calls.
Risk difference (linear scale)¶
The simplest contrast: the arithmetic difference between two predicted probabilities.
from pymargins import Margins, pairwise
m = Margins.linear_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
=============================================================
Margins Result (delta, level=0.95)
=============================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------
treated=1 0.1492 0.0185 8.0667 0.000 0.1129, 0.1854
=============================================================
n = 2000
κ: 0.039
Delta-vs-sim disagreement: 1.543%
On the linear scale the contrast is p₁ − p₀. The CI is symmetric on
the probability scale.
Risk ratio (log scale)¶
A ratio is a difference on the log scale. The back-transform turns the log-ratio into a ratio with an asymmetric CI.
m = Margins.log_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
=============================================================
Margins Result (delta, level=0.95)
=============================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------
treated=1 1.2247 0.0253 8.0232 0.000 1.1656, 1.2869
=============================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.046
Delta-vs-sim disagreement: 0.354%
The point estimate is exp(log(p₁) − log(p₀)) = p₁ / p₀. Because the
inference is on the log scale, the delta method is exact and κ is
small.
Odds ratio (logit scale)¶
For probabilities near 0 or 1, the logit scale keeps the CI inside (0, 1) for each arm before forming the odds ratio.
m = Margins.logit_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
=============================================================
Margins Result (delta, level=0.95)
=============================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------
treated=1 0.6877 0.1053 7.4956 0.000 0.6417, 0.7302
=============================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.017
Delta-vs-sim disagreement: 0.172%
The back-transform is expit(logit(p₁) − logit(p₀)), which simplifies
to the odds ratio (p₁/(1−p₁)) / (p₀/(1−p₀)).
Lift (RR − 1)¶
Lift is a risk ratio minus one. The easiest path is log_scale for
the ratio, then subtract one from the estimate and CI endpoints:
m = Margins.log_scale(fit, at="overall")
res = m.contrasts(scenarios=scen, contrasts=w)
lift_est = float(res.estimate) - 1.0
lift_ci = (float(res.conf_int_lower) - 1.0, float(res.conf_int_upper) - 1.0)
Reference-level contrasts¶
Compare every level of a factor against a common baseline. The weight matrix has one row per comparison.
from pymargins import reference
scen, W = reference("region", ["N", "S", "E", "W"], ref_level="N")
res = m.contrasts(scenarios=scen, contrasts=W)
print(res.summary())
===========================================================
Margins Result (delta, level=0.95)
===========================================================
estimate std err z P>|z| [95% Conf. Int.]
-----------------------------------------------------------
S_vs_N 0.9850 0.0381 -0.3969 0.691 0.9142, 1.0613
E_vs_N 1.0892 0.0341 2.5057 0.012 1.0188, 1.1646
W_vs_N 0.9357 0.0395 -1.6794 0.093 0.8660, 1.0112
===========================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: max=0.052
Delta-vs-sim disagreement: 0.513%
All-pairs comparisons¶
Compare every level against every other level. The result carries a joint covariance, so simultaneous CIs are available.
from pymargins import all_pairwise
scen, W = all_pairwise("region", ["N", "S", "E", "W"])
res = m.contrasts(scenarios=scen, contrasts=W)
# Joint covariance is stored in the result for post-hoc combination
print(res.summary())
=========================================================================
Margins Result (delta, level=0.95)
=========================================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------------------
region=S_vs_region=N 0.9850 0.0381 -0.3969 0.691 0.9142, 1.0613
region=E_vs_region=N 1.0892 0.0341 2.5057 0.012 1.0188, 1.1646
region=W_vs_region=N 0.9357 0.0395 -1.6794 0.093 0.8660, 1.0112
=========================================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: max=0.052
Delta-vs-sim disagreement: 0.411%
Testing a non-zero null¶
# Test whether the risk ratio exceeds 1.5
scen_test, w_test = pairwise("treated", [1, 0])
print(m.log_scale(fit, at="overall").contrasts(scenarios=scen_test, contrasts=w_test).test(value=1.5).summary())
Hypothesis test (wald)
H0: estimand = 1.5
Alternative: two-sided
Statistic: -8.0238
P-value: 1.11e-15
The null value is interpreted on the reporting scale and lifted
onto the inference scale via phi_inv automatically.
Predictions over a grid¶
Evaluate predictions at several values of a moderator:
res = m.predict(atexog={"age": [25, 45, 65], "treated": [0, 1]})
print(res.summary())
=======================================================================
Margins Result (delta, level=0.95)
=======================================================================
estimate std err z P>|z| [95% Conf. Int.]
-----------------------------------------------------------------------
age=25, treated=0 0.4721 0.0462 -16.2417 0.000 0.4312, 0.5168
age=25, treated=1 0.6728 0.0352 -11.2429 0.000 0.6279, 0.7209
age=45, treated=0 0.6590 0.0208 -20.0132 0.000 0.6326, 0.6865
age=45, treated=1 0.8178 0.0172 -11.6708 0.000 0.7907, 0.8459
age=65, treated=0 0.8083 0.0181 -11.7357 0.000 0.7800, 0.8375
age=65, treated=1 0.9085 0.0115 -8.3763 0.000 0.8883, 0.9291
=======================================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: max=0.112
Delta-vs-sim disagreement: 0.621%
For a contrast at a specific moderator value, build the scenarios explicitly:
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(
scenarios=[
{"atexog": {"treated": 1, "age": 45}, "label": "treated@45"},
{"atexog": {"treated": 0, "age": 45}, "label": "control@45"},
],
contrasts=w,
)
print(res.summary())
==============================================================
Margins Result (delta, level=0.95)
==============================================================
estimate std err z P>|z| [95% Conf. Int.]
--------------------------------------------------------------
treated@45 1.2411 0.0270 8.0112 0.000 1.1772, 1.3084
==============================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.048
Delta-vs-sim disagreement: 0.405%
2×2 Difference-in-differences¶
DiD is a contrast across four cells with weights [+1, −1, −1, +1].
See Difference-in-differences on the response scale for the full derivation and
response-scale motivation (Ai & Norton, 2003).
from pymargins import did
scen, w = did("group", "preexist",
treated_level="B", control_level="A",
post_level=1, pre_level=0)
print(m.contrasts(scenarios=scen, contrasts=w).summary())
========================================================
Margins Result (delta, level=0.95)
========================================================
estimate std err z P>|z| [95% Conf. Int.]
--------------------------------------------------------
did 0.9542 0.0125 -3.7399 0.000 0.9311, 0.9780
========================================================
n = 2000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: max=0.140
Delta-vs-sim disagreement: 0.276%
See also¶
Scenario helpers for
pairwise,reference,grid, etc.Difference-in-differences on the response scale for DiD theory and examples.
Contrasts vs evaluate — choosing the right tool for choosing between
contrastsandevaluate.Mathematical motivation for the delta-method derivation.