---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Linear contrasts with `contrasts`

```{code-cell} python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(20, 75, n),
    "female": rng.binomial(1, 0.52, n),
    "treated": rng.binomial(1, 0.40, n),
    "region": rng.choice(["N", "S", "E", "W"], n),
    "group": rng.choice(["A", "B"], n),
    "preexist": rng.binomial(1, 0.4, n),
})

# Linear predictor of the simulated logistic outcome
lp = (-1.5 + 0.04 * df["age"] - 0.3 * df["female"] + 0.8 * df["treated"]
      + 0.2 * (df["region"] == "S") + 0.4 * (df["region"] == "E")
      - 0.1 * (df["region"] == "W") + 0.6 * (df["group"] == "B")
      + 0.4 * df["preexist"])
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))

fit = smf.glm("y ~ age + female + treated + C(region) + C(group) + preexist",
              data=df, family=sm.families.Binomial()).fit()
m = Margins.log_scale(fit, at="overall")
```

`Margins.contrasts` forms a **weighted sum of scenario predictions on the inference scale**. It is the workhorse for risk differences, risk ratios, odds ratios, lift, reference-level comparisons, and difference-in-differences: any estimand that is a linear combination of predictions.

For nonlinear compositions (ratios on the raw scale, NNT, reciprocals, custom utility functions), use `evaluate`. See [](../howto/contrasts_vs_evaluate.md) for the decision rule.

## How `contrasts` works

`contrasts` takes a list of **scenarios** and a **weight vector** (or matrix). The engine:

1. Computes the inference-scale prediction for each scenario: `hᵢ = φ⁻¹( mean_predict(beta, scenario_i) )`.
2. Forms the weighted sum: `Σᵢ wᵢ · hᵢ`.
3. Runs delta-method inference (or simulation/bootstrap if κ is high).
4. Back-transforms CI endpoints with `phi` for reporting.

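
The four steps above can be sketched in plain NumPy. This is an illustration under stated assumptions, not the `pymargins` implementation: `mean_predict` and `contrast_delta` are hypothetical helpers, the coefficients and covariance are made up, the gradient is taken by finite differences, and the interval uses a fixed normal quantile of 1.96.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def mean_predict(beta, X):
    """Average response-scale prediction of a logistic model over rows of X."""
    return expit(X @ beta).mean()


def contrast_delta(beta, cov, designs, weights,
                   phi_inv=np.log, phi=np.exp, eps=1e-6, z=1.96):
    """Weighted sum of inference-scale predictions with a delta-method CI."""
    weights = np.asarray(weights, dtype=float)

    def g(b):
        # Step 1: inference-scale prediction for each scenario.
        h = np.array([phi_inv(mean_predict(b, X)) for X in designs])
        # Step 2: weighted sum of those predictions.
        return weights @ h

    est = g(beta)
    # Step 3: delta method, using a central finite-difference gradient.
    grad = np.array([(g(beta + eps * e) - g(beta - eps * e)) / (2 * eps)
                     for e in np.eye(len(beta))])
    se = float(np.sqrt(grad @ cov @ grad))
    # Step 4: back-transform the estimate and the CI endpoints.
    return phi(est), phi(est - z * se), phi(est + z * se)


# Hypothetical toy setup: intercept, age, treated.
rng = np.random.default_rng(0)
age = rng.uniform(20, 75, 500)
X1 = np.column_stack([np.ones(500), age, np.ones(500)])   # everyone treated
X0 = np.column_stack([np.ones(500), age, np.zeros(500)])  # everyone untreated
beta = np.array([-1.5, 0.02, 0.8])   # made-up coefficients
cov = np.diag([0.04, 1e-5, 0.01])    # made-up coefficient covariance

rr, lo, hi = contrast_delta(beta, cov, [X1, X0], [1.0, -1.0])
print(f"risk ratio {rr:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

With weights `[+1, −1]` and the log as `phi_inv`, the back-transformed estimate equals the ratio of the two averaged predictions, and the interval is asymmetric on the ratio scale.
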

Mathematically:

```
result = φ( Σᵢ wᵢ · φ⁻¹(pᵢ) )
```

where `pᵢ` is the aggregated response-scale prediction for scenario `i`.

Because the inference is on a **linear combination** of `φ⁻¹(pᵢ)`, the delta-method approximation is accurate to the extent that each `hᵢ` is locally linear in the coefficients. This is why simple contrasts usually have smaller κ than `evaluate` calls.

## Risk difference (linear scale)

The simplest contrast: the arithmetic difference between two predicted probabilities.

```{code-cell} python
from pymargins import Margins, pairwise

m = Margins.linear_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
```

On the linear scale the contrast is `p₁ − p₀`. The CI is symmetric on the probability scale.

## Risk ratio (log scale)

A ratio is a difference on the log scale. The back-transform turns the log-ratio into a ratio with an asymmetric CI.

```{code-cell} python
m = Margins.log_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
```

The point estimate is `exp(log(p₁) − log(p₀)) = p₁ / p₀`. Because the contrast is linear in the log-scale predictions, the delta-method approximation is usually accurate and κ is typically small.

## Odds ratio (logit scale)

For probabilities near 0 or 1, the logit scale keeps the CI inside (0, 1) for each arm before forming the odds ratio.

```{code-cell} python
m = Margins.logit_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
```

The inference-scale contrast is `logit(p₁) − logit(p₀)`, the log odds ratio. Exponentiating it gives the odds ratio: `exp(logit(p₁) − logit(p₀)) = (p₁/(1−p₁)) / (p₀/(1−p₀))`. Applying `expit` to the same difference would instead give `OR / (1 + OR)`, which is not the odds ratio.

## Lift (RR − 1)

Lift is a risk ratio minus one.
The easiest path is `log_scale` for the ratio, then subtract one from the estimate and CI endpoints. Subtracting a constant is a monotone shift, so the shifted endpoints form a valid interval for the lift:

```{code-cell} python
m = Margins.log_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)

# Lift = RR - 1, applied to the point estimate and to both CI endpoints
lift_est = float(res.estimate) - 1.0
lift_ci = (float(res.conf_int_lower) - 1.0, float(res.conf_int_upper) - 1.0)
```

## Reference-level contrasts

Compare every level of a factor against a common baseline. The weight matrix has one row per comparison.

```{code-cell} python
from pymargins import reference

scen, W = reference("region", ["N", "S", "E", "W"], ref_level="N")
res = m.contrasts(scenarios=scen, contrasts=W)
print(res.summary())
```

## All-pairs comparisons

Compare every level against every other level. The result carries a joint covariance, so simultaneous CIs are available.

```{code-cell} python
from pymargins import all_pairwise

scen, W = all_pairwise("region", ["N", "S", "E", "W"])
res = m.contrasts(scenarios=scen, contrasts=W)
# Joint covariance is stored in the result for post-hoc combination
print(res.summary())
```

## Testing a non-zero null

```{code-cell} python
# Test the risk ratio against a null value of 1.5
scen_test, w_test = pairwise("treated", [1, 0])
res_test = Margins.log_scale(fit, at="overall").contrasts(
    scenarios=scen_test, contrasts=w_test)
print(res_test.test(value=1.5).summary())
```

The null value passed to `test` is interpreted on the **reporting scale** and lifted onto the inference scale via `phi_inv` automatically.

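
To make the lifting step concrete, here is a minimal sketch of a two-sided Wald test in which the null is mapped through `phi_inv` before being compared with the estimate. It assumes a log inference scale and a normal reference distribution. `wald_test_lifted` and the numbers are hypothetical and are not taken from `pymargins`.

```python
import math


def wald_test_lifted(estimate, se_inference, null, phi_inv=math.log):
    """Two-sided Wald test with a reporting-scale null lifted via phi_inv.

    `estimate` and `null` are on the reporting scale; `se_inference` is the
    standard error of phi_inv(estimate) on the inference scale.
    """
    z = (phi_inv(estimate) - phi_inv(null)) / se_inference
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return z, p


# Made-up inputs: risk ratio 1.8 with a log-scale standard error of 0.10.
z, p = wald_test_lifted(1.8, 0.10, null=1.5)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The comparison is `log(1.8) − log(1.5)`, not `1.8 − 1.5`, which keeps the test consistent with the scale on which the standard error was computed.
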
## Predictions over a grid

Evaluate predictions at several values of a moderator:

```{code-cell} python
res = m.predict(atexog={"age": [25, 45, 65], "treated": [0, 1]})
print(res.summary())
```

For a contrast at a specific moderator value, build the scenarios explicitly:

```{code-cell} python
# Only the weights are reused; the scenarios are spelled out below
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(
    scenarios=[
        {"atexog": {"treated": 1, "age": 45}, "label": "treated@45"},
        {"atexog": {"treated": 0, "age": 45}, "label": "control@45"},
    ],
    contrasts=w,
)
print(res.summary())
```

## 2×2 Difference-in-differences

DiD is a contrast across four cells with weights `[+1, −1, −1, +1]`. See [](../howto/diff_in_diff.md) for the full derivation and response-scale motivation (Ai & Norton, 2003).

```{code-cell} python
from pymargins import did

scen, w = did("group", "preexist",
              treated_level="B", control_level="A",
              post_level=1, pre_level=0)
print(m.contrasts(scenarios=scen, contrasts=w).summary())
```

## See also

- [](../howto/scenarios_helpers.md) for `pairwise`, `reference`, `grid`, etc.
- [](../howto/diff_in_diff.md) for DiD theory and examples.
- [](../howto/contrasts_vs_evaluate.md) for choosing between `contrasts` and `evaluate`.
- [](../math.rst) for the delta-method derivation.