--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 kernelspec: display_name: Python 3 language: python name: python3 --- # Contrasts and difference-in-differences Pairwise contrasts, reference-level contrasts, and 2×2 DiD all reduce to a list of scenarios plus a contrast weight vector. ```{code-cell} python import numpy as np import pandas as pd import statsmodels.api as sm import statsmodels.formula.api as smf from pymargins import Margins, pairwise, reference, did rng = np.random.default_rng(8) n = 4000 df = pd.DataFrame({ "group": rng.choice(["A", "B"], n), "preexist": rng.binomial(1, 0.4, n), "age": rng.integers(20, 80, n), }) lp = (-1.5 + 0.6 * (df["group"] == "B") + 0.4 * df["preexist"] + 0.5 * ((df["group"] == "B") & (df["preexist"] == 1)) + 0.02 * df["age"]) df["condX"] = rng.binomial(1, 1 / (1 + np.exp(-lp))) fit = smf.glm( "condX ~ C(group) * C(preexist) + age", data=df, family=sm.families.Binomial(), ).fit() m = Margins.linear_scale(fit, at="overall") ``` ## Pairwise risk difference ```{code-cell} python scen, w = pairwise("group", ["B", "A"]) print(m.contrasts(scenarios=scen, contrasts=w).summary()) ``` ## Reference-level contrasts for a factor ```{code-cell} python scen, w_mat = reference( "group", levels=["A", "B"], ref_level="A" ) print(m.contrasts(scenarios=scen, contrasts=w_mat).summary()) ``` ## 2×2 Difference-in-differences (Ai & Norton, 2003) The `did` helper builds four scenarios and a `[+1, -1, -1, +1]` contrast weight, evaluating the DiD on the response scale where the clinical / economic question lives — not on the log-odds scale of the interaction coefficient. ```{code-cell} python scen, w = did("group", "preexist", treated_level="B", control_level="A", post_level=1, pre_level=0) print(m.contrasts(scenarios=scen, contrasts=w).summary()) ``` ## Plot: forest plot of group contrasts ```{code-cell} python import matplotlib.pyplot as plt from pymargins import reference scen, W = reference("group", ["A", "B"], ref_level="A") res = m.contrasts(scenarios=scen, contrasts=W) df_forest = res.to_frame() fig, ax = plt.subplots(figsize=(4, 2)) y = range(len(df_forest)) ax.errorbar( df_forest["estimate"], y, xerr=[df_forest["estimate"] - df_forest["ci_lower"], df_forest["ci_upper"] - df_forest["estimate"]], fmt="o", capsize=3, color="darkorchid" ) ax.axvline(0, color="grey", lw=0.5) ax.set_yticks(list(y)) ax.set_yticklabels(df_forest["label"]) ax.set_xlabel("Risk difference vs Group A") ax.invert_yaxis() ``` See [](../math.rst) for why DiD must be evaluated on the response scale for nonlinear models.