Discrete changes for binary / categorical regressors¶
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
"age": rng.integers(20, 75, n),
"female": rng.binomial(1, 0.52, n),
"treated": rng.binomial(1, 0.40, n),
"region": rng.choice(["N", "S", "E", "W"], n),
})
lp = (-1.5 + 0.04 * df["age"] - 0.3 * df["female"] + 0.8 * df["treated"]
+ 0.2 * (df["region"] == "S") + 0.4 * (df["region"] == "E")
- 0.1 * (df["region"] == "W"))
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))
fit = smf.glm("y ~ age + female + treated + C(region)", data=df,
family=sm.families.Binomial()).fit()
m = Margins.log_scale(fit, at="overall")
dydx on a binary regressor is a derivative — it evaluates the slope
at the midpoint of the 0→1 jump (or averages midpoints over the
sample). That is almost never the quantity you want. The correct
quantity is the discrete change: the predicted difference between
the two levels, holding everything else constant.
Binary regressor (0 → 1)¶
Use contrasts with the pairwise helper:
from pymargins import Margins, pairwise
m = Margins.linear_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])
print(m.contrasts(scenarios=scen, contrasts=w).summary())
=============================================================
Margins Result (delta, level=0.95)
=============================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------
treated=1 0.1686 0.0200 8.4506 0.000 0.1295, 0.2078
=============================================================
n = 2000
κ: 0.031
Delta-vs-sim disagreement: 1.516%
Multi-level factor (each level vs baseline)¶
from pymargins import reference
scen, W = reference("region", ["N", "S", "E", "W"], ref_level="N")
print(m.contrasts(scenarios=scen, contrasts=W).summary())
===========================================================
Margins Result (delta, level=0.95)
===========================================================
estimate std err z P>|z| [95% Conf. Int.]
-----------------------------------------------------------
S_vs_N 0.0305 0.0287 1.0613 0.289 -0.0258, 0.0868
E_vs_N 0.0763 0.0277 2.7596 0.006 0.0221, 0.1306
W_vs_N -0.0093 0.0290 -0.3194 0.749 -0.0661, 0.0476
===========================================================
n = 2000
κ: max=0.026
Delta-vs-sim disagreement: 32.048%
All-pairs comparisons¶
All-pairs are returned in one result with a joint covariance, enabling post-hoc combination and testing:
from pymargins import all_pairwise
scen, W = all_pairwise("region", ["N", "S", "E", "W"])
res = m.contrasts(scenarios=scen, contrasts=W)
print(res.summary())
=========================================================================
Margins Result (delta, level=0.95)
=========================================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------------------
region=S_vs_region=N 0.0305 0.0287 1.0613 0.289 -0.0258, 0.0868
region=E_vs_region=N 0.0763 0.0277 2.7596 0.006 0.0221, 0.1306
region=W_vs_region=N -0.0093 0.0290 -0.3194 0.749 -0.0661, 0.0476
=========================================================================
n = 2000
κ: max=0.026
Delta-vs-sim disagreement: 9.296%