Discrete changes for binary / categorical regressors¶

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(20, 75, n),
    "female": rng.binomial(1, 0.52, n),
    "treated": rng.binomial(1, 0.40, n),
    "region": rng.choice(["N", "S", "E", "W"], n),
})
lp = (-1.5 + 0.04 * df["age"] - 0.3 * df["female"] + 0.8 * df["treated"]
      + 0.2 * (df["region"] == "S") + 0.4 * (df["region"] == "E")
      - 0.1 * (df["region"] == "W"))
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))

fit = smf.glm("y ~ age + female + treated + C(region)", data=df,
              family=sm.families.Binomial()).fit()
m = Margins.log_scale(fit, at="overall")

dydx on a binary regressor is a derivative — it evaluates the slope at the midpoint of the 0→1 jump (or averages midpoints over the sample). That is almost never the quantity you want. The correct quantity is the discrete change: the predicted difference between the two levels, holding everything else constant.

Binary regressor (0 → 1)¶

Use contrasts with the pairwise helper:

from pymargins import Margins, pairwise

m = Margins.linear_scale(fit, at="overall")

scen, w = pairwise("treated", [1, 0])
print(m.contrasts(scenarios=scen, contrasts=w).summary())

=============================================================
              Margins Result (delta, level=0.95)             
=============================================================
           estimate  std err       z  P>|z|  [95% Conf. Int.]
-------------------------------------------------------------
treated=1    0.1686   0.0200  8.4506  0.000    0.1295, 0.2078
=============================================================

n = 2000
κ: 0.031
Delta-vs-sim disagreement: 0.773%

Multi-level factor (each level vs baseline)¶

from pymargins import reference

scen, W = reference("region", ["N", "S", "E", "W"], ref_level="N")
print(m.contrasts(scenarios=scen, contrasts=W).summary())

===========================================================
             Margins Result (delta, level=0.95)            
===========================================================
        estimate  std err        z  P>|z|  [95% Conf. Int.]
-----------------------------------------------------------
S_vs_N    0.0305   0.0287   1.0613  0.289   -0.0258, 0.0868
E_vs_N    0.0763   0.0277   2.7596  0.006    0.0221, 0.1306
W_vs_N   -0.0093   0.0290  -0.3194  0.749   -0.0661, 0.0476
===========================================================

n = 2000
κ: max=0.026
Delta-vs-sim disagreement: 18.885%

All-pairs comparisons¶

All-pairs are returned in one result with a joint covariance, enabling post-hoc combination and testing:

from pymargins import all_pairwise

scen, W = all_pairwise("region", ["N", "S", "E", "W"])
res = m.contrasts(scenarios=scen, contrasts=W)
print(res.summary())

=========================================================================
                    Margins Result (delta, level=0.95)                   
=========================================================================
                      estimate  std err        z  P>|z|  [95% Conf. Int.]
-------------------------------------------------------------------------
region=S_vs_region=N    0.0305   0.0287   1.0613  0.289   -0.0258, 0.0868
region=E_vs_region=N    0.0763   0.0277   2.7596  0.006    0.0221, 0.1306
region=W_vs_region=N   -0.0093   0.0290  -0.3194  0.749   -0.0661, 0.0476
=========================================================================

n = 2000
κ: max=0.026
Delta-vs-sim disagreement: 30.481%