Effects by subgroup with over=¶
over= partitions the sample and computes the estimand within each
group, returning one row per group with a shared covariance. It works
on predict and dydx (the aggregating estimands); for discrete
contrasts by subgroup, pin the group in the scenario instead (last
section).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
rng = np.random.default_rng(42)
n = 4000
df = pd.DataFrame({
"age": rng.integers(20, 75, n),
"treated": rng.binomial(1, 0.5, n),
"region": rng.choice(["N", "S", "E", "W"], n),
})
lp = -3 + 0.04 * df["age"] + 0.9 * df["treated"] + 0.05 * df["age"] * df["treated"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))
fit = smf.glm("y ~ age * treated + C(region)", data=df,
family=sm.families.Binomial()).fit()
m = Margins.linear_scale(fit, at="overall")
AME within each subgroup¶
Pass the grouping column name. Each row is the average marginal effect computed over the rows in that group only:
print(m.dydx("age", over="treated").summary())
===================================================================
Margins Result (delta, level=0.95)
===================================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------------
treated=0, age 0.0079 0.0004 18.5301 0.000 0.0071, 0.0088
treated=1, age 0.0106 0.0006 16.4399 0.000 0.0093, 0.0118
===================================================================
n = 4000
κ: max=0.042
Delta-vs-sim disagreement: 3.594%
The slope of age differs between the treated and untreated groups —
here because the model contains an age × treated interaction.
Predicted values within each subgroup¶
predict takes over= the same way. Combine it with atexog to sweep
a variable inside each group:
res = m.predict(atexog={"treated": [0, 1]}, over="region")
print(res.to_frame()[["region", "treated", "estimate", "std_error"]])
region treated estimate std_error
0 E 0 0.247868 0.015113
1 E 1 0.816881 0.011472
2 N 0 0.289799 0.016525
3 N 1 0.849118 0.010360
4 S 0 0.275311 0.016031
5 S 1 0.838011 0.010589
6 W 0 0.252808 0.015609
7 W 1 0.825545 0.011365
Other covariates are averaged within each region, so the rows are region-specific adjusted predictions rather than a single pooled profile.
Crossing several grouping variables¶
over= accepts a list. The sample is partitioned by the full cross of
the listed columns, one row per non-empty cell:
print(m.dydx("age", over=["treated", "region"]).summary())
=============================================================================
Margins Result (delta, level=0.95)
=============================================================================
estimate std err z P>|z| [95% Conf. Int.]
-----------------------------------------------------------------------------
treated=0, region=E, age 0.0076 0.0006 12.2140 0.000 0.0064, 0.0088
treated=0, region=N, age 0.0084 0.0006 13.3750 0.000 0.0072, 0.0096
treated=0, region=S, age 0.0081 0.0007 11.4207 0.000 0.0067, 0.0095
treated=0, region=W, age 0.0078 0.0006 13.2023 0.000 0.0066, 0.0089
treated=1, region=E, age 0.0109 0.0006 18.7318 0.000 0.0097, 0.0120
treated=1, region=N, age 0.0100 0.0007 15.3103 0.000 0.0088, 0.0113
treated=1, region=S, age 0.0104 0.0005 20.6770 0.000 0.0094, 0.0114
treated=1, region=W, age 0.0110 0.0006 17.3318 0.000 0.0098, 0.0122
=============================================================================
n = 4000
κ: max=0.087
Delta-vs-sim disagreement: 3.137%
Nonlinear models give heterogeneity for free¶
In a linear model with no interaction, every subgroup AME is identical
— over= only changes the baseline, not the effect. In a nonlinear
model the marginal effect depends on where each subgroup sits on the
response curve, so subgroup AMEs differ even without an interaction
term:
# Drop the interaction: age enters additively, treated only shifts the level.
fit_add = smf.glm("y ~ age + treated + C(region)", data=df,
family=sm.families.Binomial()).fit()
m_add = Margins.linear_scale(fit_add, at="overall")
print(m_add.dydx("age", over="treated").summary())
===================================================================
Margins Result (delta, level=0.95)
===================================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------------
treated=0, age 0.0102 0.0003 inf 0.000 0.0096, 0.0108
treated=1, age 0.0077 0.0005 15.2113 0.000 0.0067, 0.0087
===================================================================
n = 4000
κ: max=0.041
Delta-vs-sim disagreement: 4.106%
The two slopes still differ: the treated group sits higher on the
logistic curve, where the same change in the linear predictor maps to a
different change in probability. This is genuine link-driven
heterogeneity, not a modelling artefact — but note it is a property of
the probability scale. On a log_scale or logit_scale session the
subgroup effects of an additive model would instead coincide.
Discrete contrasts by subgroup¶
contrasts does not take over= — a contrast is a single linear
combination, not an aggregation. To get a contrast within each
subgroup, pin the group in the scenarios (the grouping variable must be
a model regressor) and request all groups’ contrasts at once via a
named-contrast dict, which gives them a joint covariance:
from itertools import product
regions = ["N", "S", "E", "W"]
scenarios = []
for r in regions:
scenarios.append({"atexog": {"treated": 1, "region": r}, "label": f"{r}:treated"})
scenarios.append({"atexog": {"treated": 0, "region": r}, "label": f"{r}:control"})
weights = {}
for i, r in enumerate(regions):
w = [0] * len(scenarios)
w[2 * i], w[2 * i + 1] = 1, -1
weights[f"effect[{r}]"] = w
print(m.contrasts(scenarios=scenarios, contrasts=weights).summary())
==============================================================
Margins Result (delta, level=0.95)
==============================================================
estimate std err z P>|z| [95% Conf. Int.]
--------------------------------------------------------------
effect[N] 0.5567 0.0133 41.7240 0.000 0.5305, 0.5828
effect[S] 0.5627 0.0132 42.7280 0.000 0.5369, 0.5885
effect[E] 0.5732 0.0125 45.8470 0.000 0.5487, 0.5977
effect[W] 0.5711 0.0127 45.0053 0.000 0.5462, 0.5960
==============================================================
n = 4000
κ: max=0.075
Delta-vs-sim disagreement: 0.347%
Because the contrasts share a covariance, you can follow up with a joint test that the subgroup effects are equal, or add a difference-of-contrasts row to read a specific between-group gap directly — see Linear contrasts with contrasts and the union-premium demo.
Where to next¶
Union premium heterogeneity — effects by subgroup — a full worked subgroup analysis (union premium by education) on real panel data.
Linear contrasts with contrasts — the contrast primitive used for per-subgroup effects and their differences.
Grid predictions — sweeping a covariate grid, which composes with
over=for subgroup-by-grid surfaces.