Introduction¶
pymargins computes marginal-effects-style quantities — adjusted
predictions, slopes, contrasts, difference-in-differences, custom
linear and nonlinear combinations of predictions — from fitted
statistical models. Uncertainty is quantified via the delta method,
parametric simulation (Krinsky–Robb), or nonparametric bootstrap.
The design target is Stata’s margins command and
the R package marginaleffects, with one substantive difference:
Margins is a session — once constructed it
commits to an inference scale, variance estimator, confidence level,
default evaluation point, and inference method. Every subsequent
computation inherits those commitments. Switching any of them requires
a new session.
Why marginal effects?¶
A fitted nonlinear model reports on the wrong scale. A logistic regression returns coefficients in log-odds; a Poisson model, in log-counts; a Cox model, in log-hazards. The question an analyst or stakeholder actually has — how much does the predicted probability, count, or survival move? — is a marginal effect, and in a nonlinear model that quantity is not any single coefficient. It is a derivative or discrete contrast that
depends on where in covariate space it is evaluated (the effect of
bmiin a logit differs along the curve),combines several coefficients whenever the model contains interactions, polynomials, or splines, and
carries its own standard error, which hand-calculation usually gets wrong.
pymargins computes that quantity, on the outcome scale, with
uncertainty from the delta method, simulation, or bootstrap:
m = Margins.log_scale(fit, vcov="HC3")
m.dydx("bmi") # AME on P(diabetes)
m.predict(atexog={"age": [25, 45, 65]}) # adjusted predictions
m.contrasts(scenarios=[...], contrasts=[+1, -1]) # risk difference
Reach for marginal effects when the model is nonlinear, when effects
are conditional (interactions / polynomials / splines), when the
audience needs outcome units rather than link-scale coefficients, when
heterogeneity across subgroups is the research question, or when the
answer is a counterfactual contrast (“what if everyone were
treated?”). These correspond exactly to the estimand axis below:
predict(),
dydx(), and
contrasts() /
evaluate().
Why a session?¶
A session forces the analyst to declare their analytical posture
explicitly, in code, at one location. A reviewer sees the entire
methodological commitment in the constructor call. Any change in
posture — different scale, different vcov, different level — shows up
as a new Margins(...) in the audit trail.
The alternative (per-call configuration) is more flexible but invites
the analyst to try multiple scales, multiple vcov flavors, multiple
levels until something looks significant. pymargins makes this
possible but visible: do it, and you’ll have multiple sessions in
your code, each documenting its own posture.
Installation¶
pip install pymargins
Requires Python ≥3.10 and JAX. Dependencies (numpy, pandas,
scipy, jax) are installed automatically; model-specific
backends (statsmodels, lifelines, linearmodels) are
detected at runtime when present.
Quickstart¶
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
fit = smf.glm(
"diabetes ~ C(black) + C(female) + C(agegrp) + bmi + age",
data=df,
family=sm.families.Binomial(),
).fit()
# Open a session — log scale, HC3 robust SEs, 95% CIs, delta method.
m = Margins.log_scale(fit, vcov="HC3", level=0.95)
print(m.summary()) # methods-section paragraph
# Pre-flight: is delta reliable here?
print(m.diagnose().summary())
# Adjusted predictions at representative values.
m.predict(atexog={"age": [25, 45, 65]})
# Average marginal effects on the response scale.
m.dydx("age")
# A relative-risk contrast, two scenarios with a [+1, -1] weight.
rr = m.contrasts(
scenarios=[
{"atexog": {"treatment": 1}, "label": "treated"},
{"atexog": {"treatment": 0}, "label": "control"},
],
contrasts=[+1, -1],
)
print(rr.summary())
Each call returns a MarginsResult with
.estimate, .se, .vcov, .ci_lower, .ci_upper,
.pvalue, plus .summary(), .to_frame(), .to_latex(),
and .to_html().
The three orthogonal axes¶
Every call is a point in a three-dimensional design space:
Estimand — what is computed?
predict(),dydx(),contrasts(), orevaluate()(arbitrary differentiable composition).Aggregation — where in covariate space?
at="overall"(per-row, then average → AME / AAP),at="typical"/at="mean"(single representative point → MEM / APM), oratexog={...}(representative grid → MER / APR).Inference — how is uncertainty quantified?
method="delta",method="simulation", ormethod="bootstrap".
These are orthogonal. Any combination is meaningful. The session fixes
the scale, vcov, level, and default at / method; per-call
keywords pick the estimand and override aggregation via atexog.
What’s supported¶
Out of the box, with adapter auto-detection:
statsmodels: GLM (all families × links), OLS / WLS / GLS, discrete (Logit, Probit, Poisson, NegativeBinomial), MNLogit, ordered, GEE (Gaussian / nominal / ordinal), MixedLM, PHReg, RLM, QuantReg, zero-inflated.
linearmodels: IV/2SLS/LIML/GMM, panel (FE/RE/BE), AbsorbingLS, FamaMacBeth.
lifelines: CoxPHFitter, CoxTimeVaryingFitter, AAF, CRC splines, Weibull / LogLogistic / LogNormal / GeneralizedGamma / PiecewiseExponential AFTs.
scikit-learn: Bootstrap-only inference for any estimator with
fit()andpredict(). Linear models should use statsmodels or linearmodels instead; see scikit-learn models.
Models fit without native formula support (array-fit statsmodels,
linearmodels, sklearn) can still use interactions, polynomials, and
splines by passing formula= and data= to Margins; see
Formula interface for array-fit models.
Adapters for other model classes can be registered via
register_adapter(); see
Writing a custom adapter.
Where to next¶
Tutorials — learn by doing.
Getting started — first session, AAP and AME.
GLM — Binomial logit, GLM — Poisson count, OLS — linear regression, Multinomial logit — GLM family walkthroughs.
Cox proportional hazards, Accelerated failure time models — survival models.
Instrumental variables (2SLS / LIML / GMM), Panel data — fixed effects — instrumental variables and panel data.
Contrasts and difference-in-differences — pairwise contrasts and DiD.
Inference — delta, simulation, bootstrap — delta vs simulation vs bootstrap.
Inference scales and the κ diagnostic — picking an inference scale and reading the κ diagnostic.
scikit-learn models — bootstrap inference for scikit-learn estimators.
How-to guides — task-focused recipes.
Robust / cluster / block SEs: Robust and clustered standard errors, Bootstrap inference, Cluster and block bootstrap.
Simultaneous CIs: Simultaneous confidence intervals and multiple-testing correction.
Scenario builders: Scenario helpers, Grid predictions, Difference-in-differences on the response scale.
Discrete / elasticities: Elasticities and semi-elasticities, Discrete changes for binary / categorical regressors.
κ fallback: Reading and controlling the κ fallback.
Extension and export: Writing a custom adapter, Matching support, Exporting and persisting results, Plotting predictions and effects.
Formula support for array-fit models: Formula interface for array-fit models.
Explanations — theory and design.
Mathematical motivation — delta method, statistic schema, scale chain rule.
The session pre-commitment — why a session.
Inference scale (phi / phi_inv) —
phi/phi_inv.The κ curvature diagnostic — Skovgaard’s κ and the auto-fallback.
Gradient backend: autodiff vs wrapped-FD vs FD — autodiff vs wrapped-FD vs FD.
Delta vs simulation vs bootstrap — when each VCE is right.
The scenarios model — at vs atexog —
atvsatexog.The adapter pattern — how adapters fit a model.
How pymargins differs from marginaleffects and margins — where we differ.
Reference.
API reference — every public class and method.
Replication scripts (archive) — the Williams (2012) walkthrough scripts.