Introduction¶

pymargins computes marginal-effects-style quantities — adjusted predictions, slopes, contrasts, difference-in-differences, custom linear and nonlinear combinations of predictions — from fitted statistical models. Uncertainty is quantified via the delta method, parametric simulation (Krinsky–Robb), or nonparametric bootstrap.

The design target is Stata’s margins command and the R package marginaleffects, with one substantive difference: Margins is a session — once constructed it commits to an inference scale, variance estimator, confidence level, default evaluation point, and inference method. Every subsequent computation inherits those commitments. Switching any of them requires a new session.

Why marginal effects?¶

A fitted nonlinear model reports on the wrong scale. A logistic regression returns coefficients in log-odds; a Poisson model, in log-counts; a Cox model, in log-hazards. The question an analyst or stakeholder actually has — how much does the predicted probability, count, or survival move? — is a marginal effect, and in a nonlinear model that quantity is not any single coefficient. It is a derivative or discrete contrast that

depends on where in covariate space it is evaluated (the effect of bmi in a logit differs along the curve),
combines several coefficients whenever the model contains interactions, polynomials, or splines, and
carries its own standard error, which hand-calculation usually gets wrong.

pymargins computes that quantity, on the outcome scale, with uncertainty from the delta method, simulation, or bootstrap:

m = Margins.log_scale(fit, vcov="HC3")

m.dydx("bmi")                            # AME on P(diabetes)
m.predict(atexog={"age": [25, 45, 65]})  # adjusted predictions
m.contrasts(scenarios=[...], contrasts=[+1, -1])  # risk difference

Reach for marginal effects when the model is nonlinear, when effects are conditional (interactions / polynomials / splines), when the audience needs outcome units rather than link-scale coefficients, when heterogeneity across subgroups is the research question, or when the answer is a counterfactual contrast (“what if everyone were treated?”). These correspond exactly to the estimand axis below: predict(), dydx(), and contrasts() / evaluate().

Why a session?¶

A session forces the analyst to declare their analytical posture explicitly, in code, at one location. A reviewer sees the entire methodological commitment in the constructor call. Any change in posture — different scale, different vcov, different level — shows up as a new Margins(...) in the audit trail.

The alternative (per-call configuration) is more flexible but invites the analyst to try multiple scales, multiple vcov flavors, multiple levels until something looks significant. pymargins makes this possible but visible: do it, and you’ll have multiple sessions in your code, each documenting its own posture.

Installation¶

pip install pymargins

Requires Python ≥3.10 and JAX. Dependencies (numpy, pandas, scipy, jax) are installed automatically; model-specific backends (statsmodels, lifelines, linearmodels) are detected at runtime when present.

Quickstart¶

import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

fit = smf.glm(
    "diabetes ~ C(black) + C(female) + C(agegrp) + bmi + age",
    data=df,
    family=sm.families.Binomial(),
).fit()

# Open a session — log scale, HC3 robust SEs, 95% CIs, delta method.
m = Margins.log_scale(fit, vcov="HC3", level=0.95)
print(m.summary())          # methods-section paragraph

# Pre-flight: is delta reliable here?
print(m.diagnose().summary())

# Adjusted predictions at representative values.
m.predict(atexog={"age": [25, 45, 65]})

# Average marginal effects on the response scale.
m.dydx("age")

# A relative-risk contrast, two scenarios with a [+1, -1] weight.
rr = m.contrasts(
    scenarios=[
        {"atexog": {"treatment": 1}, "label": "treated"},
        {"atexog": {"treatment": 0}, "label": "control"},
    ],
    contrasts=[+1, -1],
)
print(rr.summary())

Each call returns a MarginsResult with .estimate, .se, .vcov, .ci_lower, .ci_upper, .pvalue, plus .summary(), .to_frame(), .to_latex(), and .to_html().

The three orthogonal axes¶

Every call is a point in a three-dimensional design space:

Estimand — what is computed? predict(), dydx(), contrasts(), or evaluate() (arbitrary differentiable composition).
Aggregation — where in covariate space? at="overall" (per-row, then average → AME / AAP), at="typical" / at="mean" (single representative point → MEM / APM), or atexog={...} (representative grid → MER / APR).
Inference — how is uncertainty quantified? method="delta", method="simulation", or method="bootstrap".

These are orthogonal. Any combination is meaningful. The session fixes the scale, vcov, level, and default at / method; per-call keywords pick the estimand and override aggregation via atexog.

What’s supported¶

Out of the box, with adapter auto-detection:

statsmodels: GLM (all families × links), OLS / WLS / GLS, discrete (Logit, Probit, Poisson, NegativeBinomial), MNLogit, ordered, GEE (Gaussian / nominal / ordinal), MixedLM, PHReg, RLM, QuantReg, zero-inflated.
linearmodels: IV/2SLS/LIML/GMM, panel (FE/RE/BE), AbsorbingLS, FamaMacBeth.
lifelines: CoxPHFitter, CoxTimeVaryingFitter, AAF, CRC splines, Weibull / LogLogistic / LogNormal / GeneralizedGamma / PiecewiseExponential AFTs.
scikit-learn: Bootstrap-only inference for any estimator with fit() and predict(). Linear models should use statsmodels or linearmodels instead; see scikit-learn models.

Models fit without native formula support (array-fit statsmodels, linearmodels, sklearn) can still use interactions, polynomials, and splines by passing formula= and data= to Margins; see Formula interface for array-fit models.

Adapters for other model classes can be registered via register_adapter(); see Writing a custom adapter.

Where to next¶

Tutorials — learn by doing.

Getting started — first session, AAP and AME.
GLM — Binomial logit, GLM — Poisson count, OLS — linear regression, Multinomial logit — GLM family walkthroughs.
Cox proportional hazards, Accelerated failure time models — survival models.
Instrumental variables (2SLS / LIML / GMM), Panel data — fixed effects — instrumental variables and panel data.
Contrasts and difference-in-differences — pairwise contrasts and DiD.
Inference — delta, simulation, bootstrap — delta vs simulation vs bootstrap.
Inference scales and the κ diagnostic — picking an inference scale and reading the κ diagnostic.
scikit-learn models — bootstrap inference for scikit-learn estimators.

How-to guides — task-focused recipes.

Robust / cluster / block SEs: Robust and clustered standard errors, Bootstrap inference, Cluster and block bootstrap.
Simultaneous CIs: Simultaneous confidence intervals and multiple-testing correction.
Scenario builders: Scenario helpers, Grid predictions, Difference-in-differences on the response scale.
Discrete / elasticities: Elasticities and semi-elasticities, Discrete changes for binary / categorical regressors.
κ fallback: Reading and controlling the κ fallback.
Extension and export: Writing a custom adapter, Matching support, Exporting and persisting results, Plotting predictions and effects.
Formula support for array-fit models: Formula interface for array-fit models.

Explanations — theory and design.

Mathematical motivation — delta method, statistic schema, scale chain rule.
The session pre-commitment — why a session.
Inference scale (phi / phi_inv) — phi / phi_inv.
The κ curvature diagnostic — Skovgaard’s κ and the auto-fallback.
Gradient backend: autodiff vs wrapped-FD vs FD — autodiff vs wrapped-FD vs FD.
Delta vs simulation vs bootstrap — when each VCE is right.
The scenarios model — at vs atexog — at vs atexog.
The adapter pattern — how adapters fit a model.
How pymargins differs from marginaleffects and margins — where we differ.

Reference.

API reference — every public class and method.
Replication scripts (archive) — the Williams (2012) walkthrough scripts.