Introduction

pymargins computes marginal-effects-style quantities — adjusted predictions, slopes, contrasts, difference-in-differences, custom linear and nonlinear combinations of predictions — from fitted statistical models. Uncertainty is quantified via the delta method, parametric simulation (Krinsky–Robb), or nonparametric bootstrap.

The design target is Stata’s margins command and the R package marginaleffects, with one substantive difference: Margins is a session — once constructed it commits to an inference scale, variance estimator, confidence level, default evaluation point, and inference method. Every subsequent computation inherits those commitments. Switching any of them requires a new session.

Why marginal effects?

A fitted nonlinear model reports on the wrong scale. A logistic regression returns coefficients in log-odds; a Poisson model, in log-counts; a Cox model, in log-hazards. The question an analyst or stakeholder actually has — how much does the predicted probability, count, or survival move? — is a marginal effect, and in a nonlinear model that quantity is not any single coefficient. It is a derivative or discrete contrast that

  • depends on where in covariate space it is evaluated (the effect of bmi in a logit differs along the curve),

  • combines several coefficients whenever the model contains interactions, polynomials, or splines, and

  • carries its own standard error, which hand-calculation usually gets wrong.

pymargins computes that quantity, on the outcome scale, with uncertainty from the delta method, simulation, or bootstrap:

m = Margins.log_scale(fit, vcov="HC3")

m.dydx("bmi")                            # AME on P(diabetes)
m.predict(atexog={"age": [25, 45, 65]})  # adjusted predictions
m.contrasts(scenarios=[...], contrasts=[+1, -1])  # risk difference

Reach for marginal effects when the model is nonlinear, when effects are conditional (interactions / polynomials / splines), when the audience needs outcome units rather than link-scale coefficients, when heterogeneity across subgroups is the research question, or when the answer is a counterfactual contrast (“what if everyone were treated?”). These correspond exactly to the estimand axis below: predict(), dydx(), and contrasts() / evaluate().

Why a session?

A session forces the analyst to declare their analytical posture explicitly, in code, at one location. A reviewer sees the entire methodological commitment in the constructor call. Any change in posture — different scale, different vcov, different level — shows up as a new Margins(...) in the audit trail.

The alternative (per-call configuration) is more flexible but invites the analyst to try multiple scales, multiple vcov flavors, multiple levels until something looks significant. pymargins makes this possible but visible: do it, and you’ll have multiple sessions in your code, each documenting its own posture.

Installation

pip install pymargins

Requires Python ≥3.10 and JAX. Dependencies (numpy, pandas, scipy, jax) are installed automatically; model-specific backends (statsmodels, lifelines, linearmodels) are detected at runtime when present.

Quickstart

import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

fit = smf.glm(
    "diabetes ~ C(black) + C(female) + C(agegrp) + bmi + age",
    data=df,
    family=sm.families.Binomial(),
).fit()

# Open a session — log scale, HC3 robust SEs, 95% CIs, delta method.
m = Margins.log_scale(fit, vcov="HC3", level=0.95)
print(m.summary())          # methods-section paragraph

# Pre-flight: is delta reliable here?
print(m.diagnose().summary())

# Adjusted predictions at representative values.
m.predict(atexog={"age": [25, 45, 65]})

# Average marginal effects on the response scale.
m.dydx("age")

# A relative-risk contrast, two scenarios with a [+1, -1] weight.
rr = m.contrasts(
    scenarios=[
        {"atexog": {"treatment": 1}, "label": "treated"},
        {"atexog": {"treatment": 0}, "label": "control"},
    ],
    contrasts=[+1, -1],
)
print(rr.summary())

Each call returns a MarginsResult with .estimate, .se, .vcov, .ci_lower, .ci_upper, .pvalue, plus .summary(), .to_frame(), .to_latex(), and .to_html().

The three orthogonal axes

Every call is a point in a three-dimensional design space:

  1. Estimand — what is computed? predict(), dydx(), contrasts(), or evaluate() (arbitrary differentiable composition).

  2. Aggregation — where in covariate space? at="overall" (per-row, then average → AME / AAP), at="typical" / at="mean" (single representative point → MEM / APM), or atexog={...} (representative grid → MER / APR).

  3. Inference — how is uncertainty quantified? method="delta", method="simulation", or method="bootstrap".

These are orthogonal. Any combination is meaningful. The session fixes the scale, vcov, level, and default at / method; per-call keywords pick the estimand and override aggregation via atexog.

What’s supported

Out of the box, with adapter auto-detection:

  • statsmodels: GLM (all families × links), OLS / WLS / GLS, discrete (Logit, Probit, Poisson, NegativeBinomial), MNLogit, ordered, GEE (Gaussian / nominal / ordinal), MixedLM, PHReg, RLM, QuantReg, zero-inflated.

  • linearmodels: IV/2SLS/LIML/GMM, panel (FE/RE/BE), AbsorbingLS, FamaMacBeth.

  • lifelines: CoxPHFitter, CoxTimeVaryingFitter, AAF, CRC splines, Weibull / LogLogistic / LogNormal / GeneralizedGamma / PiecewiseExponential AFTs.

  • scikit-learn: Bootstrap-only inference for any estimator with fit() and predict(). Linear models should use statsmodels or linearmodels instead; see scikit-learn models.

Models fit without native formula support (array-fit statsmodels, linearmodels, sklearn) can still use interactions, polynomials, and splines by passing formula= and data= to Margins; see Formula interface for array-fit models.

Adapters for other model classes can be registered via register_adapter(); see Writing a custom adapter.

Where to next

Tutorials — learn by doing.

How-to guides — task-focused recipes.

Explanations — theory and design.

Reference.