---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Getting started

This tutorial fits a small logit model, opens a `Margins` session, and walks through the three orthogonal axes: estimand, aggregation, and inference. By the end, you should be able to map every common Stata-style `margins` invocation to its `pymargins` equivalent.

```{code-cell} python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

from pymargins import Margins

# Simulated data: a binary outcome driven by age, sex, and treatment
rng = np.random.default_rng(0)
n = 4000
df = pd.DataFrame({
    "age": rng.integers(20, 75, n),
    "female": rng.binomial(1, 0.52, n),
    "treated": rng.binomial(1, 0.40, n),
})
# True linear predictor on the logit scale
lp = -1.5 + 0.04 * df["age"] - 0.3 * df["female"] + 0.8 * df["treated"]
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))
```

## 1. Fit a model

```{code-cell} python
fit = smf.glm(
    "y ~ age + female + treated",
    data=df,
    family=sm.families.Binomial(),
).fit()
print(fit.summary())
```

## 2. Open a session

A session commits to an inference scale, a vcov estimator, a confidence level, an aggregation default (`at=`), and an inference method. Once constructed, every call inherits these commitments.

```{code-cell} python
m = Margins.log_scale(fit, vcov="HC3", level=0.95, at="overall")
print(m.summary())
```

`Margins.log_scale(...)` is shorthand for `Margins(..., phi=jnp.exp, phi_inv=jnp.log)`. We use it here because we will compute a risk-ratio contrast below; for predicted probabilities, `linear_scale` (identity) and `logit_scale` are also valid choices. See [](../explanations/inference_scale.md) for the full scale menu.

## 3. Adjusted predictions

Predictions on the response scale, averaged over the observed distribution of the *other* covariates (`at="overall"`, the AAP):

```{code-cell} python
print(m.predict(atexog={"treated": [0, 1]}).summary())
```

Because the session is on the log scale, the CI is asymmetric on the probability scale (multiplicative around the point estimate). Its lower bound stays above 0, but its upper bound can exceed 1. For a probability that can approach 1, `logit_scale` keeps the CI inside (0, 1); `linear_scale` gives a symmetric CI on the probability scale itself.

The same predictions at the typical covariate profile (the APM; `at="typical"` uses the median for continuous covariates and the mode for discrete ones):

```{code-cell} python
m_typical = Margins.log_scale(fit, vcov="HC3", at="typical")
print(m_typical.predict(atexog={"treated": [0, 1]}).summary())
```

## 4. Marginal effects

Average marginal effect of `age` on the response scale:

```{code-cell} python
print(m.dydx("age").summary())
```

The AME of `age` at each level of `female` (the same call, with `female` fixed at 0 and then at 1):

```{code-cell} python
print(m.dydx("age", atexog={"female": [0, 1]}).summary())
```

## 5. Contrasts

A risk-ratio contrast (`treated=1` vs `treated=0`):

```{code-cell} python
rr = m.contrasts(
    scenarios=[
        {"atexog": {"treated": 1}, "label": "treated"},
        {"atexog": {"treated": 0}, "label": "control"},
    ],
    contrasts=[+1, -1],
)
print(rr.summary())
```

Because the session is on the log scale, the weights `[+1, -1]` take the difference of the two log predictions, which is a log risk ratio; the back-transform then turns the log-RR into an RR with an asymmetric CI. See [](../explanations/inference_scale.md).

## 6. Pre-flight diagnostic

```{code-cell} python
print(m.diagnose().summary())
```

`diagnose()` computes the κ curvature diagnostic on a sample of the estimand surface. When κ is small, the delta method is reliable; when κ is large, `pymargins` will *automatically fall back to simulation* on the next call. See [](../explanations/kappa_diagnostic.md).
## 7. Plot a prediction curve over a continuous variable

```{code-cell} python
import matplotlib.pyplot as plt

# Predict over a grid of ages, separately for control and treated
ages = list(range(20, 76, 2))
res = m.predict(atexog={"age": ages, "treated": [0, 1]})
df_plot = res.to_frame()

fig, ax = plt.subplots(figsize=(6, 4))
for level, sub in df_plot.groupby("treated"):
    sub = sub.sort_values("age")
    label = "Treated" if level == 1 else "Control"
    # Point estimates as a line, CI as a shaded band in the same color
    (line,) = ax.plot(sub["age"], sub["estimate"], label=label)
    ax.fill_between(
        sub["age"], sub["ci_lower"], sub["ci_upper"],
        color=line.get_color(), alpha=0.15,
    )
ax.set(xlabel="Age", ylabel="P(y=1)")
ax.legend(title="Treatment")
```

## Where to next

- [](glm_logit.md): a deeper logit walkthrough with factor variables.
- [](contrasts_and_did.md): pairwise contrasts and 2×2 DiD.
- [](inference_methods.md): delta vs simulation vs bootstrap.