--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 kernelspec: display_name: Python 3 language: python name: python3 --- # Per-observation influence `MarginsResult.influence()` returns, for each training observation, how much that observation contributes to the estimate — the leave-one-out *deletion influence* `θ̂ − θ_{(-i)}`. Unlike statsmodels' built-in influence diagnostics (which describe the coefficients `β`), this works for the **estimand you actually report**: an average marginal effect, a contrast, an elasticity, an RMST. A large positive value means dropping that observation would pull the estimate down. ```{code-cell} python import numpy as np import pandas as pd import statsmodels.api as sm import statsmodels.formula.api as smf from pymargins import Margins rng = np.random.default_rng(1) n = 180 df = pd.DataFrame({ "x": rng.normal(0, 1, n), "z": rng.normal(0, 1, n), }) lp = -0.5 + 0.8 * df["x"] - 0.4 * df["z"] df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp))) fit = smf.glm("y ~ x + z", data=df, family=sm.families.Binomial()).fit() m = Margins.linear_scale(fit, at="overall") ame = m.dydx("x") print(ame.summary()) ``` ## Basic usage ```{code-cell} python infl = ame.influence() infl.shape # (n_obs,) for a scalar estimand ``` The values are signed and (to first order) sum to zero, since the score contributions cancel at the fitted estimate: ```{code-cell} python print("sum of influence ≈", float(infl.sum())) ``` ## Finding the most influential observations Sort by absolute influence to see which rows move the marginal effect most — a Cook's-distance analogue, but for the AME rather than the coefficients. ```{code-cell} python order = np.argsort(-np.abs(infl)) top = df.iloc[order[:5]].copy() top["influence"] = infl[order[:5]] print(top) ``` ## Reconstructing the variance Summing the squared influence reconstructs the **robust (HC0) sandwich variance** of the estimand — because `Σ_i score_i score_iᵀ` is exactly the sandwich "meat": ```{code-cell} python se_from_influence = np.sqrt(np.sum(infl ** 2)) m_hc0 = Margins.linear_scale(fit, at="overall", vcov="HC0") se_hc0 = float(np.atleast_1d(m_hc0.dydx("x").std_error)[0]) print(f"sqrt(Σ influence²) = {se_from_influence:.6f}") print(f"HC0 robust SE = {se_hc0:.6f}") ``` The two agree to numerical precision. (The default model-based SE differs because it assumes the information equality holds in-sample.) Summing influence *within clusters* before squaring gives the cluster-robust variance — the same construction at a coarser grouping. ## Two routes, same quantity The example above uses the **delta method**: the analytical influence function `∇h(β̂)ᵀ Σ̂ score_i(β̂)`, where `Σ̂ score_i` is the one-step (infinitesimal jackknife) approximation to `β̂ − β̂_{(-i)}`. A **BCa bootstrap** result returns the *exact* leave-one-out refits that were already computed for the acceleration constant — no extra work: ```{code-cell} python m_boot = Margins.linear_scale( fit, at="overall", method="bootstrap", n_boot=200, rng_seed=0, bootstrap_config={"ci_method": "bca"}, ) infl_jack = m_boot.dydx("x").influence() corr = np.corrcoef(infl, infl_jack)[0, 1] print(f"correlation(delta, jackknife) = {corr:.4f}") ``` The two are strongly correlated. They are not identical: the delta route is a *one-step* approximation, so it differs from the exact refit by the leverage factor `1/(1 − h_ii)` and, for a nonlinear estimand like a logit AME, by higher-order curvature. The delta route is far cheaper (no refits) and has no sample-size cap; the jackknife is exact but limited (see pitfalls). ## Vector estimands For a length-`k` vector result (e.g. a prediction over several scenarios), `influence()` returns an `(n_obs, k)` matrix — one influence column per component: ```{code-cell} python r_vec = m.predict(atexog={"x": [-1.0, 0.0, 1.0]}) infl_vec = r_vec.influence() infl_vec.shape ``` ## Plot: an influence index plot ```{code-cell} python import matplotlib.pyplot as plt fig, ax = plt.subplots(figsize=(6, 3)) ax.stem(np.arange(n), infl, markerfmt=" ", basefmt="C7-") ax.axhline(0, color="black", lw=0.8) ax.set(xlabel="observation index", ylabel="influence on AME(x)") top_i = order[0] ax.annotate(f"obs {top_i}", (top_i, infl[top_i]), textcoords="offset points", xytext=(5, 5), color="C3") ``` ## Which models support it The delta-method route needs per-observation scores via `adapter.score_obs()`. This is implemented for the statsmodels **GLM** (all families/links), **OLS**, **Logit/Probit**, and **Poisson/NegativeBinomial** adapters. Adapters without `score_obs()` raise `NotImplementedError` on the delta route — but the BCa bootstrap route works for any bootstrappable model. ## Pitfalls * **Keep the session alive.** The delta route reads scores from the live adapter, so the `Margins` session must still be in scope: `m.dydx("x").influence()` works, but `Margins.linear_scale(fit).dydx("x").influence()` may fail if the session is garbage-collected first. Materialized results (which drop the session) cannot use the delta route. * **Jackknife size cap.** The BCa leave-one-out refits are skipped when `n_obs` (or the number of clusters) exceeds 200, and for block bootstrap where leave-one-out is not well defined. In those cases use the delta route, which has no size limit. * **`vcov` changes the meaning.** Influence is formed with the session's frozen `Σ̂`. Under a robust or cluster `vcov` it becomes the corresponding sandwich influence rather than the model-based one — usually what you want, but worth stating in a write-up.