--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 kernelspec: display_name: Python 3 language: python name: python3 --- # Linear Mixed Effects `statsmodels.MixedLM` fits random-intercept and random-slope models. `pymargins` uses the fixed-effects coefficients and their covariance for population-averaged marginal effects and predictions. Random effects are integrated out — the reported quantities are unconditional on any particular cluster. ```{code-cell} python import numpy as np import pandas as pd import statsmodels.formula.api as smf from pymargins import Margins rng = np.random.default_rng(42) n = 400 df = pd.DataFrame({ "age": rng.integers(20, 75, n), "female": rng.binomial(1, 0.52, n), "treated": rng.binomial(1, 0.40, n), "school": rng.choice(["A", "B", "C", "D"], n), "student": np.repeat(np.arange(40), 10), }) # Random intercept per student student_effect = rng.standard_normal(40)[df["student"].values] df["score"] = ( 50 + 0.3 * df["age"] - 2.0 * df["female"] + 3.0 * df["treated"] + student_effect + rng.normal(0, 5, n) ) fit = smf.mixedlm( "score ~ age + female + treated + C(school)", groups="student", data=df, ).fit() ``` ## AME of `age` ```{code-cell} python m = Margins.linear_scale(fit, at="overall") print(m.dydx("age").summary()) ``` ## Predicted score by treatment arm ```{code-cell} python print(m.predict(atexog={"treated": [0, 1]}).summary()) ``` ## Treatment contrast ```{code-cell} python from pymargins import pairwise scen, w = pairwise("treated", [1, 0]) print(m.contrasts(scenarios=scen, contrasts=w).summary()) ``` ## Plot: predicted score over age by sex ```{code-cell} python import matplotlib.pyplot as plt ages = list(range(20, 76, 2)) res = m.predict(atexog={"age": ages, "female": [0, 1]}) df_plot = res.to_frame() fig, ax = plt.subplots(figsize=(6, 4)) for level, sub in df_plot.groupby("female"): label = "Female" if level == 1 else "Male" ax.plot(sub["age"], sub["estimate"], label=label) ax.fill_between( sub["age"], sub["ci_lower"], sub["ci_upper"], alpha=0.15 ) ax.set(xlabel="Age", ylabel="Predicted score") ax.legend(title="Sex") ``` ## Plot: forest plot of school contrasts ```{code-cell} python from pymargins import reference scen, W = reference("school", ["A", "B", "C", "D"], ref_level="A") res = m.contrasts(scenarios=scen, contrasts=W) df_forest = res.to_frame() fig, ax = plt.subplots(figsize=(4, 3)) y = range(len(df_forest)) ax.errorbar( df_forest["estimate"], y, xerr=[df_forest["estimate"] - df_forest["ci_lower"], df_forest["ci_upper"] - df_forest["estimate"]], fmt="o", capsize=3, ) ax.axvline(0, color="grey", lw=0.5) ax.set_yticks(list(y)) ax.set_yticklabels(df_forest["label"]) ax.set_xlabel("Score difference vs School A") ax.invert_yaxis() ``` ## Array interface If you fit with `sm.MixedLM(y, X, groups=...)` instead of `smf.mixedlm(...)`, pass the training data explicitly: ```python from pymargins.adapters import StatsmodelsMixedLMAdapter adapter = StatsmodelsMixedLMAdapter(fit_array, training_data=df) m = Margins.linear_scale(fit_array, adapter=adapter, at="overall") ``` ## Notes on bootstrap inference Mixed-model covariance is already cluster-aware through the random effects structure. If you still want a bootstrap, open the session with `method="bootstrap"` and a modest `n_boot=` — refitting MixedLM is slower than OLS or GLM, so keep the replicate count low for exploration and raise it only for final reporting.