---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Cluster and block bootstrap

```{code-cell} python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "x": rng.normal(0, 1, n),
    "firm": rng.integers(1, 50, n),  # firm IDs 1..49 (upper bound is exclusive)
})
# Binary outcome from a logistic model with intercept -1 and slope 0.5
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.5 * df["x"]))))
fit = smf.glm("y ~ x", data=df, family=sm.families.Binomial()).fit()
```

For panel or multilevel data, pass `cluster=` at session construction:

```{code-cell} python
m = Margins.log_scale(
    fit,
    method="bootstrap",
    n_boot=2000,
    cluster=df["firm"].values,  # one cluster label per row
)
```

Whole clusters are resampled with replacement, so within-cluster dependence is preserved.

For time-series data, pass `block_size=` to use moving-block resampling:

```{code-cell} python
m = Margins.linear_scale(
    fit,
    method="bootstrap",
    n_boot=2000,
    block_size=8,  # block length, in consecutive rows
)
```

`cluster=` and `block_size=` are mutually exclusive.

The block length should span the dependence horizon. Blocks that are too short break up the serial dependence, so the intervals under-cover; blocks that are too long leave too few distinct blocks to resample, which reduces the effective number of draws.
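
To make the two resampling schemes above concrete, here is a minimal NumPy sketch of how the row indices for one bootstrap replicate could be drawn. It is an illustration only, not `pymargins`' internal code: the helper names `cluster_resample_indices` and `moving_block_indices` are hypothetical, and the sketch assumes rows are already in time order for the block case.

```python
import numpy as np


def cluster_resample_indices(cluster_ids, rng):
    """Hypothetical helper: draw whole clusters with replacement.

    Returns the row indices of one bootstrap replicate.
    """
    cluster_ids = np.asarray(cluster_ids)
    unique = np.unique(cluster_ids)
    # Draw as many clusters as there are distinct clusters, with replacement.
    drawn = rng.choice(unique, size=unique.size, replace=True)
    # Keep every row of each drawn cluster; a cluster drawn twice appears twice.
    return np.concatenate([np.flatnonzero(cluster_ids == c) for c in drawn])


def moving_block_indices(n, block_size, rng):
    """Hypothetical helper: moving-block resampling of n time-ordered rows.

    Overlapping blocks of consecutive rows are drawn with replacement and
    concatenated, then trimmed to length n.
    """
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and n")
    n_blocks = -(-n // block_size)  # ceiling division
    # Each start is chosen so the block fits entirely inside [0, n).
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_size)).ravel()
    return idx[:n]


rng = np.random.default_rng(0)
firm = np.repeat(np.arange(5), 4)          # 5 clusters of 4 rows each
cluster_idx = cluster_resample_indices(firm, rng)
block_idx = moving_block_indices(20, 8, rng)
```

In the cluster case every row of a drawn cluster enters the replicate together, which is what keeps within-cluster dependence intact. In the block case only the joins between blocks break the serial structure, which is why the block length matters.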