Cluster and block bootstrap
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
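# Simulate a binary outcome with one covariate and a firm identifier.
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "x": rng.normal(0, 1, n),
    "firm": rng.integers(1, 50, n),  # firm ids 1..49 (upper bound is exclusive)
})
# Logistic data-generating process: logit P(y = 1) = -1 + 0.5 * x
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.5 * df["x"]))))
# Fit a logistic regression (binomial GLM with the default logit link).
fit = smf.glm("y ~ x", data=df, family=sm.families.Binomial()).fit()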
For panel / multilevel data, pass cluster= at session construction:
m = Margins.log_scale(
    fit,
    method="bootstrap",
    n_boot=2000,
    cluster=df["firm"].values,
)
Whole clusters are resampled with replacement; within-cluster dependence is preserved.
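To make the resampling step concrete, here is a minimal NumPy sketch of how one cluster-bootstrap draw could be built. It illustrates the general technique only and is not taken from the library's internals; the helper name `cluster_bootstrap_indices` is hypothetical.

```python
import numpy as np

def cluster_bootstrap_indices(cluster_ids, rng):
    """Return row indices for one cluster-bootstrap resample (illustrative helper)."""
    cluster_ids = np.asarray(cluster_ids)
    # Map each unique cluster label to the row positions it owns.
    labels, inverse = np.unique(cluster_ids, return_inverse=True)
    rows_by_cluster = [np.flatnonzero(inverse == k) for k in range(len(labels))]
    # Draw as many clusters as exist, with replacement.
    drawn = rng.integers(0, len(labels), size=len(labels))
    # Keep every row of each drawn cluster; a cluster drawn twice appears twice.
    return np.concatenate([rows_by_cluster[k] for k in drawn])

rng = np.random.default_rng(0)
firm = np.array([1, 1, 2, 2, 2, 3])
idx = cluster_bootstrap_indices(firm, rng)
resampled_firm = firm[idx]
```

Because rows are only ever taken a whole cluster at a time, each firm appears in `resampled_firm` either not at all or in full copies of its original rows. The resample length can differ from the original when cluster sizes are unequal.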
For time-series data, pass block_size= to use moving-block
resampling:
m = Margins.linear_scale(
    fit,
    method="bootstrap",
    n_boot=2000,
    block_size=8,
)
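As a rough illustration of moving-block resampling in general, the sketch below draws overlapping blocks of consecutive observations and concatenates them until the original length is reached. The helper `moving_block_indices` is hypothetical, and the library may differ in details such as how the final block is trimmed.

```python
import numpy as np

def moving_block_indices(n_obs, block_size, rng):
    """Return row indices for one moving-block resample (illustrative helper)."""
    if not 1 <= block_size <= n_obs:
        raise ValueError("block_size must be between 1 and n_obs")
    # Overlapping blocks can start at any of these positions.
    n_starts = n_obs - block_size + 1
    # Enough blocks to cover the series; the excess is trimmed below.
    n_blocks = -(-n_obs // block_size)  # ceiling division
    starts = rng.integers(0, n_starts, size=n_blocks)
    # Each row of `blocks` is one run of consecutive indices.
    blocks = starts[:, None] + np.arange(block_size)[None, :]
    return blocks.ravel()[:n_obs]

rng = np.random.default_rng(0)
idx = moving_block_indices(n_obs=20, block_size=8, rng=rng)
```

Within each block the original time order is kept, so short-range serial dependence survives the resample; dependence is broken only at the joins between blocks.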
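cluster= and block_size= are mutually exclusive. The block length
should span the dependence horizon: blocks that are too short break up
serial dependence and tend to under-cover, while blocks that are too
long leave only a few blocks per resample, so the bootstrap draws carry
less independent information.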
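One way to see the trade-off is to count blocks. For a series of length n and block length b, a moving-block scheme has n - b + 1 overlapping blocks to choose from and needs about n / b of them per resample. The sketch below tabulates both for a hypothetical series of 200 observations; the function name and the series length are illustrative assumptions.

```python
import math

def block_counts(n_obs, block_size):
    """Return (blocks available, blocks needed per resample). Illustrative helper."""
    available = n_obs - block_size + 1
    per_resample = math.ceil(n_obs / block_size)
    return available, per_resample

# Tabulate the counts for a few candidate block lengths.
table = {b: block_counts(200, b) for b in (2, 8, 50, 100)}
for b, (available, per_resample) in table.items():
    print(f"block_size={b}: {available} blocks available, {per_resample} per resample")
```

As the block length approaches the series length, each resample is stitched from only a handful of blocks, which is the sense in which long blocks reduce the effective number of draws.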