Cluster and block bootstrap

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

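# Simulated data: 2000 observations spread over firms labelled 1-49
rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "x": rng.normal(0, 1, n),
    "firm": rng.integers(1, 50, n),  # upper bound is exclusive
})
# Binary outcome from a logistic model: logit P(y = 1) = -1 + 0.5 x
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.5 * df["x"]))))

# Logistic regression fitted as a binomial GLM
fit = smf.glm("y ~ x", data=df, family=sm.families.Binomial()).fit()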

For panel / multilevel data, pass cluster= at session construction:

m = Margins.log_scale(
    fit,
    method="bootstrap",
    n_boot=2000,
    cluster=df["firm"].values,
)

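Each bootstrap replicate draws whole clusters with replacement, so all observations from the same firm enter (or leave) the replicate together and within-cluster dependence is preserved.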

For time-series data, pass block_size= to use moving-block resampling:

m = Margins.linear_scale(
    fit,
    method="bootstrap",
    n_boot=2000,
    block_size=8,
)

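cluster= and block_size= are mutually exclusive. The block length should span the dependence horizon: blocks that are too short cut through the serial correlation, so standard errors are understated and intervals under-cover; blocks that are too long leave only a few distinct blocks to recombine, so replicates overlap heavily and behave like fewer effective draws.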
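A minimal NumPy sketch of moving-block resampling, assuming the standard overlapping-block scheme in which the last block is trimmed so each replicate has exactly n rows (pymargins' handling of the final block may differ). The helpers moving_block_indices and block_bootstrap_se_of_mean are hypothetical; they bootstrap the mean of a simulated AR(1) series to show the block-length trade-off described above.

```python
import numpy as np


def moving_block_indices(n, block_size, rng):
    """Hypothetical helper: row indices for one moving-block bootstrap replicate."""
    if not 1 <= block_size <= n:
        raise ValueError("block_size must be between 1 and n")
    n_blocks = -(-n // block_size)  # ceil(n / block_size)
    # Start positions are drawn from all n - block_size + 1 overlapping blocks.
    starts = rng.integers(0, n - block_size + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(block_size)).ravel()
    return idx[:n]  # trim the last block so the replicate has n rows


def block_bootstrap_se_of_mean(x, block_size, n_boot=500, seed=0):
    """Bootstrap standard error of the sample mean under moving-block resampling."""
    rng = np.random.default_rng(seed)
    means = [x[moving_block_indices(len(x), block_size, rng)].mean() for _ in range(n_boot)]
    return np.std(means, ddof=1)


# AR(1) series with positive serial correlation: x_t = 0.7 x_{t-1} + e_t
rng = np.random.default_rng(1)
n, phi = 2000, 0.7
e = rng.normal(size=n)
x = np.empty(n)
x[0] = e[0] / np.sqrt(1 - phi**2)  # start from the stationary distribution
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

# Long-run SE of the mean for an AR(1) with unit innovation variance.
se_true = 1 / ((1 - phi) * np.sqrt(n))
se = {b: block_bootstrap_se_of_mean(x, b) for b in (1, 8, 50)}
se_full = block_bootstrap_se_of_mean(x, n)  # one block covering the whole series
print(se_true, se, se_full)
```

block_size=1 is the ordinary i.i.d. bootstrap and should understate the standard error well below se_true; longer blocks should move toward it. At the other extreme, block_size=n leaves a single possible block, so every replicate is the original series and the bootstrap SE is exactly zero.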
