--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 kernelspec: display_name: Python 3 language: python name: python3 --- # Bootstrap inference ```{code-cell} python import numpy as np import pandas as pd import statsmodels.api as sm import statsmodels.formula.api as smf from pymargins import Margins rng = np.random.default_rng(42) n = 2000 df = pd.DataFrame({ "age": rng.integers(20, 75, n), "female": rng.binomial(1, 0.52, n), "treated": rng.binomial(1, 0.40, n), }) lp = -1.5 + 0.04 * df["age"] - 0.3 * df["female"] + 0.8 * df["treated"] df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp))) fit = smf.glm("y ~ age + female + treated", data=df, family=sm.families.Binomial()).fit() m = Margins.log_scale(fit, at="overall") ``` Switch the session's inference method to `"bootstrap"` and pick the number of replicates. The default scheme is *pairs* — rows resampled IID with replacement. ```{code-cell} python m = Margins.log_scale(fit, method="bootstrap", n_boot=2000, vcov="HC3") print(m.dydx("age").summary()) ``` Parallelism uses thread pools; BLAS threads are pinned to 1 per worker to avoid oversubscription: ```{code-cell} python m = Margins.log_scale(fit, method="bootstrap", n_boot=2000, n_jobs=-1) ``` ## Point estimates under bootstrap The point estimate stays the analytic `g(β̂)` — `pymargins` does not report the bootstrap mean as the estimate, matching Stata's convention. The bootstrap is used *only* for the standard error and the empirical quantiles of the CI. This keeps the estimator consistent even when the bootstrap distribution is biased (e.g. in small samples). ## Failed refits Failed refits are caught and counted; a `RuntimeWarning` fires when the failure rate exceeds 5%. If you see this warning, inspect the model specification — non-convergence on 5% of bootstrap samples usually indicates separation, perfect multicollinearity, or a misspecified link function. For correlated data use [](cluster_block_bootstrap.md).