pymargins.reimpute
- pymargins.reimpute(imputer, *, incomplete: DataFrame, warn_on_deterministic: bool = True)
Create a reimpute pipeline stage.

On every bootstrap replicate the imputer is called fresh on the resampled incomplete data. This injects imputation-model parameter uncertainty into the bootstrap distribution, making even nominally “improper” imputers proper enough for valid inference.
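The resample-then-impute loop described above can be sketched by hand with NumPy and pandas. This illustrates the idea only and is not pymargins internals: hot_deck_impute, bootstrap_means, and the toy data are hypothetical names introduced here, and the hot-deck draw stands in for whatever imputer is actually used.

```python
import numpy as np
import pandas as pd


def hot_deck_impute(frame, rng):
    """Fit-and-impute: fill each missing cell with a random observed value
    drawn from the same column of the frame that was passed in."""
    out = frame.copy()
    for col in out.columns:
        missing = out[col].isna()
        observed = out.loc[~missing, col].to_numpy()
        if missing.any() and observed.size > 0:
            out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
    return out


def bootstrap_means(incomplete, column, n_replicates, seed):
    """Resample the incomplete rows, re-impute every replicate from scratch,
    and collect the column mean of each completed replicate."""
    rng = np.random.default_rng(seed)
    n = len(incomplete)
    estimates = []
    for _ in range(n_replicates):
        idx = rng.integers(0, n, size=n)  # row positions drawn with replacement
        replicate = incomplete.iloc[idx].reset_index(drop=True)
        completed = hot_deck_impute(replicate, rng)  # imputer re-derived per replicate
        estimates.append(completed[column].mean())
    return np.array(estimates)


# Hypothetical toy data: two of eight cells are missing.
incomplete = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0, np.nan, 6.0, 7.0, 8.0]})
estimates = bootstrap_means(incomplete, "x", n_replicates=200, seed=0)
print(estimates.std(ddof=1))  # spread reflects both resampling and imputation draws
```

Because the rows are resampled before imputation, each replicate sees a different set of observed values, so the imputation itself varies from replicate to replicate rather than being fixed once up front.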
- Parameters:
imputer (callable) – imputer(frame) -> frame. Must accept a DataFrame and return a DataFrame of the same shape with missing values filled. The callable is expected to fit and impute on each call (re-deriving the imputation model from the frame it receives), not to apply a frozen, previously fitted model.
incomplete (pd.DataFrame) – The incomplete data (with missingness). The bootstrap resamples this frame, not the adapter’s training data, so that every replicate has missing cells to impute.
warn_on_deterministic (bool, default True) – Whether to run the cheap determinism guard at construction, which calls the imputer twice on the same frame. Disable it if you already know your imputer is deterministic and want to skip the extra run.
- Returns:
A stage with requires_resampling=True (bootstrap-only) and source_data=incomplete.
- Return type:
Stage
- Warns:
UserWarning – If calling imputer twice on the same frame yields byte-identical output, the imputer is deterministic-given-data (no residual draw). MI variance estimates will be too small; consider a stochastic imputer such as IterativeImputer(sample_posterior=True).
UserWarning – If the imputer exposes a random_state attribute that is None, reproducibility is not guaranteed even with a fixed session rng_seed. Set random_state to an integer on the imputer for reproducible draws.
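The double-run comparison behind the first warning can be reproduced by hand before building a stage. The sketch below is an assumption about the kind of check described, not pymargins's actual guard: mean_impute, make_hot_deck_imputer, and looks_deterministic are hypothetical helpers, and DataFrame.equals is used as a stand-in for the byte-identical comparison.

```python
import numpy as np
import pandas as pd


def mean_impute(frame):
    """Deterministic-given-data: every missing cell gets its column mean."""
    return frame.fillna(frame.mean(numeric_only=True))


def make_hot_deck_imputer(seed):
    """Build a stochastic imputer whose draws are reproducible via an integer seed."""
    rng = np.random.default_rng(seed)

    def impute(frame):
        out = frame.copy()
        for col in out.columns:
            missing = out[col].isna()
            observed = out.loc[~missing, col].to_numpy()
            if missing.any() and observed.size > 0:
                out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
        return out

    return impute


def looks_deterministic(imputer, frame):
    """Call the imputer twice on the same frame and compare the two outputs."""
    return imputer(frame).equals(imputer(frame))


# Hypothetical toy data: ten observed values followed by ten missing cells.
frame = pd.DataFrame({"x": [float(i) for i in range(10)] + [np.nan] * 10})

print(looks_deterministic(mean_impute, frame))                    # no residual draw
print(looks_deterministic(make_hot_deck_imputer(seed=0), frame))  # draws differ between calls
```

Under the signature documented above, a callable such as the one returned by make_hot_deck_imputer would be passed as pymargins.reimpute(imputer, incomplete=frame). Seeding the generator with an integer plays the same role here as setting random_state on an estimator-style imputer.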