pymargins.reimpute

pymargins.reimpute(imputer, *, incomplete: DataFrame, warn_on_deterministic: bool = True)

Create a reimpute pipeline stage.

On every bootstrap replicate, the imputer is called fresh on the resampled incomplete data. Because the imputation model is re-estimated on each replicate, its parameter uncertainty propagates into the bootstrap distribution, so that even nominally “improper” imputers can support approximately valid inference.
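
The resample-then-impute loop described above can be sketched in plain NumPy and pandas. This is an illustration of the idea only, not pymargins' implementation: `hot_deck_impute` and `bootstrap_reimpute` are hypothetical names, and the hot-deck imputer is just a convenient stochastic stand-in.

```python
import numpy as np
import pandas as pd


def hot_deck_impute(frame: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    """Hypothetical imputer: fill each missing cell with a random draw
    from the observed values of the same column in *this* frame."""
    out = frame.copy()
    for col in out.columns:
        missing = out[col].isna().to_numpy()
        observed = out.loc[~missing, col].to_numpy()
        if missing.any() and observed.size:
            out.loc[missing, col] = rng.choice(observed, size=int(missing.sum()))
    return out


def bootstrap_reimpute(incomplete, statistic, n_boot=200, seed=0):
    """Hypothetical helper: resample the incomplete rows, re-impute each
    replicate from scratch, then evaluate the statistic on the completed data."""
    rng = np.random.default_rng(seed)
    n = len(incomplete)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)  # resample rows with replacement
        replicate = incomplete.iloc[rows].reset_index(drop=True)
        completed = hot_deck_impute(replicate, rng)  # fresh imputation per replicate
        estimates[b] = statistic(completed)
    return estimates


incomplete = pd.DataFrame(
    {"x": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, np.nan, 8.0, 9.0, 10.0]}
)
estimates = bootstrap_reimpute(incomplete, lambda df: df["x"].mean())
```

The spread of `estimates` reflects both sampling variability and the variability of the imputation step, because the imputer only ever sees the resampled frame.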

Parameters:
  • imputer (callable) – imputer(frame) -> frame. Must accept a DataFrame and return a DataFrame of the same shape with missing values filled. The callable is expected to fit and impute on each call, re-deriving the imputation model from the frame it receives, rather than applying a frozen, previously fitted model.

  • incomplete (pd.DataFrame) – The incomplete data (with missingness). The bootstrap resamples this frame, not the adapter’s training data, so that every replicate has missing cells to impute.

  • warn_on_deterministic (bool, default True) – Whether to run the cheap determinism guard at construction. The guard calls the imputer twice on the same frame; disable it if you already know your imputer is deterministic and you want to avoid the double-run overhead.
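
A callable that satisfies the imputer contract above might look like the following minimal sketch. `NormalDrawImputer` is a hypothetical class written for illustration, not part of pymargins; it re-estimates each column's mean and standard deviation on every call and fills missing cells with normal draws, so nothing fitted is carried over between calls.

```python
import numpy as np
import pandas as pd


class NormalDrawImputer:
    """Hypothetical fit-and-impute callable (illustration only)."""

    def __init__(self, random_state=None):
        self.random_state = random_state
        self._rng = np.random.default_rng(random_state)

    def __call__(self, frame: pd.DataFrame) -> pd.DataFrame:
        out = frame.copy()  # leave the caller's frame untouched
        for col in out.columns:
            missing = out[col].isna().to_numpy()
            if not missing.any():
                continue
            # Re-derive the model from the frame passed in on this call.
            mean = out[col].mean()
            std = out[col].std(ddof=1)
            std = 0.0 if np.isnan(std) else std
            out.loc[missing, col] = self._rng.normal(
                mean, std, size=int(missing.sum())
            )
        return out


frame = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, 4.0, 5.0],
        "b": [10.0, np.nan, 30.0, 40.0, np.nan],
    }
)
imputer = NormalDrawImputer(random_state=42)
filled = imputer(frame)
```

Passing such an object as `imputer` means each bootstrap replicate gets its own parameter estimates and its own residual draws.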

Returns:

A stage with requires_resampling=True (bootstrap-only) and source_data=incomplete.

Return type:

Stage

Warns:
  • UserWarning – If calling imputer twice on the same frame yields byte-identical output, the imputer is deterministic given the data (no residual draw). The MI variance will then be underestimated; consider a stochastic imputer such as IterativeImputer(sample_posterior=True).

  • UserWarning – If the imputer exposes a random_state attribute that is None, reproducibility is not guaranteed even with a fixed session rng_seed. Set random_state to an integer on the imputer for reproducible draws.
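
The two checks described above could be approximated as follows. This is a sketch under stated assumptions rather than the library's actual guard: `check_imputer` is a hypothetical name, and `DataFrame.equals` is used as a stand-in for the byte-identical comparison.

```python
import warnings

import numpy as np
import pandas as pd


def check_imputer(imputer, frame: pd.DataFrame) -> None:
    """Hypothetical guard: warn on deterministic or unseeded imputers."""
    first = imputer(frame)
    second = imputer(frame)
    # Identical output on identical input suggests there is no residual draw.
    if first.equals(second):
        warnings.warn(
            "imputer looks deterministic given the data; "
            "variance may be underestimated",
            UserWarning,
            stacklevel=2,
        )
    # An unset random_state means draws will differ from run to run.
    if hasattr(imputer, "random_state") and imputer.random_state is None:
        warnings.warn(
            "imputer.random_state is None; results are not reproducible",
            UserWarning,
            stacklevel=2,
        )


def mean_impute(frame: pd.DataFrame) -> pd.DataFrame:
    """Deterministic imputer: column means, no random component."""
    return frame.fillna(frame.mean())


frame = pd.DataFrame({"x": [1.0, np.nan, 3.0]})
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_imputer(mean_impute, frame)
messages = [str(w.message) for w in caught]
```

Here only the determinism warning fires, since a plain function has no `random_state` attribute to inspect.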