# Pooling precomputed imputations (``pool_imputations``)

``pool_imputations`` combines **M finished analyses** — one per completed
dataset — into a single result using **Rubin's rules**.  It is the
artifacts-side counterpart to [``reimpute``](mi_via_reimpute.md): where
``reimpute`` cooks imputation into a bootstrap loop and needs a re-runnable
imputer, ``pool_imputations`` takes the M results you already have and performs
the analytical fan-in.

The package contributes the *combination*, not the imputation.  You generate
the M completed datasets however you like — ``sklearn.IterativeImputer``,
R ``mice`` exported to CSV, your own sampler — fit your model M times, compute
the **same estimand** on each, then hand the M ``MarginsResult`` objects to
``pool_imputations``.

## When to use this

- You hold **M completed datasets** (imputer *outputs*), or M finished
  ``MarginsResult`` objects — not a re-runnable imputer callable.  If you have
  a callable and want bootstrap inference, use [``reimpute``](mi_via_reimpute.md)
  instead.
- You want a **pooled point estimate _and_ interval** — Rubin's ``Q̄`` with a
  Student-*t* CI on the Rubin degrees of freedom.  (``reimpute`` keeps the
  single-imputation point estimate and only widens the CI.)
- You want the **MI diagnostics**: fraction of missing information (FMI),
  relative efficiency, and the Barnard–Rubin degrees of freedom.

## Quick example

Five posterior-draw imputations of a frame with 30% missingness in ``x1``, an
OLS fit on each, the average marginal effect of ``x1`` from each, pooled:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

from pymargins import Margins, pool_imputations

# 1. Data with MAR missingness in x1
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
df["y"] = 1.0 + 0.6 * df["x1"] - 0.4 * df["x2"] + rng.normal(scale=0.5, size=n)
df_nan = df.copy()
df_nan.loc[rng.uniform(size=n) < 0.30, "x1"] = np.nan

# 2. M completed datasets -> M fits -> the SAME estimand on each
per_imputation = []
for seed in range(5):
    imp = IterativeImputer(sample_posterior=True, random_state=seed, max_iter=10)
    completed = pd.DataFrame(imp.fit_transform(df_nan), columns=df_nan.columns)
    fit = smf.ols("y ~ x1 + x2", data=completed).fit()
    per_imputation.append(Margins.linear_scale(fit).dydx("x1"))

# 3. Pool via Rubin's rules
pooled = pool_imputations(per_imputation)
print(pooled.summary())
```

```text
=======================================================
          Margins Result (pooled, level=0.95)
=======================================================
    estimate  std err        t  P>|t|  [95% Conf. Int.]
-------------------------------------------------------
x1    0.6120   0.0257  23.8155  0.000    0.5616, 0.6624
=======================================================

n = 400
MI pooled (Rubin): M=5, FMI 0.058, df 1249.9, rel. eff. 0.989
```

The footer line is the MI audit trail: ``M`` imputations, the (max) FMI, the
(min) Rubin df, and the (min) relative efficiency.

## What pooling computes

Per estimand component, with within-imputation variances ``U_m = se_m²`` and
point estimates ``Q_m``, all on the **inference scale**:

```
Q̄  = mean(Q_m)                  # pooled point estimate
W   = mean(U_m)                  # within-imputation variance
B   = var(Q_m, ddof=1)           # between-imputation variance
T   = W + (1 + 1/M) · B          # total variance
SE  = sqrt(T)
```

``B`` is the only extra term multiple imputation introduces — the spread of the
M point estimates over the missing-data distribution.  It inflates the pooled
SE above the average within-imputation SE:

```python
within = np.mean([float(r.std_error) for r in per_imputation])
print(f"mean within-imputation SE : {within:.4f}")   # 0.0249
print(f"pooled SE                 : {float(pooled.std_error):.4f}")  # 0.0257
```

The confidence interval uses Student-*t* on the **Rubin degrees of freedom**
(not the package's usual *z*), so small *M* with real between-imputation
spread widens the interval correctly.  The diagnostics live on
``imputation_diagnostic``:

```python
diag = pooled.imputation_diagnostic
print(f"FMI                 : {float(diag.fmi):.3f}")   # 0.058
print(f"relative efficiency : {float(diag.relative_efficiency):.3f}")  # 0.989
print(f"Rubin df            : {float(diag.df):.1f}")    # 1249.9
```

For a vector estimand every field is per-component (``diag.fmi.shape ==
pooled.estimate.shape``); for a scalar estimand they are plain floats.

## Key rules

### 1. Same estimand, shared posture

All M results must describe the **same estimand**: identical labels, ``kind``,
``at``-grid, ``scenarios``, confidence ``level``, and inference scale.  Build
the M sessions identically and call the same estimand method on each.
``pool_imputations`` validates this up front and raises on the first mismatch:

```python
# Pooling a slope against a prediction -> ValueError
pool_imputations([per_imputation[0], some_prediction_result])
# pool_imputations: result 1 has labels=[...], expected [...].
# All results must be the same estimand computed under a shared posture ...
```

This is a cross-imputation invariant: the *posture* (scale, scenario grid,
``at``) must be fixed across the M fits.  Only the *data* — the completed
frame — changes from one imputation to the next.

### 2. Pooling happens on the inference scale

Rubin's rules assume approximate normality across imputations, so pooling is
done on the **inference scale** (log, logit, …) and mapped back through ``φ``
at the end — consistent with how the package handles non-identity scales
everywhere.  Use the matching ``Margins.*_scale`` constructor on each fit and
``pool_imputations`` does the rest:

```python
# Logit fits -> pool on the logit scale -> report on the probability scale
per_imp_logit = []
for seed in range(5):
    imp = IterativeImputer(sample_posterior=True, random_state=seed, max_iter=10)
    completed = df_bin.copy()
    completed[["x1", "x2"]] = imp.fit_transform(df_bin[["x1", "x2"]])
    fit = smf.logit("bin ~ x1 + x2", data=completed).fit(disp=False)
    per_imp_logit.append(
        Margins.logit_scale(fit).predict(atexog={"x1": 0.0, "x2": 0.0})
    )

pooled_logit = pool_imputations(per_imp_logit)
print(pooled_logit.summary())
```

```text
===================================================================
                Margins Result (pooled, level=0.95)
===================================================================
                estimate  std err        t  P>|t|  [95% Conf. Int.]
-------------------------------------------------------------------
x1=0.0, x2=0.0    0.4977   0.1204  -0.0755  0.940    0.4390, 0.5566
===================================================================

n = 400
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
MI pooled (Rubin): M=5, FMI 0.072, df 813.3, rel. eff. 0.986
```

The pooled point estimate is the *geometric*-style mean implied by the scale
(arithmetic on the logit, mapped back to a probability), not a naive average of
the reported probabilities.

### 3. Inference-method-agnostic

Pooling reads only each branch's ``(estimate, std_error)``.  Each of the M
analyses may have used delta, simulation, **or** bootstrap independently —
``W_m = se_m²`` is just whatever variance that branch produced — and they pool
together fine.  This is why ``pool_imputations`` does **not** reuse the
same-session composition machinery (``+`` / ``compose_results``): the M results
come from M different sessions with M different ``Σ̂``.

### 4. Small-sample degrees of freedom (``complete_df``)

By default the Rubin (1987) degrees of freedom are used.  If you know the
complete-data residual df ``ν_com``, pass it for the Barnard–Rubin (1999)
small-sample correction, which shrinks the df toward ``ν_com``:

```python
pooled = pool_imputations(per_imputation, complete_df=n - 3)
```

### 5. A pooled result is a leaf

The M results came from M different sessions, so a pooled result carries no
``session``, ``gradient``, or ``draws`` — it is a **terminal reduction**, like
``materialize()``.  It still re-levels and tests itself from the stored Rubin
df:

```python
lo, hi = pooled.conf_int(level=0.90)        # recomputes the t-interval
tr = pooled.test(value=0.0, null_scale="inference")  # pooled t-statistic
```

Because it is a leaf, it cannot be composed further (``pooled - other`` raises
the ordinary same-session error).  Pool as the **last** step.

## ``pool_imputations`` vs ``reimpute``

Both handle missing data; they sit at different levels and answer to different
inputs.

| | ``pool_imputations`` | [``reimpute``](mi_via_reimpute.md) |
|---|---|---|
| You bring | M completed frames / M results | a re-runnable imputer callable |
| Mechanism | Rubin fan-in (analysis level) | impute-per-replicate (bootstrap loop) |
| Inference | any (delta / sim / bootstrap, per branch) | bootstrap only |
| Point estimate | pooled ``Q̄`` | single-imputation |
| Variance | within + between (Rubin) | one bootstrap distribution |
| Diagnostics | FMI, rel. eff., df | none (bootstrap byproduct) |
| Congeniality | your responsibility | largely sidestepped |

Rule of thumb: **precomputed frames → ``pool_imputations``; a live imputer you
are willing to bootstrap → ``reimpute``.**  For delta/simulation inference a
re-runnable imputer buys nothing that M precomputed frames do not, so pool.

## Limitations

- **At least two results.** ``B`` is undefined for ``M < 2``.
- **Each branch needs a positive SE.** A degenerate analysis (``se_m = 0`` for
  every branch) cannot be pooled and raises.
- **Congeniality is on you.** Rubin's validity is a property of the *pairing*
  of imputation model and analysis model; nothing here checks it.
- **No simultaneous CIs yet.** ``conf_int(simultaneous=True)`` on a pooled
  result raises ``NotImplementedError`` — a leaf has no draws to build a
  sup-*t* band from.
- **R ``mice`` frames work too.** Export each completed dataset to CSV, fit and
  compute your estimand in Python, and pool — you do not need a Python imputer,
  only the M completed frames.