Matching support¶

pymargins integrates with propensity-score and other matching libraries through a small protocol. The matching object is attached at session construction and governs all downstream behavior: which rows enter the estimand, how vcov is derived, and how bootstrap resamples are drawn.

The reference implementation wraps pysmatch (the active successor to pymatch). Users of other libraries can write a thin wrapper that exposes the same three attributes/methods.

The `MatchingClient` protocol¶

A matching object consumed by Margins(..., matching=...) must provide:

Attribute / method	Description
`matched_data`	`pd.DataFrame` — the matched subset the model was fit on
`cluster_ids`	`np.ndarray` — matched-set label for every row in `matched_data`
`rematch(data)`	callable that re-runs matching on a resampled DataFrame and returns the new matched subset

rematch() is only required for method="bootstrap". Delta and Krinsky–Robb ignore it.

Basic workflow with `pysmatch`¶

Fit a propensity-score model and run matching outside pymargins.
Fit the outcome model on the matched sample only.
Wrap the fitted matcher in PysmatchClient.
Open a Margins session with matching=client.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins, PysmatchClient
from pysmatch.Matcher import Matcher

# 1. Run matching
test = df[df["treated"] == 1].copy()
control = df[df["treated"] == 0].copy()
matcher = Matcher(test, control, yvar="treated", exclude=["y"])
matcher.fit_scores(balance=True, model_type="linear")
matcher.predict_scores()
matcher.match(method="min", nmatches=1, threshold=0.001)

# 2. Fit outcome model on matched sample ONLY
matched = matcher.matched_data
fit = smf.glm(
    "y ~ x1 + x2 + treated",
    data=matched,
    family=sm.families.Binomial(),
).fit(cov_type="cluster", cov_kwds={"groups": matched["match_id"]})

# 3. Wrap the matcher
client = PysmatchClient(matcher, treatment_col="treated")

# 4. Open session — vcov auto-derives to cluster-robust
m = Margins(fit, matching=client)
rd = m.contrasts(
    scenarios=[
        {"atexog": {"treated": 1}, "label": "treated"},
        {"atexog": {"treated": 0}, "label": "control"},
    ],
    contrasts=[+1, -1],
)
print(rd.summary())

Important

The outcome model must be fit on the matched sample, not the full sample. pymargins validates that len(matched_data) == len(training_data) and raises an error if they differ.

Why `rematch`?¶

Matching algorithms are stochastic (random starts, caliper tie-breaking). rematch fixes the pipeline internally so that every call returns the same matched sample given the same input data. More importantly, for bootstrap inference rematch is called on every resampled replicate, capturing both the uncertainty from the outcome model and the uncertainty introduced by the matching process itself.

Sensitivity analysis¶

PysmatchClient.rematch() accepts a caliper_multiplier argument (when called directly). By varying the caliper you can assess how sensitive the marginal effect is to the strictness of the matching:

for mult in [0.5, 1.0, 2.0]:
    matched = client.rematch(df, caliper_multiplier=mult)
    fit = smf.glm("y ~ x1 + x2 + treated", data=matched,
                  family=sm.families.Binomial()).fit()
    m = Margins(fit, vcov="HC3", at="overall")
    res = m.contrasts(
        scenarios=[
            {"atexog": {"treated": 1}},
            {"atexog": {"treated": 0}},
        ],
        contrasts=[+1, -1],
    )
    print(f"caliper={mult}: RD = {res.estimate.item():.3f}")

Inference after matching¶

Matching changes the effective sample size and induces correlation between matched pairs. pymargins handles this automatically:

Inference	What happens
Delta / K–R (default)	The estimand averages over `matched_data` instead of the full sample. `vcov` auto-derives to cluster-robust at the matched-set level.
Bootstrap	Resamples matched sets with replacement, calls `rematch()` on each replicate, then refits the model. This is more robust to small numbers of clusters or misspecified propensity models.

Bootstrap with rematching¶

m_boot = Margins(
    fit,
    matching=client,
    method="bootstrap",
    n_boot=500,
    rng_seed=42,
)
rd_boot = m_boot.contrasts(
    scenarios=[
        {"atexog": {"treated": 1}},
        {"atexog": {"treated": 0}},
    ],
    contrasts=[+1, -1],
)
print(rd_boot.summary())

Note

Bootstrap SEs should generally be wider than delta SEs because the bootstrap captures additional uncertainty from the matching algorithm. If they are narrower, inspect the propensity model for near-perfect prediction or very small calipers.

Custom matching wrappers¶

Users of sklearn.neighbors.NearestNeighbors, custom propensity-score code, or other libraries can write a wrapper that exposes the same protocol:

class MyMatcher:
    def __init__(self, matched_data, cluster_ids, matcher_fn):
        self.matched_data = matched_data
        self.cluster_ids = cluster_ids
        self._matcher_fn = matcher_fn

    def rematch(self, data):
        return self._matcher_fn(data)

# Usage
m = Margins(fit, matching=MyMatcher(matched_df, pair_ids, my_match_fn))

Pickling and parallelism¶

PysmatchClient must be picklable for process-based parallel bootstrap (n_jobs > 1). If the underlying matcher stores non-picklable objects, set n_jobs=1 in the bootstrap configuration.

Vcov behavior summary¶

User provides	Behavior
`matching` only	Auto-derives `vcov = {"type": "cluster", "groups": matching.cluster_ids}`
`matching` + `vcov="cluster"`	Uses user-supplied cluster specification
`matching` + `vcov="HC3"`	User override; warning emitted because matched-set dependence is ignored
`matching` + `vcov=ndarray`	User-supplied Σ̂ used directly; no validation against cluster structure

Matching support¶

The `MatchingClient` protocol¶

Basic workflow with `pysmatch`¶

Why `rematch`?¶

Sensitivity analysis¶

Inference after matching¶

Bootstrap with rematching¶

Custom matching wrappers¶

Pickling and parallelism¶

Vcov behavior summary¶

pymargins

Navigation

Related Topics

Matching support¶

The MatchingClient protocol¶

Basic workflow with pysmatch¶

Why rematch?¶

Sensitivity analysis¶

Inference after matching¶

Bootstrap with rematching¶

Custom matching wrappers¶

Pickling and parallelism¶

Vcov behavior summary¶

The `MatchingClient` protocol¶

Basic workflow with `pysmatch`¶

Why `rematch`?¶