Contrasts vs `evaluate` — choosing the right tool¶

pymargins offers two ways to combine scenario predictions:

| Tool | What it does | When to use |
| --- | --- | --- |
| `contrasts` | Linear combination on the inference scale | Risk differences, ratios (via log scale), odds ratios, lift, DiD |
| `evaluate` | Nonlinear composition on the response scale | Raw ratios, NNT, reciprocals, custom utility functions |

The rule of thumb is simple: if your estimand can be written as a weighted sum of predictions on some inference scale (linear, log, or logit), use `contrasts`. If it cannot, use `evaluate`.

The mathematical distinction¶

contrasts:

result = φ( Σᵢ wᵢ · φ⁻¹(pᵢ) )

Here φ⁻¹ maps each response-scale prediction pᵢ to the inference scale, and φ maps the combined value back. The weights wᵢ are fixed scalars. The combination happens on the inference scale, before back-transformation. Because the operation is linear in φ⁻¹(p), the delta method is exact for the combination step itself; curvature enters only through the individual predictions.
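To make the formula concrete, here is a minimal standalone sketch of a log-scale ratio computed as a weighted sum. It does not call pymargins; the helper `linear_contrast`, the predictions, and the covariance matrix are all hypothetical values chosen for illustration. Because the estimand is linear in the log-predictions, its variance is exactly wᵀΣw.

```python
import math


def linear_contrast(etas, cov, weights):
    """Weighted sum on the inference scale and its standard error sqrt(w' Σ w)."""
    n = len(weights)
    est = sum(w * e for w, e in zip(weights, etas))
    var = sum(weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n))
    return est, math.sqrt(var)


# Hypothetical predictions already on the log scale: φ⁻¹(p) = log(p).
etas = [math.log(0.30), math.log(0.20)]
# Hypothetical covariance of the two log-predictions (assumed independent).
cov = [[0.010, 0.0], [0.0, 0.015]]

est, se = linear_contrast(etas, cov, [+1.0, -1.0])

z = 1.959964  # normal quantile for a 95% interval
ratio = math.exp(est)  # back-transform with φ = exp
ci = (math.exp(est - z * se), math.exp(est + z * se))
```

The interval is built on the log scale and only then exponentiated, which is why it is asymmetric around the ratio and cannot go below zero.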

evaluate:

result = φ( φ⁻¹( compose(p₁, p₂, …, p_k) ) )

`compose` is an arbitrary function of the response-scale predictions. The combination happens on the response scale, and φ⁻¹ (`phi_inv`) is applied to the output of `compose`, so inference runs on φ⁻¹(compose(p)). Nonlinearity in `compose` propagates directly into the delta-method Jacobian, which usually produces a larger κ and a wider CI.
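The sketch below illustrates how a nonlinear `compose` enters the delta-method Jacobian. It is a standalone approximation, not pymargins code: `numeric_gradient` and `delta_method` are hypothetical helpers, a central finite difference stands in for automatic differentiation, and the predictions and covariance are made-up numbers. φ is taken to be the identity (linear scale).

```python
import math


def numeric_gradient(f, p, h=1e-6):
    """Central finite-difference gradient of f at p (stand-in for autodiff)."""
    grad = []
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        grad.append((f(up) - f(dn)) / (2 * h))
    return grad


def delta_method(f, p, cov):
    """First-order estimate and standard error of f(p) given cov of p."""
    g = numeric_gradient(f, p)
    n = len(p)
    var = sum(g[i] * cov[i][j] * g[j] for i in range(n) for j in range(n))
    return f(p), math.sqrt(var)


def compose(p):
    """Raw ratio of the first prediction to the second."""
    return p[0] / p[1]


p = [0.30, 0.20]                      # hypothetical response-scale predictions
cov = [[0.0009, 0.0], [0.0, 0.0006]]  # hypothetical covariance, independent
est, se = delta_method(compose, p, cov)
```

Unlike a fixed weight vector, the gradient here, `[1/p₀, −p₁/p₀²]`, depends on the predictions themselves, so the standard error is only a first-order approximation.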

Decision flowchart¶

Is the estimand a weighted sum of predictions?
├─ Yes ─────────────────────────────► contrasts
│   (risk diff, log-ratio, odds ratio, DiD, etc.)
│
└─ No ──────────────────────────────► evaluate
    (raw ratio, NNT, reciprocal, nested nonlinear, custom utility)

Side-by-side: the same ratio two ways¶

Preferred: ratio via `contrasts` on log scale¶

m = Margins.log_scale(fit, at="overall")
scen, w = pairwise("treated", [1, 0])

res = m.contrasts(scenarios=scen, contrasts=w)

Inference scale: log(p₁) − log(p₀)
Delta method: exact for the combination step
κ: usually small
Audit trail: [+1, −1] weight is explicit
Reporting: back-transformed to p₁ / p₀

Fallback: ratio via `evaluate` on linear scale¶

m = Margins.linear_scale(fit, at="overall")

res = m.evaluate(
    scenarios=scen,
    compose=lambda p: p[0] / p[1],
)

Inference scale: p₁ / p₀ directly
Delta method: approximate (ratio is nonlinear)
κ: usually larger
Audit trail: compose function must be inspected to know what was computed
Reporting: same point estimate, different (often wider) CI

Use the evaluate version only when your field or journal requires inference on the raw ratio scale rather than the log-ratio scale.
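The "exact" versus "approximate" labels above can be checked numerically. The following standalone sketch (plain Python, no pymargins calls; `linearization_error` and all numbers are hypothetical) measures how far each estimand departs from its own first-order Taylor expansion after a small perturbation. It also confirms that both routes give the same point estimate.

```python
import math


def linearization_error(f, grad, x, delta):
    """Gap between f(x + delta) and its first-order Taylor approximation."""
    shifted = [xi + di for xi, di in zip(x, delta)]
    first_order = f(x) + sum(g * d for g, d in zip(grad, delta))
    return f(shifted) - first_order


def contrast(eta):
    """Weighted sum with weights [+1, -1] on the log scale."""
    return eta[0] - eta[1]


def ratio(q):
    """Raw ratio on the response scale."""
    return q[0] / q[1]


p = [0.30, 0.20]       # hypothetical predictions
delta = [0.03, -0.02]  # hypothetical perturbation

# contrasts route: the gradient is the weight vector, whatever the predictions are.
log_p = [math.log(v) for v in p]
err_contrast = linearization_error(contrast, [1.0, -1.0], log_p, delta)

# evaluate route: the gradient of the ratio depends on where it is evaluated.
grad_ratio = [1 / p[1], -p[0] / p[1] ** 2]
err_ratio = linearization_error(ratio, grad_ratio, p, delta)

same_point_estimate = abs(math.exp(contrast(log_p)) - ratio(p)) < 1e-12
```

`err_contrast` is zero up to floating-point rounding, while `err_ratio` is not; that leftover term is the curvature the delta method ignores for the raw ratio.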

When `contrasts` is strictly better¶

| Estimand | Scale | Why `contrasts` wins |
| --- | --- | --- |
| Risk difference | `linear_scale` | `w = [+1, −1]` is exact; no curvature from a ratio |
| Risk ratio | `log_scale` | `log(p₁) − log(p₀)` is linear; delta is exact |
| Odds ratio | `logit_scale` | `logit(p₁) − logit(p₀)` is linear |
| Lift (RR − 1) | `log_scale` | Compute RR with `contrasts`, subtract 1 |
| DiD | `linear_scale` | Four-cell `[+1, −1, −1, +1]` is linear |
| Reference contrasts | any | Weight matrix is linear |
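As a worked instance of the four-cell DiD row, here is a small standalone sketch on the linear scale. The cell predictions, their variances, the cell ordering, and the independence assumption are all hypothetical; the code does not use pymargins.

```python
import math

# Hypothetical cell predictions on the linear scale, ordered as
# (treated, post), (treated, pre), (control, post), (control, pre).
cells = [0.42, 0.30, 0.35, 0.28]
# Hypothetical variances of each cell prediction, assumed independent.
variances = [0.0004, 0.0004, 0.0003, 0.0003]
weights = [+1, -1, -1, +1]

# (treated post - treated pre) - (control post - control pre)
did = sum(w * p for w, p in zip(weights, cells))
# With independent cells, w' Σ w reduces to the sum of w² · variance.
se = math.sqrt(sum(w * w * v for w, v in zip(weights, variances)))
```

No gradient is needed: the weights are the Jacobian, so the standard error involves no linearization of the combination step.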

When `evaluate` is required¶

| Estimand | Why `evaluate` is needed |
| --- | --- |
| Raw ratio on linear scale | With `φ` the identity, `p₁ / p₀` is not a weighted sum `Σ wᵢ · pᵢ`; it becomes linear only on the log scale |
| Number needed to treat | `1 / (p₁ − p₀)` is a reciprocal, not a weighted sum |
| Emax-style parameter | `(high − placebo) / (low − placebo)` is a ratio of differences |
| Custom utility | `√p₁ − √p₂` or any bespoke function |
| Ratio of ratios | `(p₁/p₀) / (p₃/p₂)` is nested nonlinear |
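The NNT row shows why a reciprocal needs the nonlinear path. This standalone sketch applies the first-order delta method to `1 / (p₁ − p₀)` using its analytic gradient. The function `nnt_delta` and every input value are hypothetical, and the two predictions are assumed independent; it is not how pymargins computes the result.

```python
import math


def nnt_delta(p1, p0, var1, var0):
    """NNT = 1 / (p1 - p0) with a first-order delta-method standard error."""
    d = p1 - p0
    nnt = 1.0 / d
    # Gradient of 1/d with respect to (p1, p0) is (-1/d², +1/d²).
    g1, g0 = -1.0 / d**2, 1.0 / d**2
    se = math.sqrt(g1 * g1 * var1 + g0 * g0 * var0)
    return nnt, se


# Same hypothetical variances, two different risk differences.
nnt_a, se_a = nnt_delta(0.30, 0.20, 0.0009, 0.0006)  # difference 0.10
nnt_b, se_b = nnt_delta(0.25, 0.20, 0.0009, 0.0006)  # difference 0.05
```

Halving the risk difference doubles the NNT but quadruples its standard error, because the gradient scales with `1/d²`. No fixed weight vector can reproduce that dependence on the predictions.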

Composability and audit trail¶

contrasts produces an audit-friendly result because the weight vector is stored in estimand_metadata and visible in summary(). A reviewer reads [+1, −1] and knows exactly what was computed.

evaluate buries the logic inside a compose callable. The result records that evaluate was used, but the reviewer must inspect the source code to verify the formula. This is acceptable for complex custom estimands, but it is a reason to prefer contrasts whenever the two approaches agree numerically.
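The difference in inspectability is visible even in plain Python, independent of pymargins. In this small sketch (variable names are illustrative only), a weight vector prints as its own definition, while a callable prints only as an opaque function object.

```python
weights = [+1, -1]


compose = lambda p: p[0] / p[1]  # noqa: E731 (lambda kept on purpose)

weights_repr = repr(weights)  # the values themselves
compose_repr = repr(compose)  # a function object and memory address, not the formula
```

Anything that logs or serializes the estimand can record the weights verbatim, whereas recovering the formula behind `compose` means going back to the source.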

Performance¶

contrasts uses a dedicated fast path for linear combinations: one Jacobian evaluation per scenario, then matrix multiplication for the weights. evaluate differentiates through compose, which adds overhead and can force an auto-route to simulation if compose is not JAX-differentiable.

Summary¶

Start with contrasts. If your estimand is a difference, ratio (via log scale), odds ratio (via logit scale), or DiD, contrasts is faster, more transparent, and usually more accurate.
Reach for evaluate only when the estimand is genuinely nonlinear: reciprocals, raw-scale ratios, nested ratios, or custom utility functions.
When in doubt, try both. If the point estimates agree and the `contrasts` CI is narrower with a smaller κ, `contrasts` is the better tool.
