# Contrasts vs `evaluate`: choosing the right tool

`pymargins` offers two ways to combine scenario predictions:

| Tool | What it does | When to use |
|------|--------------|-------------|
| `contrasts` | **Linear** combination on the inference scale | Risk differences, ratios (via log scale), odds ratios, lift, DiD |
| `evaluate` | **Nonlinear** composition on the response scale | Raw ratios, NNT, reciprocals, custom utility functions |

The rule of thumb is simple: if your estimand can be written as a weighted sum of predictions on the inference scale, use `contrasts`. If it cannot, use `evaluate`.

## The mathematical distinction

**`contrasts`**:

```
result = φ( Σᵢ wᵢ · φ⁻¹(pᵢ) )
```

The weights `wᵢ` are fixed scalars. The combination happens **on the inference scale**, before back-transformation. Because the operation is linear in `φ⁻¹(p)`, the delta method is exact for the combination step itself; curvature enters only through the individual predictions.

**`evaluate`**:

```
result = φ( φ⁻¹( compose(p₁, p₂, …, p_k) ) )
```

`compose` is an arbitrary function of the response-scale predictions. The combination happens **on the response scale**, and `φ⁻¹` (`phi_inv`) is applied to the *output* of `compose`. Nonlinearity in `compose` propagates directly into the delta-method Jacobian, which usually produces a larger κ and a wider CI.

## Decision flowchart

```text
Is the estimand a weighted sum of predictions?
├─ Yes ─────────────────────────────► contrasts
│                                      (risk diff, log-ratio, odds ratio, DiD, etc.)
│
└─ No ──────────────────────────────► evaluate
                                       (raw ratio, NNT, reciprocal, nested nonlinear, custom utility)
```

## Side-by-side: the same ratio two ways

### Preferred: ratio via `contrasts` on log scale

```python
# Assumes `Margins` and `pairwise` are imported from pymargins
# and `fit` is an already-fitted model.
m = Margins.log_scale(fit, at="overall")
# Two scenarios (treated=1, treated=0) and the contrast weights [+1, -1]
scen, w = pairwise("treated", [1, 0])
res = m.contrasts(scenarios=scen, contrasts=w)
```

- Inference scale: `log(p₁) − log(p₀)`
- Delta method: exact for the combination step
- κ: usually small
- Audit trail: the `[+1, −1]` weight is explicit
- Reporting: back-transformed to `p₁ / p₀`

### Fallback: ratio via `evaluate` on linear scale

```python
# Reuses `scen` from the previous block; `p` holds one prediction per
# scenario, so p[0] / p[1] is treated over control.
m = Margins.linear_scale(fit, at="overall")
res = m.evaluate(
    scenarios=scen,
    compose=lambda p: p[0] / p[1],
)
```

- Inference scale: `p₁ / p₀` directly
- Delta method: approximate (the ratio is nonlinear)
- κ: usually larger
- Audit trail: the `compose` function must be inspected to know what was computed
- Reporting: same point estimate, different (often wider) CI

Use the `evaluate` version only when your field or journal requires inference on the **raw ratio scale** rather than the log-ratio scale.
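The two routes can be reproduced by hand for a single pair of predictions. The following is a minimal sketch in plain Python, not `pymargins` code: it assumes the two predictions are independent with known standard errors (real scenario predictions share model parameters and are usually correlated), and the helper names and numbers are hypothetical. Both routes return the same point estimate; what differs is the interval. Back-transforming a log-scale interval with `exp` gives an asymmetric interval that always stays positive, while the raw-ratio interval is symmetric around the estimate.

```python
import math

Z = 1.959963984540054  # 97.5% standard-normal quantile (95% two-sided CI)


def ratio_via_log_contrast(p1, se1, p0, se0):
    """p1 / p0 with a CI built on the log scale (the `contrasts` route).

    Delta method per prediction: se(log p) ~= se(p) / p.
    The combination log(p1) - log(p0) is linear, so it adds no further
    linearization. Hypothetical helper; assumes p1 and p0 are independent.
    """
    eta = math.log(p1) - math.log(p0)  # weights [+1, -1] on the log scale
    se = math.sqrt((se1 / p1) ** 2 + (se0 / p0) ** 2)
    # Back-transform the estimate and both limits with phi = exp.
    return math.exp(eta), (math.exp(eta - Z * se), math.exp(eta + Z * se))


def ratio_via_linear_compose(p1, se1, p0, se0):
    """p1 / p0 with a CI built directly on the ratio (the `evaluate` route).

    The ratio itself is linearized; its gradient is (1 / p0, -p1 / p0**2).
    Hypothetical helper; assumes p1 and p0 are independent.
    """
    est = p1 / p0
    se = math.sqrt((se1 / p0) ** 2 + (p1 * se0 / p0 ** 2) ** 2)
    return est, (est - Z * se, est + Z * se)


# Hypothetical predictions and standard errors, for illustration only.
p_treated, se_treated = 0.30, 0.03
p_control, se_control = 0.20, 0.025

print(ratio_via_log_contrast(p_treated, se_treated, p_control, se_control))
print(ratio_via_linear_compose(p_treated, se_treated, p_control, se_control))
```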
## When `contrasts` is strictly better

| Estimand | Scale | Why `contrasts` wins |
|----------|-------|----------------------|
| Risk difference | `linear_scale` | `w = [+1, −1]` is exact; no curvature from a ratio |
| Risk ratio | `log_scale` | `log(p₁) − log(p₀)` is linear; the delta method is exact for the combination step |
| Odds ratio | `logit_scale` | `logit(p₁) − logit(p₀)` is linear |
| Lift (RR − 1) | `log_scale` | Compute RR with `contrasts`, then subtract 1 |
| DiD | `linear_scale` | The four-cell weights `[+1, −1, −1, +1]` are linear |
| Reference contrasts | any | The weight matrix is linear |

## When `evaluate` is required

| Estimand | Why `evaluate` is needed |
|----------|--------------------------|
| Raw ratio on linear scale | On the linear scale, `p₁ / p₀` is not a weighted sum `Σ wᵢ · pᵢ`; the log-scale contrast gives the same point estimate but puts inference on `log(p₁ / p₀)` |
| Number needed to treat | `1 / (p₁ − p₀)` is a reciprocal, not a weighted sum |
| Emax-style parameter | `(high − placebo) / (low − placebo)` is a ratio of differences |
| Custom utility | `√p₁ − √p₂` or any bespoke function |
| Ratio of ratios | `(p₁/p₀) / (p₃/p₂)` is nested nonlinear on the response scale (on the log scale it becomes a linear four-term contrast) |

## Composability and audit trail

`contrasts` produces an audit-friendly result because the weight vector is stored in `estimand_metadata` and shown by `summary()`. A reviewer reads `[+1, −1]` and knows exactly what was computed.

`evaluate` buries the logic inside a `compose` callable. The result records that `evaluate` was used, but the reviewer must inspect the source code to verify the formula. This is acceptable for complex custom estimands, but it is a reason to prefer `contrasts` whenever the two approaches agree numerically.

## Performance

`contrasts` uses a dedicated fast path for linear combinations: one Jacobian evaluation per scenario, then a matrix multiplication by the weights. `evaluate` differentiates through `compose`, which adds overhead and can force an auto-route to simulation if `compose` is not JAX-differentiable.

## Summary

1. **Start with `contrasts`**. If your estimand is a difference, a ratio (via log scale), an odds ratio (via logit scale), or a DiD, `contrasts` is faster, more transparent, and usually more accurate.
2. **Reach for `evaluate`** only when the estimand is genuinely nonlinear: reciprocals, raw-scale ratios, nested ratios, or custom utility functions.
3. **When in doubt**, try both. If the point estimates agree and the `contrasts` CI is narrower with a smaller κ, the contrast is the better tool.
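The delta-method side of item 3 can also be checked outside the library. The sketch below is plain Python, not the `pymargins` implementation: the helpers `quad_form`, `linear_contrast_se`, and `composed_se` are hypothetical, central finite differences stand in for the JAX differentiation described under Performance, and the predictions and covariance are made-up numbers on the linear scale (so the inference and response scales coincide). For a linear `compose`, the numerical gradient recovers the weights and both routes give the same standard error; for an estimand from the `evaluate` table such as NNT, only the composed route applies.

```python
import math
from typing import Callable, Sequence

Matrix = Sequence[Sequence[float]]


def quad_form(g: Sequence[float], cov: Matrix) -> float:
    """Return g' * cov * g."""
    k = len(g)
    return sum(g[i] * cov[i][j] * g[j] for i in range(k) for j in range(k))


def linear_contrast_se(w: Sequence[float], cov: Matrix) -> float:
    """SE of sum_i w_i * eta_i: the weights are the exact gradient."""
    return math.sqrt(quad_form(w, cov))


def composed_se(
    compose: Callable[[Sequence[float]], float],
    p: Sequence[float],
    cov: Matrix,
    h: float = 1e-6,
) -> float:
    """Delta-method SE of compose(p), with a central-difference gradient."""
    grad = []
    for i in range(len(p)):
        up, down = list(p), list(p)
        up[i] += h
        down[i] -= h
        grad.append((compose(up) - compose(down)) / (2 * h))
    return math.sqrt(quad_form(grad, cov))


# Hypothetical predictions [treated, control] and their covariance
# (independent here for simplicity; real predictions are usually correlated).
p = [0.30, 0.20]
cov = [[0.03**2, 0.0], [0.0, 0.025**2]]

# Risk difference both ways: the two standard errors agree.
print(linear_contrast_se([1.0, -1.0], cov))
print(composed_se(lambda q: q[0] - q[1], p, cov))

# Number needed to treat has no weight vector; only the composed route applies.
nnt = 1.0 / (p[0] - p[1])
print(nnt, composed_se(lambda q: 1.0 / (q[0] - q[1]), p, cov))
```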