---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Nonlinear estimands with `evaluate`


```{code-cell} python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins

rng = np.random.default_rng(42)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(20, 75, n),
    "treatment": rng.binomial(1, 0.40, n),
    "dose": rng.choice([0, 50, 100], n),
    "policy": rng.choice(["A", "B"], n),
})
lp = (-1.5 + 0.04 * df["age"] + 0.8 * df["treatment"]
      + 0.01 * df["dose"]
      + 0.3 * (df["policy"] == "B"))
df["y"] = rng.binomial(1, 1 / (1 + np.exp(-lp)))

fit = smf.glm("y ~ age + treatment + dose + C(policy)", data=df,
              family=sm.families.Binomial()).fit()
m = Margins.linear_scale(fit, at="overall")
```


`Margins.evaluate` is the escape hatch for estimands that cannot be written
as a weighted sum of scenario predictions.  Use it for reciprocals,
custom utility functions, ratios of differences, or any other
JAX-differentiable composition that is not linear in the predictions.

For simple differences, risk ratios, odds ratios, and
difference-in-differences, `contrasts` is the better tool — it is
faster, more transparent, and usually more accurate.  See
[](../howto/contrasts.md) for those recipes and
[](../howto/contrasts_vs_evaluate.md) for the full decision guide.

## How `evaluate` works

`evaluate` takes a list of **scenarios** and a **compose** function.
The engine:

1. Computes the response-scale prediction for each scenario.
2. Stacks them into a JAX array and passes them to `compose`.
3. Applies `phi_inv` to lift the result onto the inference scale.
4. Runs delta-method inference (or simulation/bootstrap if curvature is
   high or `compose` is not JAX-differentiable).
5. Back-transforms CI endpoints with `phi` for reporting.

Mathematically:

```
result = φ( φ⁻¹( compose(p₁, p₂, …, p_k) ) )
```

where `pᵢ` is the aggregated response-scale prediction for scenario `i`.

## Number needed to treat (NNT)

NNT is the reciprocal of the absolute risk reduction.  Because it is a
reciprocal, it cannot be written as a linear contrast and must go
through `evaluate`:

```{code-cell} python
from pymargins import Margins

m = Margins.linear_scale(fit, at="overall")

scenarios = [
    {"atexog": {"treatment": 1}, "label": "treated"},
    {"atexog": {"treatment": 0}, "label": "control"},
]

res = m.evaluate(
    scenarios=scenarios,
    compose=lambda p: 1.0 / (p[0] - p[1]),
)
print(res.summary())
```

If the risk difference crosses zero, the denominator can change sign
and κ will be large.  In that case `pymargins` auto-falls back to
simulation, which is the safe thing to do for a reciprocal.

## Raw ratio on the linear scale

A risk ratio is usually computed with `contrasts` on a `log_scale`
session (`log(p₁) − log(p₀)` back-transformed with `exp`).  That is the
preferred path because the log-ratio is linear and the delta method is
exact.

Use `evaluate` for the raw ratio `p₁ / p₀` only when your field or
journal explicitly requires inference on the ratio scale itself:

```{code-cell} python
m = Margins.linear_scale(fit, at="overall")

scenarios = [
    {"atexog": {"treatment": 1}, "label": "treated"},
    {"atexog": {"treatment": 0}, "label": "control"},
]

res = m.evaluate(
    scenarios=scenarios,
    compose=lambda p: p[0] / p[1],
)
print(res.summary())
```

Because the ratio is nonlinear on the linear scale, κ is usually larger
than for the log-scale contrast and the CI is wider.

## Ratio of differences (Emax-style)

When the estimand is a ratio in which the numerator and denominator are
themselves differences, `evaluate` is required:

```{code-cell} python
scenarios = [
    {"atexog": {"dose": 0}, "label": "placebo"},
    {"atexog": {"dose": 50}, "label": "low"},
    {"atexog": {"dose": 100}, "label": "high"},
]

# Emax-style parameter: (high − placebo) / (low − placebo)
res = m.evaluate(
    scenarios=scenarios,
    compose=lambda p: (p[2] - p[0]) / (p[1] - p[0]),
)
```

## Custom utility / welfare function

Suppose you have a utility function `u(p) = p**0.5` (a concave
transformation of a predicted probability) and you want the
expected utility difference between two policy regimes:

```{code-cell} python
import jax.numpy as jnp

m = Margins.linear_scale(fit, at="overall")

scenarios = [
    {"atexog": {"policy": "A"}, "label": "regime_A"},
    {"atexog": {"policy": "B"}, "label": "regime_B"},
]

res = m.evaluate(
    scenarios=scenarios,
    compose=lambda p: jnp.sqrt(p[0]) - jnp.sqrt(p[1]),
)
print(res.summary())
```

## When `evaluate` auto-routes to simulation

If `compose` uses Python control flow (`if`, `for`) on tracer values,
JAX cannot differentiate it and the delta method is impossible.
`pymargins` catches the error and silently reroutes to the session's
simulation or bootstrap method.  The result records the realized
method, so the audit trail is still complete.

To avoid auto-routing, write `compose` with JAX primitives:

| Instead of             | Use                              |
|------------------------|----------------------------------|
| `if x > 0: ...`        | `jnp.where(x > 0, ..., ...)`     |
| `x / y` (unsafe divide)| `jnp.where(y == 0, 0, x / y)`    |
| `max(a, b)`            | `jnp.maximum(a, b)`              |

## See also

- [](../howto/contrasts.md) for risk differences, log-ratios, odds
  ratios, and DiD.
- [](../howto/contrasts_vs_evaluate.md) for the decision flowchart.
- [](../howto/scenarios_helpers.md) for `pairwise`, `reference`, `grid`,
  etc.
- [](../math.rst) for the delta-method derivation.