---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Fair (1978) — Extramarital affairs, two ways

The Fair affairs survey records the *number* of extramarital
encounters reported by 6,366 respondents, alongside age, years
married, religiousness, education, and a marriage-rating scale. The
same dataset supports two distinct analyses:

- a **logit** on `had_affair = affairs > 0` — the binary question, "is
  this respondent at risk of any affair at all?"
- a **Poisson** on `affairs` itself — the count question, "how many
  encounters do we expect from a respondent at this profile?"

Both are useful. The point of this demo is to run them through one
`pymargins` session each and contrast how they answer the same
substantive questions on different scales.

```{code-cell} python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

from pymargins import Margins, pairwise

raw = sm.datasets.fair.load_pandas().data.copy()
raw["had_affair"] = (raw["affairs"] > 0).astype(int)
print(raw[["had_affair", "affairs", "age", "yrs_married",
           "religious", "rate_marriage"]].describe().round(2))
```

## 1. Logit on the binary outcome

```{code-cell} python
logit = smf.glm(
    "had_affair ~ age + yrs_married + religious + rate_marriage + educ",
    data=raw,
    family=sm.families.Binomial(),
).fit()

m_logit = Margins.linear_scale(logit, vcov="HC3", at="overall")
print(m_logit.dydx(["age", "yrs_married", "religious",
                    "rate_marriage", "educ"]).summary())
```

The AMEs are reported as changes in the *probability of any affair*
per unit of each covariate, averaged over the observed sample. The
signs match intuition: longer marriages and lower marriage ratings
raise the probability, religiousness lowers it.

## 2. Poisson on the count

```{code-cell} python
pois = smf.glm(
    "affairs ~ age + yrs_married + religious + rate_marriage + educ",
    data=raw,
    family=sm.families.Poisson(),
).fit()

m_pois = Margins.linear_scale(pois, vcov="HC3", at="overall")
print(m_pois.dydx(["age", "yrs_married", "religious",
                   "rate_marriage", "educ"]).summary())
```

These AMEs are the change in the *expected count* per unit. The
ranking of effects is similar to the logit; the magnitudes are not
comparable because they live on different response scales.

## 3. Marriage rating as a discrete contrast

`rate_marriage` is recorded on a 1–5 ordinal scale. The natural
policy question is "what's the gap between a 'very happy' marriage
(5) and a 'very unhappy' marriage (1)?" — a contrast, not a slope:

```{code-cell} python
scen, w = pairwise("rate_marriage", [5, 1])
print("Logit (probability scale):")
print(m_logit.contrasts(scenarios=scen, contrasts=w).summary())
print("\nPoisson (count scale):")
print(m_pois.contrasts(scenarios=scen, contrasts=w).summary())
```

The logit contrast says: averaging across the sample, a respondent
rating their marriage 5 has *X percentage points* lower probability
of any affair than the same respondent at rating 1. The Poisson
contrast gives the corresponding gap in expected count.

## 4. Scale matters — log vs linear inference for the Poisson

The Poisson canonical link is log. Opening the session on the log
scale gives a *rate ratio* interpretation with an asymmetric CI that
is multiplicative on the count scale, which is usually what readers
expect for count models:

```{code-cell} python
m_pois_log = Margins.log_scale(pois, vcov="HC3", at="overall")
rr = m_pois_log.contrasts(scenarios=scen, contrasts=w)
print(rr.summary())
```

The point estimate here is *log(rate ratio)*; the back-transformed
CI is the rate ratio with an asymmetric interval. See
[](../explanations/inference_scale.md) for the menu of scales.

The κ diagnostic checks whether the estimand surface is well
linearized on the chosen scale:

```{code-cell} python
print(m_pois_log.diagnose().summary())
```

A small κ confirms the delta method is safe on the log scale; a
large κ would trigger an automatic fallback to simulation.

## 5. Predicted profile over years married

```{code-cell} python
import matplotlib.pyplot as plt

years = list(range(0, 25))
res_l = m_logit.predict(atexog={"yrs_married": years})
res_p = m_pois.predict(atexog={"yrs_married": years})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)
for ax, res, ylabel in [
    (axes[0], res_l, "P(any affair)"),
    (axes[1], res_p, "E(affairs)"),
]:
    d = res.to_frame()
    ax.plot(d["yrs_married"], d["estimate"], color="firebrick")
    ax.fill_between(
        d["yrs_married"], d["ci_lower"], d["ci_upper"],
        alpha=0.2, color="firebrick",
    )
    ax.set(xlabel="Years married", ylabel=ylabel)
fig.suptitle("Same dataset, two estimands")
fig.tight_layout()
```

The shapes are similar but not identical — the logit saturates as
`yrs_married` grows, the Poisson does not. Which curve to report
depends on whether the audience cares about *who* has affairs or
*how many*.

## Where to next

- [](../tutorials/glm_poisson.md) — Poisson tutorial with offsets.
- [](../explanations/inference_scale.md) — log vs linear vs logit
  scale and what each one buys you.
- [](mroz_lfp.md) — same model family, focused on the AME-vs-MEM
  question.