---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
kernelspec:
  display_name: Python 3
  language: python
  name: python3
---

# Wage panel — union premium with entity fixed effects

The `wage_panel` data from `linearmodels` follows 545 young men
across 1980–1987 with annual observations on log wages, education,
experience, union membership, and marital status. The substantive
question is the union wage premium, but pooled OLS confounds it
with unobserved worker quality. An entity fixed-effects (FE)
specification absorbs the time-invariant part of that heterogeneity.

This demo walks through:

1. An entity-FE specification with cluster-robust SEs.
2. The marginal union premium, with the FE-corrected interval.
3. A Krinsky–Robb simulation cross-check on the analytic
   (delta-method) interval built from the clustered covariance.

```{code-cell} python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
from linearmodels.datasets import wage_panel

from pymargins import Margins, pairwise

# Load the panel and index it by (entity, time), as PanelOLS expects.
raw = wage_panel.load()
df = raw.set_index(["nr", "year"]).copy()
print(df[["lwage", "educ", "exper", "expersq",
          "union", "married"]].describe().round(2))
```

## 1. Entity-FE specification

```{code-cell} python
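# Entity effects absorb time-invariant worker heterogeneity; SEs are
# clustered by worker to allow serial correlation within each entity.
fe = PanelOLS(
    df["lwage"],
    df[["exper", "expersq", "union", "married"]],
    entity_effects=True,
).fit(cov_type="clustered", cluster_entity=True)
print(fe.summary.tables[1])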
```

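The entity effects leave only within-worker variation. The union
coefficient is therefore identified by workers who change status: it
is the wage change *for the same person* when they enter or leave a
union, which is the policy quantity.

(`educ` is left out of the regressors: it does not vary over time
within a worker, so the entity effects would absorb it.)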
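
The within-worker logic can be sketched with plain NumPy on a synthetic panel. This is an illustration under stated assumptions, not the `linearmodels` implementation: the data are simulated (not `wage_panel`), and `demean_by_group`, `ability`, and `true_premium` are hypothetical names introduced here. Demeaning each variable by worker removes anything constant within a worker, which is why the simulated `educ` column collapses to zero and why the within slope is not contaminated by unobserved ability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_years = 200, 8

# Synthetic panel (NOT the wage_panel data): one row per worker-year.
worker = np.repeat(np.arange(n_workers), n_years)
ability = rng.normal(size=n_workers)          # unobserved, fixed per worker
educ = rng.integers(9, 17, size=n_workers)    # observed, fixed per worker

# Union status is correlated with ability, so pooled OLS is confounded.
union = (rng.normal(size=worker.size) + ability[worker] > 0).astype(float)
true_premium = 0.08
y = (true_premium * union
     + 0.5 * ability[worker]
     + rng.normal(scale=0.1, size=worker.size))


def demean_by_group(x, groups):
    """Subtract each group's mean from its own observations."""
    sums = np.bincount(groups, weights=x)
    counts = np.bincount(groups)
    return x - (sums / counts)[groups]


y_w = demean_by_group(y, worker)
union_w = demean_by_group(union, worker)
educ_w = demean_by_group(educ[worker].astype(float), worker)

# Pooled slope ignores the panel structure; within slope uses demeaned data.
pooled_slope = np.cov(union, y)[0, 1] / np.var(union, ddof=1)
within_slope = (union_w @ y_w) / (union_w @ union_w)

print(f"pooled slope: {pooled_slope:.3f}")
print(f"within slope: {within_slope:.3f}")
print("educ after demeaning is all zero:", np.allclose(educ_w, 0.0))
```

In this simulation the pooled slope picks up the ability gap between union and non-union workers, while the within slope stays close to `true_premium`.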

## 2. Marginal union premium on the wage scale

`lwage` is log wages and `union` is binary, so the natural estimand
is a *contrast* (union vs non-union) rather than a slope. On a
linear-scale session that contrast is the gap in log wages — i.e.,
the union premium expressed as a log-point gap:

```{code-cell} python
m = Margins.linear_scale(fe, at="overall")
scen, w = pairwise("union", [1, 0])
print(m.contrasts(scenarios=scen, contrasts=w).summary())
```

For policy reporting it's clearer to express this as a percentage
gap. Switch to a log session and ask for the same contrast — the
back-transform turns the log-difference into a multiplicative
premium:

```{code-cell} python
m_log = Margins.log_scale(fe, at="overall")
print(m_log.contrasts(scenarios=scen, contrasts=w).summary())
```
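
As a reading aid for ratio-scale output, the arithmetic behind the back-transform can be sketched as follows. A log-point gap `g` corresponds to a ratio `exp(g)` and a percentage premium `100 * (exp(g) - 1)`. The numbers are hypothetical placeholders, not estimates from `wage_panel`, and `log_gap_to_percent` is a helper introduced here for illustration; it is not part of `pymargins`.

```python
import math


def log_gap_to_percent(log_gap):
    """Convert a log-point gap into a percentage premium: 100 * (exp(gap) - 1)."""
    return 100.0 * math.expm1(log_gap)


# Hypothetical point estimate and interval endpoints on the log scale.
gap, lower, upper = 0.08, 0.04, 0.12

ratio = math.exp(gap)
print(f"ratio   = {ratio:.3f}")
print(f"premium = {log_gap_to_percent(gap):.1f}%")
print(f"interval = [{log_gap_to_percent(lower):.1f}%, "
      f"{log_gap_to_percent(upper):.1f}%]")
```

For small gaps the log-point value and the percentage premium are close, which is why log points are often read directly as percentages; the two drift apart as the gap grows.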

The point estimate on the log scale is the log-wage gap; the
back-transformed interval is on the multiplicative-ratio scale, so a
ratio of 1.08 reads as an 8% premium.

## 3. Krinsky–Robb simulation as a sanity check

The analytic cluster-robust SE relies on the sandwich formula
approximating the within-cluster dependence structure well. With
several hundred clusters (here, 545 workers) that's usually fine,
but it's worth a cross-check. The cluster-block bootstrap is the
most rigorous check; at the time of writing it is not yet wired up
for `linearmodels` panel adapters (re-fitting on resampled panels
needs special handling), so the practical alternative is Krinsky–Robb
simulation. It draws coefficient vectors from the fitted MVN (centred
on the coefficient estimates, with their estimated covariance) and
re-evaluates the estimand on each draw. Because the draws reuse the
clustered covariance, this checks the delta-method step rather than
the covariance estimate itself:

```{code-cell} python
m_sim = Margins.log_scale(
    fe, at="overall",
    method="simulation", n_sim=2000, rng_seed=0,
)
print(m_sim.contrasts(scenarios=scen, contrasts=w).summary())
```

If the simulation CI is materially wider than the analytic CI on the
log scale, the response surface is curved enough that the delta
method's linearization is starting to bite — the [κ
diagnostic](../explanations/kappa_diagnostic.md) will usually have
flagged this already. For a within-worker linear model on `lwage` the
two methods agree closely.
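
The comparison described above can be sketched in NumPy for a single coefficient on the ratio scale. This is a minimal illustration, not the `pymargins` implementation: the coefficient vector and covariance are hypothetical, `krinsky_robb_ratio` and `delta_method_ratio` are names introduced here, and it is an assumption that the analytic interval linearizes `exp(b)` directly rather than exponentiating log-scale endpoints.

```python
import numpy as np


def krinsky_robb_ratio(beta, cov, index, n_sim=2000, seed=0, level=0.95):
    """Percentile interval for exp(beta[index]) from draws of N(beta, cov)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta, cov, size=n_sim)
    ratios = np.exp(draws[:, index])
    alpha = 1.0 - level
    lo, hi = np.quantile(ratios, [alpha / 2, 1.0 - alpha / 2])
    return lo, hi


def delta_method_ratio(beta, cov, index, z=1.959964):
    """Delta-method interval for exp(b): SE(exp(b)) is approx. exp(b) * SE(b)."""
    b = beta[index]
    se = np.exp(b) * np.sqrt(cov[index, index])
    return np.exp(b) - z * se, np.exp(b) + z * se


# Hypothetical coefficients and covariance, for illustration only.
beta = np.array([0.10, 0.08])
cov = np.array([[4e-4, 1e-5],
                [1e-5, 4e-4]])

kr_lo, kr_hi = krinsky_robb_ratio(beta, cov, index=1)
dm_lo, dm_hi = delta_method_ratio(beta, cov, index=1)

print(f"Krinsky-Robb: [{kr_lo:.4f}, {kr_hi:.4f}]")
print(f"delta method: [{dm_lo:.4f}, {dm_hi:.4f}]")
```

With a small standard error on the log scale, `exp` is nearly linear over the range of the draws, so the two intervals nearly coincide; a larger standard error makes the simulated interval visibly asymmetric around the point estimate.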

## Where to next

- [](../tutorials/panel_fe.md) — the underlying panel tutorial.
- [](../howto/cluster_block_bootstrap.md) — when and why to prefer
  the cluster-resampled bootstrap over the analytic clustered SE.
- [](../howto/robust_clustered_ses.md) — the menu of clustered
  covariance estimators.