Wage panel — union premium with entity fixed effects¶
The wage_panel data from linearmodels follows 545 young men
across 1980–1987 with annual observations on log wages, education,
experience, union membership, and marital status. The substantive
question is the union wage premium — but ordinary OLS confounds it
with unobserved worker quality. An entity-FE specification absorbs
that.
This demo walks through:
An entity-FE specification with cluster-robust SEs.
The marginal union premium, with the FE-corrected interval.
A Krinsky–Robb simulation cross-check on the analytic clustered SE.
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
from linearmodels.datasets import wage_panel
from pymargins import Margins, pairwise
raw = wage_panel.load()
df = raw.set_index(["nr", "year"]).copy()
print(df[["lwage", "educ", "exper", "expersq",
"union", "married"]].describe().round(2))
lwage educ exper expersq union married
count 4360.00 4360.00 4360.00 4360.00 4360.00 4360.00
mean 1.65 11.77 6.51 50.42 0.24 0.44
std 0.53 1.75 2.83 40.78 0.43 0.50
min -3.58 3.00 0.00 0.00 0.00 0.00
25% 1.35 11.00 4.00 16.00 0.00 0.00
50% 1.67 12.00 6.00 36.00 0.00 0.00
75% 1.99 12.00 9.00 81.00 0.00 1.00
max 4.05 16.00 18.00 324.00 1.00 1.00
1. Entity-FE specification¶
fe = PanelOLS(
df["lwage"],
df[["exper", "expersq", "union", "married"]],
entity_effects=True,
).fit(cov_type="clustered", cluster_entity=True)
print(fe.summary.tables[1])
Parameter Estimates
==============================================================================
Parameter Std. Err. T-stat P-value Lower CI Upper CI
------------------------------------------------------------------------------
exper 0.1168 0.0107 10.917 0.0000 0.0959 0.1378
expersq -0.0043 0.0007 -6.2744 0.0000 -0.0056 -0.0030
union 0.0821 0.0228 3.5994 0.0003 0.0374 0.1268
married 0.0453 0.0210 2.1589 0.0309 0.0042 0.0864
==============================================================================
Within-worker variation only. The union coefficient now identifies the wage change for the same person when they enter or leave a union — that’s the policy quantity.
(educ is absorbed because it does not vary over time within a
worker.)
2. Marginal union premium on the wage scale¶
lwage is log wages and union is binary, so the natural estimand
is a contrast (union vs non-union) rather than a slope. On a
linear-scale session that contrast is the gap in log wages — i.e.,
the union premium expressed as a log-point gap:
m = Margins.linear_scale(fe, at="overall")
scen, w = pairwise("union", [1, 0])
print(m.contrasts(scenarios=scen, contrasts=w).summary())
===========================================================
Margins Result (delta, level=0.95)
===========================================================
estimate std err z P>|z| [95% Conf. Int.]
-----------------------------------------------------------
union=1 0.0821 0.0228 3.5994 0.000 0.0374, 0.1268
===========================================================
n = 4360
κ: 0.000
Delta-vs-sim disagreement: 6.569%
For policy reporting it’s clearer to express this as a percentage gap. Switch to a log session and ask for the same contrast — the back-transform turns the log-difference into a multiplicative premium:
m_log = Margins.log_scale(fe, at="overall")
print(m_log.contrasts(scenarios=scen, contrasts=w).summary())
===========================================================
Margins Result (delta, level=0.95)
===========================================================
estimate std err z P>|z| [95% Conf. Int.]
-----------------------------------------------------------
union=1 1.1455 0.0371 3.6651 0.000 1.0652, 1.2318
===========================================================
n = 4360
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.064
Delta-vs-sim disagreement: 0.199%
The point estimate on the log scale is the log-wage gap; the back-transformed interval is the multiplicative ratio (1.X means “X% premium”).
3. Krinsky–Robb simulation as a sanity check¶
The analytic cluster-robust SE assumes the within-cluster dependence
structure is well-approximated by the sandwich formula. With
moderate cluster counts (here, 545 workers) that’s usually fine —
but it’s worth a cross-check. The cluster-block bootstrap is the
most rigorous check; at the time of writing it is not yet wired up
for linearmodels panel adapters (re-fitting on resampled panels
needs special handling), so the practical alternative is Krinsky–Robb
simulation, which draws coefficient vectors from the fitted MVN and
re-evaluates the estimand:
m_sim = Margins.log_scale(
fe, at="overall",
method="simulation", n_sim=2000, rng_seed=0,
)
print(m_sim.contrasts(scenarios=scen, contrasts=w).summary())
==============================================================
Margins Result (simulation, level=0.95)
==============================================================
estimate std err statistic P>|z| [95% Conf. Int.]
--------------------------------------------------------------
union=1 1.1455 0.0364 0.1358 0.000 1.0671, 1.2296
==============================================================
n = 4360
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.064
If the simulation CI is materially wider than the analytic CI on the
log scale, the response surface is curved enough that the delta
method’s linearization is starting to bite — the κ
diagnostic will usually have
flagged this already. For a within-worker linear model on lwage the
two methods agree closely.
Where to next¶
Panel data — fixed effects — the underlying panel tutorial.
Cluster and block bootstrap — when and why to prefer the cluster-resampled bootstrap over the analytic clustered SE.
Robust and clustered standard errors — the menu of clustered covariance estimators.