Wage panel — union premium with entity fixed effects

The wage_panel data from linearmodels follows 545 young men across 1980–1987 with annual observations on log wages, education, experience, union membership, and marital status. The substantive question is the union wage premium — but ordinary OLS confounds it with unobserved worker quality. An entity-FE specification absorbs that.

This demo walks through:

  1. An entity-FE specification with cluster-robust SEs.

  2. The marginal union premium, with the FE-corrected interval.

  3. A Krinsky–Robb simulation cross-check on the analytic clustered SE.

import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS
from linearmodels.datasets import wage_panel

from pymargins import Margins, pairwise

raw = wage_panel.load()
df = raw.set_index(["nr", "year"]).copy()
print(df[["lwage", "educ", "exper", "expersq",
          "union", "married"]].describe().round(2))
         lwage     educ    exper  expersq    union  married
count  4360.00  4360.00  4360.00  4360.00  4360.00  4360.00
mean      1.65    11.77     6.51    50.42     0.24     0.44
std       0.53     1.75     2.83    40.78     0.43     0.50
min      -3.58     3.00     0.00     0.00     0.00     0.00
25%       1.35    11.00     4.00    16.00     0.00     0.00
50%       1.67    12.00     6.00    36.00     0.00     0.00
75%       1.99    12.00     9.00    81.00     0.00     1.00
max       4.05    16.00    18.00   324.00     1.00     1.00

1. Entity-FE specification

fe = PanelOLS(
    df["lwage"],
    df[["exper", "expersq", "union", "married"]],
    entity_effects=True,
).fit(cov_type="clustered", cluster_entity=True)
print(fe.summary.tables[1])
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
exper          0.1168     0.0107     10.917     0.0000      0.0959      0.1378
expersq       -0.0043     0.0007    -6.2744     0.0000     -0.0056     -0.0030
union          0.0821     0.0228     3.5994     0.0003      0.0374      0.1268
married        0.0453     0.0210     2.1589     0.0309      0.0042      0.0864
==============================================================================

Within-worker variation only. The union coefficient now identifies the wage change for the same person when they enter or leave a union — that’s the policy quantity.

(educ is absorbed because it does not vary over time within a worker.)

2. Marginal union premium on the wage scale

lwage is log wages and union is binary, so the natural estimand is a contrast (union vs non-union) rather than a slope. On a linear-scale session that contrast is the gap in log wages — i.e., the union premium expressed as a log-point gap:

m = Margins.linear_scale(fe, at="overall")
scen, w = pairwise("union", [1, 0])
print(m.contrasts(scenarios=scen, contrasts=w).summary())
===========================================================
             Margins Result (delta, level=0.95)            
===========================================================
         estimate  std err       z  P>|z|  [95% Conf. Int.]
-----------------------------------------------------------
union=1    0.0821   0.0228  3.5994  0.000    0.0374, 0.1268
===========================================================

n = 4360
κ: 0.000
Delta-vs-sim disagreement: 6.569%

For policy reporting it’s clearer to express this as a percentage gap. Switch to a log session and ask for the same contrast — the back-transform turns the log-difference into a multiplicative premium:

m_log = Margins.log_scale(fe, at="overall")
print(m_log.contrasts(scenarios=scen, contrasts=w).summary())
===========================================================
             Margins Result (delta, level=0.95)            
===========================================================
         estimate  std err       z  P>|z|  [95% Conf. Int.]
-----------------------------------------------------------
union=1    1.1455   0.0371  3.6651  0.000    1.0652, 1.2318
===========================================================

n = 4360
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.064
Delta-vs-sim disagreement: 0.199%

The point estimate on the log scale is the log-wage gap; the back-transformed interval is the multiplicative ratio (1.X means “X% premium”).

3. Krinsky–Robb simulation as a sanity check

The analytic cluster-robust SE assumes the within-cluster dependence structure is well-approximated by the sandwich formula. With moderate cluster counts (here, 545 workers) that’s usually fine — but it’s worth a cross-check. The cluster-block bootstrap is the most rigorous check; at the time of writing it is not yet wired up for linearmodels panel adapters (re-fitting on resampled panels needs special handling), so the practical alternative is Krinsky–Robb simulation, which draws coefficient vectors from the fitted MVN and re-evaluates the estimand:

m_sim = Margins.log_scale(
    fe, at="overall",
    method="simulation", n_sim=2000, rng_seed=0,
)
print(m_sim.contrasts(scenarios=scen, contrasts=w).summary())
==============================================================
           Margins Result (simulation, level=0.95)            
==============================================================
         estimate  std err  statistic  P>|z|  [95% Conf. Int.]
--------------------------------------------------------------
union=1    1.1455   0.0364     0.1358  0.000    1.0671, 1.2296
==============================================================

n = 4360
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.064

If the simulation CI is materially wider than the analytic CI on the log scale, the response surface is curved enough that the delta method’s linearization is starting to bite — the κ diagnostic will usually have flagged this already. For a within-worker linear model on lwage the two methods agree closely.

Where to next