OLS: linear regression

For OLS, the response scale and the linear-predictor scale coincide (the link is the identity), so the natural session is `Margins.linear_scale(...)`.
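A quick way to see why the two scales coincide is to differentiate the mean function numerically. The sketch below uses made-up coefficients `b0`, `b1` for a hypothetical one-regressor model. It is plain numpy and does not show how pymargins works internally. With the identity link, the slope on the response scale equals `b1` everywhere. A logit link is included only for contrast: there the slope becomes `b1 * p * (1 - p)` and varies with `x`.

```python
import numpy as np

# Hypothetical coefficients for a one-regressor model: eta = b0 + b1 * x.
b0, b1 = 0.5, 0.3
x = np.linspace(-2.0, 2.0, 5)
h = 1e-6  # finite-difference step


def identity_mean(x):
    # OLS: the response scale is the linear predictor itself.
    return b0 + b1 * x


def logit_mean(x):
    # Logit, for contrast only: the response scale is the inverse logit of eta.
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))


def dydx(f, x):
    # Central finite difference of the mean function at each x.
    return (f(x + h) - f(x - h)) / (2 * h)


print(dydx(identity_mean, x))  # approximately b1 at every x
print(dydx(logit_mean, x))     # b1 * p * (1 - p), changes with x
```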

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

from pymargins import Margins

rng = np.random.default_rng(11)
n = 3000
df = pd.DataFrame({
    "age": rng.integers(20, 75, n),
    "female": rng.binomial(1, 0.5, n),
    "education": rng.choice(["hs", "college", "grad"], size=n, p=[.5, .35, .15]),
})
df["wage"] = (
    10 + 0.3 * df["age"] - 2.0 * df["female"]
    + 4 * (df["education"] == "college") + 8 * (df["education"] == "grad")
    + rng.normal(0, 5, n)
)

fit = smf.ols("wage ~ age + C(female) + C(education) + age:C(female)",
              data=df).fit()

Average marginal effect (AME) of `age` overall

m = Margins.linear_scale(fit, vcov="HC2", at="overall")
print(m.dydx("age").summary())

=========================================================
            Margins Result (delta, level=0.95)           
=========================================================
     estimate  std err         z  P>|z|  [95% Conf. Int.]
---------------------------------------------------------
age    0.3000   0.0004  713.9780  0.000    0.2992, 0.3008
=========================================================

n = 3000
κ: 0.000
Delta-vs-sim disagreement: 4.189%
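Because the model includes `age:C(female)`, the derivative of the prediction with respect to `age` is `b_age + b_age:female * female`. Its sample average is therefore `b_age + b_age:female * mean(female)`. The numpy sketch below re-creates the simulated data from the setup cell and fits the same model by least squares with a hand-built design matrix. It uses hs as the baseline, which is an arbitrary choice that does not affect the AME. It then checks the analytic AME against an averaged finite-difference derivative. This follows the textbook definition of an AME. It is not a claim about how `Margins.dydx` is implemented.

```python
import numpy as np

# Re-create the simulated data from the setup cell (same seed, same draw order).
rng = np.random.default_rng(11)
n = 3000
age = rng.integers(20, 75, n).astype(float)
female = rng.binomial(1, 0.5, n).astype(float)
edu = rng.choice(["hs", "college", "grad"], size=n, p=[.5, .35, .15])
college = (edu == "college").astype(float)
grad = (edu == "grad").astype(float)
wage = (10 + 0.3 * age - 2.0 * female + 4 * college + 8 * grad
        + rng.normal(0, 5, n))


def design(age, female, college, grad):
    """Intercept, age, female, college, grad, age*female; hs is the baseline."""
    return np.column_stack([np.ones_like(age), age, female, college, grad,
                            age * female])


beta, *_ = np.linalg.lstsq(design(age, female, college, grad), wage, rcond=None)

# Analytic AME: d(x'b)/d(age) = b_age + b_age:female * female, averaged over rows.
ame_analytic = beta[1] + beta[5] * female.mean()

# Numeric AME: central difference in age for every row, then the sample mean.
h = 1e-4
up = design(age + h, female, college, grad) @ beta
down = design(age - h, female, college, grad) @ beta
ame_numeric = np.mean((up - down) / (2 * h))

print(f"analytic AME {ame_analytic:.4f}, numeric AME {ame_numeric:.4f}")
```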

AME of `age` by sex

print(m.dydx("age", atexog={"female": [0, 1]}).summary())

===================================================================
                 Margins Result (delta, level=0.95)                
===================================================================
                    estimate  std err    z  P>|z|  [95% Conf. Int.]
-------------------------------------------------------------------
female=[0, 1], age    0.2998   0.0002  inf  0.000    0.2994, 0.3002
===================================================================

n = 3000
κ: 0.000
Delta-vs-sim disagreement: 4.565%
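One common reading of "AME by sex" is counterfactual. You set `female` to 0 for every row, average the `age` derivative, and then repeat with `female` set to 1. With the `age:C(female)` interaction, these two averages reduce to `b_age` and `b_age + b_age:female`. The numpy sketch below computes them that way on the re-created data. It assumes that `atexog={"female": [0, 1]}` is meant in this counterfactual sense. That is an assumption about the pymargins API, not something this page confirms.

```python
import numpy as np

# Re-create the simulated data from the setup cell (same seed, same draw order).
rng = np.random.default_rng(11)
n = 3000
age = rng.integers(20, 75, n).astype(float)
female = rng.binomial(1, 0.5, n).astype(float)
edu = rng.choice(["hs", "college", "grad"], size=n, p=[.5, .35, .15])
college = (edu == "college").astype(float)
grad = (edu == "grad").astype(float)
wage = (10 + 0.3 * age - 2.0 * female + 4 * college + 8 * grad
        + rng.normal(0, 5, n))


def design(age, female, college, grad):
    """Intercept, age, female, college, grad, age*female; hs is the baseline."""
    return np.column_stack([np.ones_like(age), age, female, college, grad,
                            age * female])


beta, *_ = np.linalg.lstsq(design(age, female, college, grad), wage, rcond=None)

h = 1e-4
ame_by_sex = {}
for f in (0, 1):
    fem = np.full(n, float(f))  # counterfactual: every row gets female = f
    up = design(age + h, fem, college, grad) @ beta
    down = design(age - h, fem, college, grad) @ beta
    ame_by_sex[f] = float(np.mean((up - down) / (2 * h)))

print(ame_by_sex)
```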

Predicted wage at representative education levels

print(m.predict(atexog={"education": ["hs", "college", "grad"]}).summary())

=======================================================================
                   Margins Result (delta, level=0.95)                  
=======================================================================
                   estimate  std err         z  P>|z|  [95% Conf. Int.]
-----------------------------------------------------------------------
education=hs        23.1750   0.1279  181.2105  0.000  22.9244, 23.4257
education=college   26.8908   0.1539  174.7156  0.000  26.5892, 27.1925
education=grad      31.1735   0.2514  124.0111  0.000  30.6808, 31.6662
=======================================================================

n = 3000
κ: max=0.000
Delta-vs-sim disagreement: 0.082%
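A predictive margin at an education level can be computed as follows. You assign that level to every observation, keep each row's own `age` and `female`, and average the resulting predictions. The numpy sketch below does this on the re-created data. It assumes `predict(atexog=...)` averages counterfactual predictions in this way, which is an assumption about pymargins rather than a documented fact. Because education enters the model additively, each difference between two margins is exactly a difference of education coefficients.

```python
import numpy as np

# Re-create the simulated data from the setup cell (same seed, same draw order).
rng = np.random.default_rng(11)
n = 3000
age = rng.integers(20, 75, n).astype(float)
female = rng.binomial(1, 0.5, n).astype(float)
edu = rng.choice(["hs", "college", "grad"], size=n, p=[.5, .35, .15])
college = (edu == "college").astype(float)
grad = (edu == "grad").astype(float)
wage = (10 + 0.3 * age - 2.0 * female + 4 * college + 8 * grad
        + rng.normal(0, 5, n))


def design(age, female, college, grad):
    """Intercept, age, female, college, grad, age*female; hs is the baseline."""
    return np.column_stack([np.ones_like(age), age, female, college, grad,
                            age * female])


beta, *_ = np.linalg.lstsq(design(age, female, college, grad), wage, rcond=None)

# (college, grad) dummy values that encode each education level.
levels = {"hs": (0.0, 0.0), "college": (1.0, 0.0), "grad": (0.0, 1.0)}
margins = {}
for name, (c, g) in levels.items():
    # Counterfactual: every row gets this education level; age and female are kept.
    X_cf = design(age, female, np.full(n, c), np.full(n, g))
    margins[name] = float(np.mean(X_cf @ beta))

print({k: round(v, 4) for k, v in margins.items()})
```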

Plot: predicted wage by education

import matplotlib.pyplot as plt

res = m.predict(atexog={"education": ["hs", "college", "grad"]})
df_plot = res.to_frame()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(df_plot["education"], df_plot["estimate"],
       yerr=[df_plot["estimate"] - df_plot["ci_lower"],
             df_plot["ci_upper"] - df_plot["estimate"]],
       capsize=4, color="coral", edgecolor="black")
ax.set(xlabel="Education", ylabel="Predicted wage")
fig.tight_layout()
plt.show()

[Figure: bar chart of predicted wage by education level (hs, college, grad) with confidence-interval error bars; x-axis "Education", y-axis "Predicted wage".]

A pairwise wage gap

from pymargins import pairwise

# Contrast grad against hs: the estimate below is grad minus hs
scen, w = pairwise("education", ["grad", "hs"])
print(m.contrasts(scenarios=scen, contrasts=w).summary())

===================================================================
                 Margins Result (delta, level=0.95)                
===================================================================
                estimate  std err        z  P>|z|  [95% Conf. Int.]
-------------------------------------------------------------------
education=grad    7.9984   0.2820  28.3607  0.000    7.4457, 8.5512
===================================================================

n = 3000
κ: 0.000
Delta-vs-sim disagreement: 0.396%
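The grad-minus-hs contrast is linear in the coefficients. With hs as the baseline, its gradient with respect to β is simply the indicator of the grad coefficient. For a linear contrast, the delta-method standard error `sqrt(c' V c)` is exact. A parametric simulation should therefore agree with it up to Monte Carlo error, which fits the small disagreement reported above. The numpy sketch below builds the HC2 covariance from its standard sandwich formula and computes both standard errors. HC2 weights each squared residual by `1 / (1 - h_ii)`, where `h_ii` is the leverage. The simulation seed (0) and the draw count (5000) are arbitrary choices. This is not pymargins' code, and how pymargins defines its "Delta-vs-sim disagreement" percentage is not shown on this page. The printed ratio is only an analogue.

```python
import numpy as np

# Re-create the simulated data from the setup cell (same seed, same draw order).
rng = np.random.default_rng(11)
n = 3000
age = rng.integers(20, 75, n).astype(float)
female = rng.binomial(1, 0.5, n).astype(float)
edu = rng.choice(["hs", "college", "grad"], size=n, p=[.5, .35, .15])
college = (edu == "college").astype(float)
grad = (edu == "grad").astype(float)
wage = (10 + 0.3 * age - 2.0 * female + 4 * college + 8 * grad
        + rng.normal(0, 5, n))

# Intercept, age, female, college, grad, age*female; hs is the baseline.
X = np.column_stack([np.ones(n), age, female, college, grad, age * female])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ (X.T @ wage)
resid = wage - X @ beta

# HC2 sandwich: leverages h_ii are the diagonal of X (X'X)^-1 X'.
lev = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
meat = X.T @ (X * (resid**2 / (1.0 - lev))[:, None])
V = XtX_inv @ meat @ XtX_inv
V = (V + V.T) / 2  # symmetrize away rounding noise

# Gradient of (margin_grad - margin_hs) with respect to beta.
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 0.0])
est = c @ beta
se_delta = np.sqrt(c @ V @ c)

# Parametric simulation: draw beta from N(beta_hat, V), take the SD of the contrast.
draws = np.random.default_rng(0).multivariate_normal(beta, V, size=5000)
se_sim = np.std(draws @ c, ddof=1)

print(f"estimate {est:.4f}, delta SE {se_delta:.4f}, simulated SE {se_sim:.4f}")
print(f"relative difference {abs(se_sim / se_delta - 1):.3%}")
```

Since the data and the HC2 formula match the setup, the point estimate should line up with the table above, because the contrast does not depend on how the model is parameterized. The standard error is comparable only if pymargins applies HC2 in this usual way.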