GLM — Poisson count

A Poisson regression for count data. The natural reporting scale is the rate ratio (log link → exp); pymargins makes that explicit through Margins.log_scale.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

from pymargins import Margins

rng = np.random.default_rng(7)
n = 4000
df = pd.DataFrame({
    "exposure": rng.uniform(1, 5, n),
    "treated": rng.binomial(1, 0.4, n),
    "age": rng.integers(20, 75, n),
})
lp = -2.0 + 0.6 * df["treated"] + 0.02 * df["age"]
df["events"] = rng.poisson(df["exposure"] * np.exp(lp))

fit = smf.glm(
    "events ~ treated + age",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["exposure"]),
).fit()

Rate ratio for treatment

The contrast treated=1 vs treated=0 on the log scale is a log-RR; the session back-transforms to a rate ratio with an asymmetric CI.

m = Margins.log_scale(fit, vcov="HC1", at="overall")
print(m.contrasts(
    scenarios=[
        {"atexog": {"treated": 1}, "label": "treated"},
        {"atexog": {"treated": 0}, "label": "control"},
    ],
    contrasts=[+1, -1],
).summary())
============================================================
             Margins Result (delta, level=0.95)             
============================================================
         estimate  std err        z  P>|z|  [95% Conf. Int.]
------------------------------------------------------------
treated    1.8059   0.0289  20.4281  0.000    1.7063, 1.9112
============================================================

n = 4000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.000
Delta-vs-sim disagreement: 0.008%

Predicted counts by age

print(m.predict(atexog={"age": [25, 45, 65]}).summary())
============================================================
             Margins Result (delta, level=0.95)             
============================================================
        estimate  std err         z  P>|z|  [95% Conf. Int.]
------------------------------------------------------------
age=25    0.3050   0.0287  -41.3812  0.000    0.2883, 0.3227
age=45    0.4531   0.0157  -50.3675  0.000    0.4394, 0.4673
age=65    0.6731   0.0192  -20.5848  0.000    0.6482, 0.6989
============================================================

n = 4000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: max=0.013
Delta-vs-sim disagreement: 0.426%

Plot: predicted event rate by age

import matplotlib.pyplot as plt

res = m.predict(atexog={"age": [25, 35, 45, 55, 65]})
df_plot = res.to_frame()

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df_plot["age"], df_plot["estimate"], marker="o", color="darkgreen")
ax.fill_between(
    df_plot["age"], df_plot["ci_lower"], df_plot["ci_upper"],
    alpha=0.2, color="darkgreen"
)
ax.set(xlabel="Age", ylabel="Predicted event count")
[Text(0.5, 0, 'Age'), Text(0, 0.5, 'Predicted event count')]
../_images/b62927f482ae2ea4986705ebf6fc7280a19f5aba9dddd2e4b13058a4852f6edd.png

Average marginal effect of age on rate

print(Margins.linear_scale(fit, at="overall").dydx("age").summary())
========================================================
           Margins Result (delta, level=0.95)           
========================================================
     estimate  std err        z  P>|z|  [95% Conf. Int.]
--------------------------------------------------------
age    0.0099   0.0004  26.4099  0.000    0.0091, 0.0106
========================================================

n = 4000
κ: 0.025
Delta-vs-sim disagreement: 1.438%