GLM — Poisson count¶
A Poisson regression for count data. The natural reporting scale is
the rate ratio (log link → exp); pymargins makes that explicit
through Margins.log_scale.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
rng = np.random.default_rng(7)
n = 4000
df = pd.DataFrame({
"exposure": rng.uniform(1, 5, n),
"treated": rng.binomial(1, 0.4, n),
"age": rng.integers(20, 75, n),
})
lp = -2.0 + 0.6 * df["treated"] + 0.02 * df["age"]
df["events"] = rng.poisson(df["exposure"] * np.exp(lp))
fit = smf.glm(
"events ~ treated + age",
data=df,
family=sm.families.Poisson(),
offset=np.log(df["exposure"]),
).fit()
Rate ratio for treatment¶
The contrast treated=1 vs treated=0 on the log scale is a log-RR;
the session back-transforms to a rate ratio with an asymmetric CI.
m = Margins.log_scale(fit, vcov="HC1", at="overall")
print(m.contrasts(
scenarios=[
{"atexog": {"treated": 1}, "label": "treated"},
{"atexog": {"treated": 0}, "label": "control"},
],
contrasts=[+1, -1],
).summary())
============================================================
Margins Result (delta, level=0.95)
============================================================
estimate std err z P>|z| [95% Conf. Int.]
------------------------------------------------------------
treated 1.8059 0.0289 20.4281 0.000 1.7063, 1.9112
============================================================
n = 4000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: 0.000
Delta-vs-sim disagreement: 0.008%
Predicted counts by age¶
print(m.predict(atexog={"age": [25, 45, 65]}).summary())
============================================================
Margins Result (delta, level=0.95)
============================================================
estimate std err z P>|z| [95% Conf. Int.]
------------------------------------------------------------
age=25 0.3050 0.0287 -41.3812 0.000 0.2883, 0.3227
age=45 0.4531 0.0157 -50.3675 0.000 0.4394, 0.4673
age=65 0.6731 0.0192 -20.5848 0.000 0.6482, 0.6989
============================================================
n = 4000
Note: std err is on the inference scale; estimate and CI are on the reporting scale.
κ: max=0.013
Delta-vs-sim disagreement: 0.426%
Plot: predicted event rate by age¶
import matplotlib.pyplot as plt
res = m.predict(atexog={"age": [25, 35, 45, 55, 65]})
df_plot = res.to_frame()
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df_plot["age"], df_plot["estimate"], marker="o", color="darkgreen")
ax.fill_between(
df_plot["age"], df_plot["ci_lower"], df_plot["ci_upper"],
alpha=0.2, color="darkgreen"
)
ax.set(xlabel="Age", ylabel="Predicted event count")
[Text(0.5, 0, 'Age'), Text(0, 0.5, 'Predicted event count')]
Average marginal effect of age on rate¶
print(Margins.linear_scale(fit, at="overall").dydx("age").summary())
========================================================
Margins Result (delta, level=0.95)
========================================================
estimate std err z P>|z| [95% Conf. Int.]
--------------------------------------------------------
age 0.0099 0.0004 26.4099 0.000 0.0091, 0.0106
========================================================
n = 4000
κ: 0.025
Delta-vs-sim disagreement: 1.438%