California API — stratified survey design¶
The California Academic Performance Index (API) measures school achievement.
The California Department of Education publishes the full population (apipop,
6 194 schools) and several survey samples drawn from it. Here we analyze
apistrat, a stratified sample of 200 schools where strata are school type
(stype: Elementary / Middle / High) and sampling weights (pw) correct for
unequal selection probabilities within each stratum.
This demo shows:
Loading the stratified sample and its design variables.
Fitting a weighted model for 2000 API scores (Stata
svy: regstyle).Computing survey-adjusted average marginal effects and standard errors via Taylor linearization.
A design-based bootstrap cross-check.
A policy-relevant contrast: the expected API gap between low- and high-poverty schools.
Validating the survey estimate against the known population truth.
import jax
jax.config.update("jax_enable_x64", True)
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins, SurveyDesign
# Load the stratified sample generated from R's survey package
apistrat = pd.read_csv("data/apistrat.csv")
print(apistrat.shape)
print(apistrat[["stype", "api00", "api99", "meals", "ell", "avg.ed",
"mobility", "pw", "fpc"]].head())
(200, 39)
stype api00 api99 meals ell avg.ed mobility pw fpc
0 E 840 816 33 25 3.32 11 44.209999 4421
1 E 516 476 98 77 1.67 26 44.209999 4421
2 E 531 544 64 23 2.34 17 44.209999 4421
3 E 501 457 83 63 1.86 13 44.209999 4421
4 E 720 659 26 17 3.17 31 44.209999 4421
1. Explore the survey design¶
print("Schools per stratum:")
print(apistrat["stype"].value_counts().sort_index())
print("\nWeight range: {:.1f} – {:.1f}".format(apistrat["pw"].min(),
apistrat["pw"].max()))
print("FPC range: {:.1f} – {:.1f}".format(apistrat["fpc"].min(),
apistrat["fpc"].max()))
Schools per stratum:
stype
E 100
H 50
M 50
Name: count, dtype: int64
Weight range: 15.1 – 44.2
FPC range: 755.0 – 4421.0
The weights differ by roughly a factor of three because elementary, middle, and high schools were sampled at different rates within their strata.
2. Fit a weighted model¶
We model the 2000 API score as a function of:
meals— percent of students eligible for subsidized meals (poverty proxy)ell— percent of English-language learnersavg.ed— average parent education (in years)mobility— percent of students who move in/out during the year
Because the sample is stratified with unequal probabilities, we fit with
sampling weights so the coefficients generalize to the population:
fit = smf.glm(
"api00 ~ meals + ell + Q(\"avg.ed\") + mobility",
data=apistrat,
freq_weights=apistrat["pw"].values,
).fit()
print(fit.summary().tables[1])
===============================================================================
coef std err z P>|z| [0.025 0.975]
-------------------------------------------------------------------------------
Intercept 532.8152 8.719 61.108 0.000 515.726 549.905
meals -1.8421 0.058 -31.534 0.000 -1.957 -1.728
ell -0.0022 0.062 -0.036 0.972 -0.123 0.119
Q("avg.ed") 77.3537 2.285 33.851 0.000 72.875 81.832
mobility 0.1594 0.076 2.110 0.035 0.011 0.307
===============================================================================
Note that, conditional on parent education and poverty, the English-learner
share (ell) carries almost no independent signal — its coefficient is small
and statistically indistinguishable from zero. Poverty (meals) is the
dominant predictor, so that is the marginal effect we focus on below.
3. Declare the survey design and compute AMEs¶
SurveyDesign receives the weights, the stratum labels, and the
finite-population correction. Here the schools are drawn directly from strata
without an intermediate cluster stage, so we omit psu. The fpc column holds
each stratum’s population size, which tightens the SE to reflect that we
sampled a non-trivial fraction of every stratum:
# stype is 'E', 'H', 'M' — encode as integers for SurveyDesign
stratum_codes = apistrat["stype"].astype("category").cat.codes
survey = SurveyDesign(
weights=apistrat["pw"].values,
strata=stratum_codes.values,
fpc=apistrat["fpc"].values,
)
m = Margins(fit, survey_design=survey,
weights=apistrat["pw"].values, at="overall")
print(m.dydx("meals").summary())
==========================================================
Margins Result (delta, level=0.95)
==========================================================
estimate std err z P>|z| [95% Conf. Int.]
----------------------------------------------------------
meals -1.8421 0.4139 -4.4503 0.000 -2.6534, -1.0308
==========================================================
n = 200
κ: 0.000
Delta-vs-sim disagreement: 1.657%
The point estimate is a population-weighted AME and the standard error is survey-adjusted — it accounts for the stratified sampling design via Taylor linearization.
4. Design-based bootstrap cross-check¶
A stratified bootstrap resamples schools within each stratum. With enough replicates it should give a standard error close to the linearization one:
m_boot = Margins(
fit,
survey_design=survey,
weights=apistrat["pw"].values,
at="overall",
method="bootstrap",
n_boot=500,
rng_seed=42,
)
print(m_boot.dydx("meals").summary())
============================================================
Margins Result (bootstrap, level=0.95)
============================================================
estimate std err statistic P>|z| [95% Conf. Int.]
------------------------------------------------------------
meals -1.8421 0.4409 -1.8421 0.008 -2.1456, -0.4205
============================================================
n = 200
κ: 0.000
5. Side-by-side comparison¶
results = pd.DataFrame({
"approach": ["survey linearization", "survey bootstrap"],
"estimate": [
float(m.dydx("meals").estimate),
float(m_boot.dydx("meals").estimate),
],
"std_error": [
float(m.dydx("meals").std_error),
float(m_boot.dydx("meals").std_error),
],
})
print(results.round(4))
approach estimate std_error
0 survey linearization -1.8421 0.4139
1 survey bootstrap -1.8421 0.4409
Both approaches agree: each additional percentage point of students on subsidized meals is associated with roughly 1.8 fewer API points, with a standard error around 0.4.
6. Policy contrast: low- vs high-poverty schools¶
What is the expected API gap between a low-poverty school (20 % of students on subsidized meals) and a high-poverty school (80 %), holding the other covariates at their (population-weighted) observed values?
from pymargins import pairwise
scen, w = pairwise("meals", [20, 80])
res = m.contrasts(scenarios=scen, contrasts=w)
print(res.summary())
=============================================================
Margins Result (delta, level=0.95)
=============================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------
meals=20 110.5286 24.8362 4.4503 0.000 61.8505, 159.2067
=============================================================
n = 200
κ: 0.000
Delta-vs-sim disagreement: 3.534%
The contrast is a difference in expected API scores — the low-poverty prediction minus the high-poverty one — not a slope. It is averaged over the weighted distribution of the other covariates, and its standard error is propagated through both the regression and the survey design. A 60-point swing in the poverty share maps to roughly a 110-point API gap.
7. Validate against the population truth¶
Because the full population apipop is published, we can check whether the
survey estimate actually recovers the population marginal effect. We fit the
same model on all 6 194 schools (no weights needed — this is the population)
and compare:
apipop = pd.read_csv("data/apipop.csv")
truth_fit = smf.glm(
"api00 ~ meals + ell + Q(\"avg.ed\") + mobility",
data=apipop.dropna(subset=["api00", "meals", "ell", "avg.ed", "mobility"]),
).fit()
m_truth = Margins(truth_fit, at="overall")
ame_survey = m.dydx("meals")
truth_value = float(m_truth.dydx("meals").estimate)
lo, hi = ame_survey.conf_int()
print(f"population truth AME(meals): {truth_value:7.3f}")
print(f"survey estimate: {float(ame_survey.estimate):7.3f}")
print(f"95% survey CI: [{float(lo):.3f}, {float(hi):.3f}]")
print(f"truth inside CI: {float(lo) <= truth_value <= float(hi)}")
population truth AME(meals): -1.641
survey estimate: -1.842
95% survey CI: [-2.653, -1.031]
truth inside CI: True
The 200-school survey estimate lands close to the population value, and the design-based confidence interval covers the truth — exactly what a correctly specified survey analysis should deliver.
Where to next¶
Survey designs — weights, strata, and clusters — the underlying tutorial, which uses this same California API data to walk through weights, strata, and clusters one at a time.
Robust and clustered standard errors — robust and clustered SEs for non-survey settings.