--- jupytext: text_representation: extension: .md format_name: myst format_version: 0.13 kernelspec: display_name: Python 3 language: python name: python3 --- # California API — stratified survey design The California Academic Performance Index (API) measures school achievement. The California Department of Education publishes the full population (`apipop`, 6 194 schools) and several survey samples drawn from it. Here we analyze **`apistrat`**, a stratified sample of 200 schools where strata are school type (`stype`: Elementary / Middle / High) and sampling weights (`pw`) correct for unequal selection probabilities within each stratum. This demo shows: 1. Loading the stratified sample and its design variables. 2. Fitting a **weighted** model for 2000 API scores (Stata `svy: reg` style). 3. Computing **survey-adjusted** average marginal effects and standard errors via Taylor linearization. 4. A design-based bootstrap cross-check. 5. A policy-relevant contrast: the expected API gap between low- and high-poverty schools. 6. Validating the survey estimate against the *known* population truth. ```{code-cell} python import jax jax.config.update("jax_enable_x64", True) import numpy as np import pandas as pd import statsmodels.api as sm import statsmodels.formula.api as smf from pymargins import Margins, SurveyDesign # Load the stratified sample generated from R's survey package apistrat = pd.read_csv("data/apistrat.csv") print(apistrat.shape) print(apistrat[["stype", "api00", "api99", "meals", "ell", "avg.ed", "mobility", "pw", "fpc"]].head()) ``` ## 1. Explore the survey design ```{code-cell} python print("Schools per stratum:") print(apistrat["stype"].value_counts().sort_index()) print("\nWeight range: {:.1f} – {:.1f}".format(apistrat["pw"].min(), apistrat["pw"].max())) print("FPC range: {:.1f} – {:.1f}".format(apistrat["fpc"].min(), apistrat["fpc"].max())) ``` The weights differ by roughly a factor of three because elementary, middle, and high schools were sampled at different rates within their strata. ## 2. Fit a weighted model We model the 2000 API score as a function of: - `meals` — percent of students eligible for subsidized meals (poverty proxy) - `ell` — percent of English-language learners - `avg.ed` — average parent education (in years) - `mobility` — percent of students who move in/out during the year Because the sample is stratified with unequal probabilities, we fit with `sampling weights` so the coefficients generalize to the population: ```{code-cell} python fit = smf.glm( "api00 ~ meals + ell + Q(\"avg.ed\") + mobility", data=apistrat, freq_weights=apistrat["pw"].values, ).fit() print(fit.summary().tables[1]) ``` Note that, conditional on parent education and poverty, the English-learner share (`ell`) carries almost no independent signal — its coefficient is small and statistically indistinguishable from zero. Poverty (`meals`) is the dominant predictor, so that is the marginal effect we focus on below. ## 3. Declare the survey design and compute AMEs `SurveyDesign` receives the weights, the stratum labels, and the finite-population correction. Here the schools are drawn directly from strata without an intermediate cluster stage, so we omit `psu`. The `fpc` column holds each stratum's population size, which tightens the SE to reflect that we sampled a non-trivial fraction of every stratum: ```{code-cell} python # stype is 'E', 'H', 'M' — encode as integers for SurveyDesign stratum_codes = apistrat["stype"].astype("category").cat.codes survey = SurveyDesign( weights=apistrat["pw"].values, strata=stratum_codes.values, fpc=apistrat["fpc"].values, ) m = Margins(fit, survey_design=survey, weights=apistrat["pw"].values, at="overall") print(m.dydx("meals").summary()) ``` The point estimate is a **population-weighted** AME and the standard error is **survey-adjusted** — it accounts for the stratified sampling design via Taylor linearization. ## 4. Design-based bootstrap cross-check A stratified bootstrap resamples schools *within* each stratum. With enough replicates it should give a standard error close to the linearization one: ```{code-cell} python m_boot = Margins( fit, survey_design=survey, weights=apistrat["pw"].values, at="overall", method="bootstrap", n_boot=500, rng_seed=42, ) print(m_boot.dydx("meals").summary()) ``` ## 5. Side-by-side comparison ```{code-cell} python results = pd.DataFrame({ "approach": ["survey linearization", "survey bootstrap"], "estimate": [ float(m.dydx("meals").estimate), float(m_boot.dydx("meals").estimate), ], "std_error": [ float(m.dydx("meals").std_error), float(m_boot.dydx("meals").std_error), ], }) print(results.round(4)) ``` Both approaches agree: each additional percentage point of students on subsidized meals is associated with roughly **1.8 fewer API points**, with a standard error around **0.4**. ## 6. Policy contrast: low- vs high-poverty schools What is the expected API gap between a low-poverty school (20 % of students on subsidized meals) and a high-poverty school (80 %), holding the other covariates at their (population-weighted) observed values? ```{code-cell} python from pymargins import pairwise scen, w = pairwise("meals", [20, 80]) res = m.contrasts(scenarios=scen, contrasts=w) print(res.summary()) ``` The contrast is a **difference in expected API scores** — the low-poverty prediction minus the high-poverty one — not a slope. It is averaged over the weighted distribution of the *other* covariates, and its standard error is propagated through both the regression and the survey design. A 60-point swing in the poverty share maps to roughly a **110-point** API gap. ## 7. Validate against the population truth Because the full population `apipop` is published, we can check whether the survey estimate actually recovers the population marginal effect. We fit the same model on all 6 194 schools (no weights needed — this *is* the population) and compare: ```{code-cell} python apipop = pd.read_csv("data/apipop.csv") truth_fit = smf.glm( "api00 ~ meals + ell + Q(\"avg.ed\") + mobility", data=apipop.dropna(subset=["api00", "meals", "ell", "avg.ed", "mobility"]), ).fit() m_truth = Margins(truth_fit, at="overall") ame_survey = m.dydx("meals") truth_value = float(m_truth.dydx("meals").estimate) lo, hi = ame_survey.conf_int() print(f"population truth AME(meals): {truth_value:7.3f}") print(f"survey estimate: {float(ame_survey.estimate):7.3f}") print(f"95% survey CI: [{float(lo):.3f}, {float(hi):.3f}]") print(f"truth inside CI: {float(lo) <= truth_value <= float(hi)}") ``` The 200-school survey estimate lands close to the population value, and the design-based confidence interval covers the truth — exactly what a correctly specified survey analysis should deliver. ## Where to next - [](../tutorials/survey_design.md) — the underlying tutorial, which uses this same California API data to walk through weights, strata, and clusters one at a time. - [](../howto/robust_clustered_ses.md) — robust and clustered SEs for non-survey settings.