Travel mode choice: the value of travel time (WTP)
When a commuter chooses between modes, they trade money against time. The rate at which they are willing to substitute one for the other — how many dollars an hour of travel time is worth to them — is the value of travel time savings (VTTS), the single most-used number in transport appraisal. It is a willingness to pay: the marginal rate of substitution between a time attribute and the cost (price) attribute in a discrete-choice model.
pymargins exposes this directly. For a model with a continuous time
regressor and a continuous cost regressor, m.wtp(attribute, price)
returns the negated ratio of the attribute slope to the price slope,
with the standard error propagated jointly through both slopes. Because the estimate is a ratio, its uncertainty is not just the two marginal SEs bolted together. This demo estimates VTTS on the classic Greene–Hensher travel-mode data and shows why the ratio’s interval rewards a simulation cross-check.
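The joint propagation described above can be illustrated with a hand-rolled delta method for a negated ratio of two slopes. This is a minimal sketch, not pymargins' implementation: the function name `ratio_delta`, the slopes, the standard errors, and the correlation values are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def ratio_delta(b_attr, b_price, var_attr, var_price, cov):
    """Delta-method estimate and SE for g = -b_attr / b_price."""
    est = -b_attr / b_price
    # Gradient of g with respect to (b_attr, b_price).
    d_attr = -1.0 / b_price
    d_price = b_attr / b_price**2
    var = (d_attr**2 * var_attr
           + d_price**2 * var_price
           + 2.0 * d_attr * d_price * cov)
    return est, math.sqrt(var)

# Hypothetical slopes and standard errors (illustration only).
b_time, b_cost = -0.012, -0.040
se_time, se_cost = 0.003, 0.010

for rho in (0.0, 0.5):
    cov = rho * se_time * se_cost  # assumed correlation between the slopes
    est, se = ratio_delta(b_time, b_cost, se_time**2, se_cost**2, cov)
    print(f"rho={rho:.1f}  WTP={est:+.3f}  SE={se:.3f}")
```

With these made-up inputs the point estimate is -0.300 in both cases, but the SE falls from about 0.106 to 0.075 once the slopes are assumed to be positively correlated. The covariance term is what a naive combination of the two marginal SEs would miss.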
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from pymargins import Margins
# The TravelMode data ships in long form: one row per (traveller, mode),
# 210 travellers × 4 modes (air / train / bus / car).
long = pd.read_csv("data/travelmode.csv")
print(long.head(8).to_string(index=False))
 individual  mode choice  wait  vcost  travel  gcost  income  size
          1   air     no    69     59     100     70      35     1
          1 train     no    34     31     372     71      35     1
          1   bus     no    35     25     417     70      35     1
          1   car    yes     0     10     180     30      35     1
          2   air     no    64     58      68     68      30     2
          2 train     no    44     31     354     84      30     2
          2   bus     no    53     25     399     85      30     2
          2   car    yes     0     11     255     50      30     2
1. A binary choice with attribute differences
statsmodels multinomial logit takes traveller-level regressors, but
cost and time are alternative-specific — each mode has its own.
The textbook device that turns alternative-specific attributes into a
chooser-level regression is the difference specification: restrict
to two alternatives and regress the choice on the difference in each
attribute between them. The coefficient on a difference is the
utility weight on that attribute, which is all VTTS needs.
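To see why differencing loses nothing in the two-alternative case, the sketch below computes the car probability twice: once from one utility per mode, and once from a binary logit on the attribute differences. The utility weights `b_cost` and `b_time` are hypothetical; the attribute levels are those of the first traveller in the printed data (car: vcost 10, travel 180; train: vcost 31, travel 372).

```python
import math

def p_car_from_utilities(cost_car, time_car, cost_train, time_train,
                         b_cost, b_time, asc_car=0.0):
    """Two-alternative logit written with one utility per mode."""
    v_car = asc_car + b_cost * cost_car + b_time * time_car
    v_train = b_cost * cost_train + b_time * time_train
    return math.exp(v_car) / (math.exp(v_car) + math.exp(v_train))

def p_car_from_differences(cost_diff, time_diff, b_cost, b_time, asc_car=0.0):
    """The same model written as a binary logit on attribute differences."""
    index = asc_car + b_cost * cost_diff + b_time * time_diff
    return 1.0 / (1.0 + math.exp(-index))

# Hypothetical utility weights (not the fitted values).
b_cost, b_time = -0.04, -0.01

p_levels = p_car_from_utilities(10, 180, 31, 372, b_cost, b_time)
p_diffs = p_car_from_differences(10 - 31, 180 - 372, b_cost, b_time)
print(f"from levels: {p_levels:.6f}   from differences: {p_diffs:.6f}")
```

The two probabilities agree because only the utility difference enters a two-alternative logit, so the same weights multiply the levels and the differences.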
We take the two ground modes, car vs train, and difference their
in-vehicle time (travel, minutes) and out-of-pocket cost (vcost,
dollars). A full appraisal would fit a conditional logit over all four
modes; the two-alternative difference model is the version that fits a
chooser-level GLM, and it is plenty to exercise wtp() on real data:
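# Reshape to one row per traveller, with one column per (attribute, mode) pair.
wide = long.pivot(index="individual", columns="mode",
                  values=["vcost", "travel"])
wide.columns = [f"{attr}_{mode}" for attr, mode in wide.columns]
# Attach each traveller's chosen mode and income, read off the chosen row.
chosen = long[long["choice"] == "yes"].set_index("individual")[["mode", "income"]]
df = wide.join(chosen)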
# Restrict to travellers who chose car or train, and build the binary outcome.
df = df[df["mode"].isin(["car", "train"])].copy()
df["car"] = (df["mode"] == "car").astype(int)
df["cost_diff"] = df["vcost_car"] - df["vcost_train"] # dollars
df["time_diff"] = df["travel_car"] - df["travel_train"] # minutes
print(f"{len(df)} travellers; car share = {df['car'].mean():.2f}")
print(df[["car", "cost_diff", "time_diff", "income"]].describe().round(1))
122 travellers; car share = 0.48
         car  cost_diff  time_diff  income
count  122.0      122.0      122.0   122.0
mean     0.5      -26.8      -28.6    32.3
std      0.5       24.4      146.7    19.9
min      0.0      -87.0     -327.0     4.0
25%      0.0      -48.8     -130.0    15.0
50%      0.0      -23.5       -7.5    30.0
75%      1.0      -12.2       52.5    45.0
max      1.0       26.0      522.0    72.0
2. Fit the choice model
fit = smf.glm(
    "car ~ cost_diff + time_diff + income",
    data=df,
    family=sm.families.Binomial(),
).fit()
print(fit.summary().tables[1])
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.7097      0.617     -4.392      0.000      -3.919      -1.501
cost_diff     -0.0419      0.014     -2.913      0.004      -0.070      -0.014
time_diff     -0.0117      0.003     -4.353      0.000      -0.017      -0.006
income         0.0379      0.015      2.505      0.012       0.008       0.067
==============================================================================
Both attribute coefficients are negative and significant: a car that costs more, or takes longer, relative to the train is chosen less often — exactly the sign theory demands. Cost and time are individually well identified; the question is what their ratio says, and how sure we are of it.
3. Average marginal effects
On the probability scale, each extra dollar of relative car cost and each extra minute of relative car time both lower P(choose car):
m = Margins.linear_scale(fit, vcov="HC3", at="overall")
print(m.dydx("cost_diff").summary())
print(m.dydx("time_diff").summary())
==============================================================
Margins Result (delta, level=0.95)
==============================================================
estimate std err z P>|z| [95% Conf. Int.]
--------------------------------------------------------------
cost_diff -0.0053 0.0020 -2.7081 0.007 -0.0092, -0.0015
==============================================================
n = 122
κ: 0.145
Delta-vs-sim disagreement: 11.463%
==============================================================
Margins Result (delta, level=0.95)
==============================================================
estimate std err z P>|z| [95% Conf. Int.]
--------------------------------------------------------------
time_diff -0.0014 0.0008 -1.7687 0.077 -0.0029, 0.0002
==============================================================
n = 122
κ: 0.277
Delta-vs-sim disagreement: 62.608%
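For intuition about what an average marginal effect on the probability scale measures, the sketch below uses the textbook logit formula: the derivative of P with respect to a regressor is its coefficient times p(1 - p), averaged over the sample. This assumes the standard definition and is not taken from pymargins' source; the helper `average_marginal_effect`, the coefficients, and the three-row sample are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def average_marginal_effect(rows, coefs, intercept, name):
    """Mean over rows of dP/dx_name = beta_name * p * (1 - p)."""
    effects = []
    for row in rows:
        index = intercept + sum(coefs[k] * row[k] for k in coefs)
        p = logistic(index)
        effects.append(coefs[name] * p * (1.0 - p))
    return sum(effects) / len(effects)

# Hypothetical coefficients and a tiny made-up sample (illustration only).
coefs = {"cost_diff": -0.04, "time_diff": -0.01}
rows = [
    {"cost_diff": -20.0, "time_diff": -150.0},
    {"cost_diff": -45.0, "time_diff": 30.0},
    {"cost_diff": 5.0, "time_diff": 60.0},
]

for name in coefs:
    ame = average_marginal_effect(rows, coefs, intercept=0.0, name=name)
    print(f"AME({name}) = {ame:+.5f}")
```

Since p(1 - p) never exceeds 1/4, an effect computed this way always has the coefficient's sign and at most a quarter of its magnitude, which is the pattern in the summaries above (for example, -0.0053 against a cost_diff coefficient of -0.0419).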
4. Willingness to pay: the value of travel time
wtp forms the ratio with joint inference. Because time_diff is a
nuisance attribute (more time lowers utility), the WTP for one more
minute is negative — travellers would need to be compensated to
accept it. The interpretable headline number is its negation: the
value of travel-time savings, in dollars per hour.
wtp_minute = m.wtp("time_diff", "cost_diff")
print(wtp_minute.summary())
vtts_per_hour = -float(wtp_minute.estimate) * 60
print(f"\nValue of travel time savings ≈ ${vtts_per_hour:.2f} per hour")
===================================================================
Margins Result (delta, level=0.95)
===================================================================
estimate std err z P>|z| [95% Conf. Int.]
-------------------------------------------------------------------
WTP(time_diff) -0.2622 0.2184 -1.2002 0.230 -0.6903, 0.1659
===================================================================
n = 122
κ: 0.277
Value of travel time savings ≈ $15.73 per hour
A value in the mid-teens of dollars per hour is squarely in the range transport economists report for this dataset — a sanity check that the difference specification recovers a sensible number, not just a significant coefficient.
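The headline figure is a linear rescaling of the per-minute WTP, so the standard error and interval can be carried along with it. The sketch below does that arithmetic on the numbers printed in the delta-method summary; the helper name `per_minute_wtp_to_hourly_vtts` is hypothetical and not part of pymargins.

```python
def per_minute_wtp_to_hourly_vtts(estimate, std_err, ci_low, ci_high):
    """Rescale a WTP per minute to a value of time savings per hour.

    Multiplying by -60 flips the sign, so the interval bounds swap.
    """
    scale = -60.0
    return {
        "estimate": scale * estimate,
        "std_err": abs(scale) * std_err,
        "ci_low": scale * ci_high,
        "ci_high": scale * ci_low,
    }

# Numbers copied from the delta-method summary printed above.
vtts = per_minute_wtp_to_hourly_vtts(-0.2622, 0.2184, -0.6903, 0.1659)
print({key: round(value, 2) for key, value in vtts.items()})
```

On the hourly scale the delta-method interval runs from roughly -$9.95 to $41.42 around the $15.73 point estimate, so the plausible-looking headline number comes with wide uncertainty.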
5. Why the ratio wants a simulation cross-check
Each coefficient is individually significant in the fitted model, but
their ratio is a nonlinear function of β, and the denominator (cost_diff)
is the noisier of the two (|z| ≈ 2.9 against 4.4 for time_diff). That
curvature is exactly what the κ diagnostic watches. Re-running the same
WTP under simulation shows how much asymmetry the symmetric delta-method
interval hides:
m_sim = Margins.linear_scale(
    fit, vcov="HC3", at="overall",
    method="simulation", n_sim=4000, rng_seed=0,
)
wtp_sim = m_sim.wtp("time_diff", "cost_diff")
def ci_str(res):
    lo, hi = (float(x) for x in res.conf_int())
    return f"[{lo:+.3f}, {hi:+.3f}] width {hi - lo:.3f}"
print(f"delta WTP/min = {float(wtp_minute.estimate):+.3f} 95% CI {ci_str(wtp_minute)}")
print(f"simulation WTP/min = {float(wtp_sim.estimate):+.3f} 95% CI {ci_str(wtp_sim)}")
print(f"\nκ on the ratio: {float(np.max(wtp_minute.kappa)):.3f}")
delta WTP/min = -0.262 95% CI [-0.690, +0.166] width 0.856
simulation WTP/min = -0.262 95% CI [-0.920, -0.008] width 0.912
κ on the ratio: 0.277
The simulation interval is wider and skewed — the right behaviour for a ratio whose denominator is uncertain. When a WTP, elasticity, or any other ratio is the deliverable, report the simulation interval (or let the κ guard trip the fallback automatically); the delta interval is a linearization of something visibly curved.
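The skew can be reproduced without any modelling library. The sketch below draws the two slopes from a bivariate normal, forms the negated ratio for each draw, and reads off a percentile interval. It is a rough illustration of the general technique, not of how pymargins runs its simulation; the function `simulate_ratio_ci`, the slope values, the standard errors, and the zero correlation are all assumptions.

```python
import math
import random

def simulate_ratio_ci(b_attr, b_price, se_attr, se_price, rho=0.0,
                      n_sim=4000, seed=0, level=0.95):
    """Percentile interval for -b_attr / b_price from correlated normal draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # 2x2 Cholesky construction: the two draws have correlation rho.
        a = b_attr + se_attr * z1
        c = b_price + se_price * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
        draws.append(-a / c)
    draws.sort()
    tail = (1.0 - level) / 2.0
    lo = draws[int(tail * n_sim)]
    hi = draws[int((1.0 - tail) * n_sim) - 1]
    return lo, hi

# Hypothetical slopes and standard errors (illustration only).
b_time, b_cost = -0.012, -0.042
est = -b_time / b_cost
lo, hi = simulate_ratio_ci(b_time, b_cost, se_attr=0.003, se_price=0.014)

print(f"estimate {est:+.3f}, interval [{lo:+.3f}, {hi:+.3f}]")
print(f"lower arm {est - lo:.3f}, upper arm {hi - est:.3f}")
```

Draws in which the price slope lands close to zero push the ratio far from the point estimate, which stretches one arm of the interval much more than the other. A symmetric delta-method interval cannot represent that shape.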
Note
wtp() builds the ratio from two dydx calls and composes them.
If subgroup κ values straddle the fallback threshold (so one slice
falls back to simulation while another stays on the delta method),
composition refuses to mix inference methods. Pin the method
explicitly — method="simulation" on the session — whenever you
compute WTP across subgroups.
Where to next
Multinomial logit: the multinomial logit tutorial, where wtp() is introduced and applied per alternative.
Elasticities and semi-elasticities: the other ratio-of-derivatives estimand, with the same joint-inference treatment.
The κ curvature diagnostic: what κ measures and when it forces the simulation fallback.