For the past three months, I've been riding the subway. A lot. Every line, multiple times a week, at different times of day, recording arrival predictions and actual arrivals. My partner thinks I've lost it. She might be right.
But here's the thing: the MTA's arrival predictions are consistently wrong in interesting ways.
The Problem with Point Estimates
When the subway app tells you "Train arriving in 4 minutes," that's a point estimate. A single number. It implies a precision that doesn't exist.
In reality, that "4 minutes" could mean:
- 2 minutes (train was closer than system thought)
- 4 minutes (rare, but happens)
- 7 minutes (train held at previous station)
- 12 minutes (train skipped your stop, so you're really waiting for the next one)
- Never (train taken out of service)
The MTA API doesn't tell you which scenario you're in. It just gives you that confident "4 minutes" and leaves you to make decisions based on false precision.
Embracing Uncertainty
For Uptown, we're taking a different approach. Instead of pretending we know exactly when the train will arrive, we give you a range based on what actually happens.
Our model tracks:
- Time of day (evening rush has higher variance)
- Day of week (Friday nights are chaos)
- Route and direction (some lines are more reliable)
- Historical delays for this station
- Current service alerts
- Weather (yes, it matters)
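Then we give you a range with a confidence level attached. "Arriving in 3-7 minutes (80% confidence)" is way more useful than "Arriving in 4 minutes" when the actual uncertainty is that high. (Statisticians would call this a prediction interval, but "confidence interval" is the phrase people recognize, so I'm sticking with it.)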
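To make the idea concrete, here's a minimal sketch of how a point estimate can become a range. This is not Uptown's actual model: it ignores weather, alerts, and everything else on the list above, and just assumes you already have a pile of past errors (actual minus predicted, in minutes) for a similar situation. The function name `arrival_interval` and the sample numbers are made up for illustration.

```python
from statistics import quantiles

def arrival_interval(predicted_min, past_errors_min, confidence=0.80):
    """Turn a point estimate into a (low, high) range in minutes.

    past_errors_min holds (actual - predicted) for comparable past trains.
    """
    if len(past_errors_min) < 2:
        raise ValueError("need at least two past observations")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")

    # 99 cut points; cuts[k - 1] is the k-th percentile of the errors.
    cuts = quantiles(past_errors_min, n=100, method="inclusive")

    # For 80% confidence, trim 10% off each tail: 10th and 90th percentiles.
    tail_pct = round((1 - confidence) / 2 * 100)
    low_err = cuts[max(tail_pct, 1) - 1]
    high_err = cuts[min(100 - tail_pct, 99) - 1]

    # A train can't arrive in negative minutes, so clamp the low end at zero.
    return max(0.0, predicted_min + low_err), predicted_min + high_err


# Made-up errors for illustration only, not real measurements.
sample_errors = [-2, -1, -1, 0, 0, 0, 1, 1, 2, 3, 3, 5]
low, high = arrival_interval(4, sample_errors)
print(f"Arriving in {low:.0f}-{high:.0f} minutes (80% confidence)")
```

Using plain percentiles of past errors keeps the sketch free of any assumption about the shape of the delay distribution, which matters because delays are lopsided: trains are late by a lot more often than they are early by a lot.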
The Data Collection
I've been manually recording predictions vs. actual arrivals because the MTA's historical data has gaps. When a train is taken out of service or skips a stop, that often doesn't make it into the official record.
So yeah, I'm that person on the platform with a spreadsheet open, timing trains. I've collected over 12,000 data points so far. The patterns are fascinating:
- The L train is shockingly reliable (tight confidence intervals)
- The F train is... less so (wide confidence intervals, especially in Brooklyn)
- Overnight service is pure chaos (we might as well just show "eventually")
- A "delayed" label can mean anywhere from 2 to 30 extra minutes, depending on the line
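For the curious, "tight" and "wide" here can be boiled down to one number per line: how many minutes separate the 10th and 90th percentile of prediction errors. A rough sketch of that comparison follows. The tuple layout and the function name `interval_width_by_line` are assumptions for illustration, not the format of my actual spreadsheet, and trains that never showed up are simply skipped here even though a real model would need to account for them.

```python
from collections import defaultdict
from statistics import quantiles

def interval_width_by_line(observations, confidence=0.80):
    """Width in minutes of the central error range, per subway line.

    observations: iterable of (line, predicted_min, actual_min) tuples,
    where actual_min is None if the train never arrived.
    """
    errors = defaultdict(list)
    for line, predicted, actual in observations:
        if actual is None:
            continue  # no-shows need separate handling; skipped in this sketch
        errors[line].append(actual - predicted)

    # For 80% confidence this is 10, i.e. the 10th and 90th percentiles.
    tail_pct = round((1 - confidence) / 2 * 100)
    widths = {}
    for line, errs in errors.items():
        if len(errs) < 2:
            continue  # not enough data to estimate a spread
        cuts = quantiles(errs, n=100, method="inclusive")
        low = cuts[max(tail_pct, 1) - 1]
        high = cuts[min(100 - tail_pct, 99) - 1]
        widths[line] = high - low
    return widths
```

A smaller width means a tighter range and a more trustworthy countdown clock. Sorting lines by this number is one simple way to rank them by reliability.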
Why This Matters
When you're deciding whether to run for a train or grab coffee, you need honest information. "Arriving in 2-8 minutes" tells you to chill. "Arriving in 4-5 minutes" tells you to hustle.
Giving people an honest picture of the uncertainty is more respectful than giving them false confidence. It lets them make better decisions.
Plus, watching those confidence intervals get tighter as you get more data? Extremely satisfying.
Next Steps
We're now building the real-time component that adjusts predictions based on live train positions. The goal is to have Uptown ready for beta testing this summer.
If you want to help collect data (or you just enjoy riding the subway for science), email me: riley@uglycomputer.net
— Riley