It's often hard to come up with examples to teach in class that aren't completely canned, so I'm going to keep a series running of things I've come across that I want to remember and share.
Today's topic: Selection Bias.
I live in St. Louis, which has a decent mass transit system. It's composed of both buses and a railway. The rail system is fairly new and doesn't have great coverage, but it has slowly expanded over the past several years and now reaches a decent area of the suburbs. From there, though, you'll have to use the buses.
For a long time, I haven't felt the bus system was too great. Not because the buses don't run on time or are in disrepair, but because I constantly see them "Off Duty". It seems much like the stereotype of road workers: They spend more time sitting around than actually doing anything.
There's a reason for this that I'll get to in a minute, but first, let's toss in the statistics. Over the past few weeks, while driving, I've kept track of which buses were on duty and which were off. Out of ~30 buses, only about 5 were on duty. That's only 17% on duty. So what's up with the other 83%? Does St. Louis really waste 83% of its buses just driving around off duty?
Obviously not. Something is affecting my sample and skewing my numbers pretty significantly: namely, where I drive. It just so happens that a good deal of my driving is on Brentwood Blvd and the surrounding areas. This is also the street where the station for all the buses is located. The result is that buses frequent this area both when they head out to start their routes and when they come off duty and return at the end of their shifts. Buses from all of the routes pass through at those times, but only one route is actually on duty there, because it runs along that street.
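Thus, my numbers come out wrong because I collected them without understanding the situation: I was sampling from a spot where off-duty buses are heavily overrepresented, not from the bus system as a whole.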
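To make the effect concrete, here's a toy sketch in Python. Every number in it is made up for illustration (30 routes, 12 on-duty passes per shift, 2 off-duty passes by the station per bus); none of it is real St. Louis transit data, and the names are hypothetical. It just counts the sightings an all-seeing observer could make during one shift, then compares that with the subset someone parked on the station street would see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    location: str  # "station_street" or "elsewhere"
    on_duty: bool


def build_sightings(n_routes=30, passes_per_shift=12):
    """All sightings in one shift (invented numbers, not real transit data)."""
    sightings = []
    for route in range(n_routes):
        # Every bus passes the station street twice while off duty:
        # once heading out to its route, once coming back.
        sightings += [Sighting("station_street", False)] * 2
        # On-duty passes happen on the bus's own route. Assumption:
        # only route 0 actually runs along the station street.
        where = "station_street" if route == 0 else "elsewhere"
        sightings += [Sighting(where, True)] * passes_per_shift
    return sightings


def on_duty_share(sightings):
    """Fraction of sightings in which the bus was on duty."""
    return sum(s.on_duty for s in sightings) / len(sightings)


all_sightings = build_sightings()
# My sample: only what can be seen from the station street.
my_sample = [s for s in all_sightings if s.location == "station_street"]

print(f"whole city:     {on_duty_share(all_sightings):.0%} on duty")
print(f"station street: {on_duty_share(my_sample):.0%} on duty")
```

With these invented inputs, the whole-city figure comes out at 86% on duty, while the station-street sample comes out at 17%. Nothing about the buses changed between the two lines; only where the observer was standing did.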