## Monday, November 23, 2009

### Sesame Street Logic

Back when I taught labs at KU, one of them had students use a computer program to determine the mass of Jupiter by observing the orbital periods of its four biggest moons (known together as the Galilean moons). It's not terribly hard to do. Just find the orbital period, apply Newton's revision to Kepler's 3rd Law, do some unit conversions, and the answer's right there.
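For the curious, the arithmetic boils down to Newton's form of Kepler's 3rd Law, M = 4π²a³ / (GT²), solved for the central mass. Here's a minimal sketch in Python; the orbital radius and period for Callisto are rough textbook values, not the lab's actual data:

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def central_mass(semi_major_axis_m, period_s):
    """Newton's form of Kepler's 3rd Law: M = 4*pi^2*a^3 / (G*T^2)."""
    return 4 * math.pi**2 * semi_major_axis_m**3 / (G * period_s**2)

# Callisto: a ~ 1.883e9 m, T ~ 16.69 days (approximate textbook values)
a = 1.883e9
T = 16.69 * 86400  # days -> seconds

mass_jupiter = central_mass(a, T)
print(f"{mass_jupiter:.2e} kg")  # ~2e27 kg, as expected
```

Any of the four moons should give you (roughly) the same answer, which is the whole point of the consistency check below.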

The only trick is to get the periods right. For the outer three (Europa, Ganymede, and Callisto), this is fairly easy since they have nice long periods, and if you sample the position every few hours, you'll easily see the sine curve it traces out. But the innermost moon, Io, has a period of less than 2 days. So if you're only sampling once every 6 hours or so, the data points will look like a mess and it will be nearly impossible to see the curve well enough to fit the function to it.
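To put numbers on it, here's a quick back-of-the-envelope count of how many observations per orbit a 6-hour cadence buys you for each moon (the periods are approximate textbook values):

```python
# Approximate orbital periods of the Galilean moons, in days (textbook values)
periods_days = {"Io": 1.77, "Europa": 3.55, "Ganymede": 7.15, "Callisto": 16.69}

cadence_hours = 6  # one observation every 6 hours

samples_per_cycle = {}
for moon, period in periods_days.items():
    samples_per_cycle[moon] = period * 24 / cadence_hours
    print(f"{moon:9s} ~{samples_per_cycle[moon]:.0f} samples per orbit")
```

Io gets only about 7 points per cycle versus Callisto's roughly 67, so once measurement noise is added, Io's curve is much harder to pick out by eye and fit.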

So without fail, every semester, I'd have a handful of students who would report their values of the mass of Jupiter from each of its four moons as 2 × 10^27 kg, 2 × 10^27 kg, 2 × 10^27 kg, and something like 3 × 10^30 kg.

So without fail, every semester, I'd hand their labs back with a big red X through that question. In going over the lab, I'd tell my students they should use what I call "Sesame Street Logic": One of these things is not like the other.

If you do three independent tests, and they all tell you the mass of Jupiter is one thing, and then one tells you something way off, chances are, that one has something seriously wrong with it. Unless the mass of Jupiter just decided to change for that set of measurements of course....

But introductory Astronomy labs aren't the only place this logic applies. All the time in science, if we have a bunch of measurements saying one thing, and then one really weird outlier, then we generally rework it, or, if we can't (or there's enough data otherwise) we toss it.
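That "one of these things is not like the other" check is easy to automate. Here's a toy illustration (the order-of-magnitude threshold is an arbitrary choice for this example, not a standard rejection criterion): flag any measurement that differs from the median of the set by more than a chosen factor.

```python
from statistics import median

def flag_outliers(values, factor=10.0):
    """Flag values differing from the sample median by more than `factor`x.

    A crude screen for the kind of gross outlier described above; a real
    analysis would use a proper robust statistic and justify any rejection.
    """
    m = median(values)
    return [v / m > factor or m / v > factor for v in values]

# The four "masses of Jupiter" from the lab report, in kg
masses = [2e27, 2e27, 2e27, 3e30]
print(flag_outliers(masses))  # → [False, False, False, True]
```

The fourth value sits three orders of magnitude above the other three, so it gets flagged while the consistent measurements pass.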

Of course, there's some cases where we're looking for outliers like that, such as scanning for novae and the like. But in general, there's a good statistical argument to be made against keeping such bad data.

Many people don't realize that just because you have "data", it doesn't mean it's good data. Indeed, in my major, there was an entire class devoted to data acquisition and analysis and how to judge the quality of what you've collected. Good stuff.

Outsiders to the scientific process don't get this. And it seems that a big part of this "scandal" regarding the hacked emails from climatologists is just this.

The author's chief complaint comes from a quote that's not even out of context. He just doesn't get the comment. Here it is with the author's emphasis:
> The data are attached to this e-mail. They go from 1402 to 1995, although we usually stop the series in 1960 because of the recent non-temperature signal that is superimposed on the tree-ring data that we use.
I'd accuse him of quote mining, but the proper context is there. He just ignored the second half of the sentence!

The really important part is the "recent non-temperature signal that is superimposed on the tree-ring data" bit.

In other words, the researcher is admitting there's false signal there and giving reason to reject the data following 1960. At least, from that particular line.

This is why I can't get behind the climate change denialists. I'm not as panicky about the whole thing as some are, but it's things like this that show me that they don't really get the fundamental methods of science and that they cherry-pick their data. Sounds just like all the other pseudo-scientists....