Tuesday, March 07, 2006

Math Slop: Autism and Mercury

A gem from yesterday, which I thought was worth reposting here and expanding on just a bit. Orac posted a nice dissection of the sloppy science of the Geiers, a couple of autism exploiters purporting to demonstrate a link between autism and vaccines containing thimerosal.

Aside from the more scientific issues which Orac handled quite nicely, the work published by the Geiers is a splendid example of really bad math - and the mathematical problems with what they did are even easier to understand than the scientific problems. (Not that it's hard to understand what's wrong with their science, but their math is even worse!)

To start, let me just repeat what I posted in Orac's comments:

I'm a statistics-trained person, and I can't even begin to describe how disgusted I am at the supposed analysis in this paper.

It doesn't take a lot of work to debunk this on mathematical/statistical grounds. While I think that Cubbins' video is OK, it misses the key point about what's wrong with the analysis. It isn't just that you can massage data to create the kinds of regression lines that you want. That's true, but for someone who isn't mathematically trained, watching that video doesn't provide enough information. His sample set _does_ have a key point where the data direction changes: the trend in the sample data _does_ reverse. Looking at that, on the basis of what he says in his video, you could argue that the two-line solution is a more accurate representation of the data than the single line. He doesn't make clear *why* splitting the data like that is wrong.

Here's the key, fundamental issue: when you're doing statistical analysis, you don't get to look at the data and choose a split point. What the Geiers did was to look at the data and find the best point for splitting the dataset to create the result they wanted. There is no justification for choosing that point except that it's the point that produces the result that they, a priori, decided they wanted to produce.
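To make the split-point problem concrete, here is a minimal Python sketch, assuming made-up data: a steady upward trend plus random noise, so by construction there is no real break anywhere. The helpers `fit_line` and `best_split` are hypothetical, written only for this illustration; they are not the Geiers' actual procedure, just the general "try every split and keep the best-looking one" approach described above.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss

def best_split(xs, ys, min_points=3):
    """Hypothetical data-driven search: try every split point, fit a separate
    line to each side, and keep the split with the lowest total residual error.
    Returns (split_x, total_rss, slope_before, slope_after)."""
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        _, b1, r1 = fit_line(xs[:k], ys[:k])
        _, b2, r2 = fit_line(xs[k:], ys[k:])
        if best is None or r1 + r2 < best[1]:
            best = (xs[k], r1 + r2, b1, b2)
    return best

if __name__ == "__main__":
    random.seed(0)
    years = list(range(1990, 2006))  # hypothetical years
    # A straight upward trend plus noise: there is no break in this data.
    values = [10 + 0.5 * (y - 1990) + random.gauss(0, 1) for y in years]
    _, slope_all, rss_one = fit_line(years, values)
    split_year, rss_two, s1, s2 = best_split(years, values)
    print(f"one line: slope {slope_all:+.2f}, RSS {rss_one:.2f}")
    print(f"best split at {split_year}: slopes {s1:+.2f} / {s2:+.2f}, RSS {rss_two:.2f}")
```

Because each half gets its own least-squares line, the best two-line fit can never have a larger residual error than the single line (the single line, restricted to each half, is always one of the candidates). So a search like this will always "find" an improvement, even in data generated with no break at all, and that improvement by itself tells you nothing.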

Time trend analysis is extremely tricky to do - but the most important thing in getting it right is doing it in a way that eliminates the ability of the analysis to be biased in the direction of a particular a priori conclusion. (In general, you do that not to screen out cheaters, but to ensure that whatever correlation you're demonstrating is real, not just an accidental correlation created by the human ability to notice patterns. It's very easy for a human being to see patterns, even where there aren't any.)

Redo the Geiers' analysis using any decent time-trend analysis technique - even a trivial one like doing multiple overlapping three-year regressions (i.e., plot the data from '92 to '95, '93 to '96, '94 to '97, etc.) - and you'll find that the nice clean break point in the data doesn't really exist. You'll get a series of trend lines with different slopes, without any clear break in slope or correlation.
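A sketch of the overlapping-window check, again in Python and again on hypothetical numbers: the `counts` series below is invented, not VAERS or any other real data, and the `rolling_slopes` helper is written only for this illustration. Windows of four yearly points mirror the '92 to '95, '93 to '96 example in the text.

```python
import random

def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

def rolling_slopes(xs, ys, width):
    """Hypothetical helper: fit a separate line to every run of `width`
    consecutive points, stepping forward one point at a time so the
    windows overlap. Returns (first_x, last_x, slope) per window."""
    out = []
    for i in range(len(xs) - width + 1):
        wx, wy = xs[i:i + width], ys[i:i + width]
        out.append((wx[0], wx[-1], slope(wx, wy)))
    return out

if __name__ == "__main__":
    random.seed(1)
    years = list(range(1992, 2003))  # hypothetical years
    counts = [20 + 1.5 * (y - 1992) + random.gauss(0, 3) for y in years]  # invented data
    for start, end, s in rolling_slopes(years, counts, width=4):
        print(f"'{start % 100:02d} to '{end % 100:02d}: slope {s:+.2f}")
```

Every window gets its own slope, so you look at how the slope drifts from window to window instead of picking one break point after seeing the data. This is only a crude smoothing check, not a substitute for a proper time-series method.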

So - to sum up the problem in one brief sentence: in statistical time analysis, you do not get to pick break points in the time sequence by looking at the data and choosing the break point that is most favorable to your desired conclusion.

In the comments, a thimerosal-autism advocate going by the name of "Fore Sam" proceeded to do more to demonstrate sloppy math. First, he jumped in to defend the Geiers' paper, claiming that the time-trend correlation that the Geiers so sloppily screwed up was valid. But then comes the real beauty: when it's pointed out that the data in question does not support his conclusion, suddenly he starts to criticize the data source for its statistical invalidity. Now - he's dead on target that the VAERS data, which was used by the Geiers, is a dreadful source. It's a voluntary reporting database with virtually no attempt to verify the truth of the incidents reported on it, much less the causal nature of the relationship between vaccines and the adverse events reported to be associated with them. But what makes this a great example of bad math is that the same guy defends the validity of research based on VAERS data when it supports his belief in the thimerosal-autism linkage, and in the same breath criticizes the very same data when it doesn't support his belief.

Data is either valid or invalid. You cannot argue that a single data set is valid when used to argue in favor of a conclusion, but invalid when used to argue against that same conclusion. Either the VAERS data is a valid data set to be used to examine correlation between thimerosal and autism, or it is not.

• Congrats on the new blog - I think I will have to visit often, BUT... I am getting old (50+), the eyes aren't as good as they used to be, and it absolutely drives me crazy to try and read white or blue colored fonts on a black background!

Best of luck regardless of color of course.

By  J-Dog, at 9:48 AM

• Excellent post! I always appreciate someone who can expound upon mathematics in non-numeric ways.

The problem with the Geiers' method is that they are picking and choosing, and then fitting the data to their theories, not fitting their theories to the data.

andrea

By  andrea, at 10:02 AM

• BTW, linked to this great article in my own post on "Cognitive Bias, Patterns and Pseudoscience": http://www.xanga.com/qw88nb88

By  andrea, at 12:26 PM

• Data is either valid or invalid.

Gah - Data ARE either valid or invalid.

By  Frumious B., at 2:16 PM

• Great, I have read several of your posts and enjoyed them all. Best of luck

By  Brent, at 10:57 PM