Good Math/Bad Math

Sunday, April 23, 2006

A Sunday Snack: Finishing Up the PEAR

Finishing up on other incomplete stuff: a while back, I started discussing the PEAR survey paper. As a quick reminder, PEAR is a group at Princeton University that is studying "the interaction of human consciousness with sensitive physical devices, systems, and processes common to contemporary engineering practice". There's not a huge amount to say about the rest of the paper, but there are a few good points worth taking the time to discuss.

Overall, the problem with PEAR's publications as a whole is that they've gathered a huge amount of data, which contains no statistically significant anomalies of any kind. But they keep trying to show how various properties of their data, despite being statistically insignificant, show anomalous features. This is a totally meaningless thing to do: any set of data, if you slice and dice it enough, will contain subsets that have apparently anomalous features; the whole point of statistical significance is that it's a way of measuring whether or not apparent trends in the data are strong enough to be considered as more than just random patterning or noise.
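To make that concrete, here's a minimal sketch in Python - a purely hypothetical simulation, nothing to do with PEAR's actual apparatus or data. Generate pure noise with no effect in it at all, then go hunting through subsets of it: the most extreme subset will always look impressive, as long as you quietly ignore how many subsets you searched through to find it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pure noise: 100,000 fair coin flips, with no "effect" in them whatsoever.
flips = rng.integers(0, 2, size=100_000)

# The overall result sits right at chance - nothing statistically significant.
print("overall fraction of heads:", flips.mean())

# Now slice the same data into 1,000 arbitrary subsets of 100 flips each,
# and report the most "anomalous" one.
subsets = flips.reshape(1000, 100)
best = subsets.mean(axis=1).max()
print("most extreme subset:", best)

# A naive z-test on that one cherry-picked subset, pretending we hadn't
# searched a thousand subsets to find it, typically gives z around 3 -
# which looks wildly "significant" if you forget about the selection.
z = (best - 0.5) / np.sqrt(0.25 / 100)
print("naive z-score for the cherry-picked subset:", z)
```

Correct for the fact that a thousand subsets were searched, and the "anomaly" evaporates - which is the entire point of significance testing.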

Section 3C is typical. Here's what they say in the introduction to the section:
Any structural details of the trial count distributions that compound to the observed anomalous mean shifts may hold useful implications for modeling such correlations. While no statistically significant departures of the variance, skew, kurtosis, or higher moments from the appropriate chance values appear in the overall data, regular patterns of certain finer scale features can be discerned.
This is an incredibly damning statement. Translated into simple English: there is nothing statistically significant in this data to indicate anything other than randomness. But by taking carefully selected subsets, we can discover interesting patterns.
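For a sense of what "no statistically significant departures of the variance, skew, kurtosis, or higher moments from the appropriate chance values" looks like, here's a quick sketch. (The trial structure here - each trial the sum of 200 random bits - is just an assumption for the sake of illustration; the point is only that the moments of pure noise land right on the theoretical chance values.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical trials: each one is the sum of 200 fair bits. Pure chance,
# no anomalous effect of any kind.
trials = rng.integers(0, 2, size=(50_000, 200)).sum(axis=1)

# "Chance values" for a Binomial(200, 0.5) distribution:
#   mean = 100, variance = 50, skew = 0, excess kurtosis = -0.01
print("mean:    ", trials.mean(), " (chance: 100)")
print("variance:", trials.var(ddof=1), " (chance: 50)")
print("skew:    ", stats.skew(trials), " (chance: 0)")
print("kurtosis:", stats.kurtosis(trials), " (chance: -0.01)")
```

When the moments of your data are statistically indistinguishable from the chance values, that's the whole story; the "regular patterns of certain finer scale features" are what the next example is about.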

Let me give an extreme example of what this means in practice. If I take a set of data containing random numbers, and I slice it so that it contains only multiples of two or three, then I'll find that my sliced data has an anomalously high proportion of even numbers - about three quarters, instead of the one half you'd see in the full data set. Is it a meaningful anomaly? Only in the sense that I selected the subset with the property that interested me - I created the anomaly by my data selection. The "anomaly" is a property of my selection process, not of the underlying data.
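Here's that toy example in code, just to show how mechanical the trick is:

```python
import numpy as np

rng = np.random.default_rng(2)

# Perfectly ordinary random integers: about half of them are even.
data = rng.integers(1, 1_000_000, size=100_000)
print("even fraction, full data:  ", np.mean(data % 2 == 0))   # ~0.50

# "Slice" the data: keep only the multiples of two or three.
selected = data[(data % 2 == 0) | (data % 3 == 0)]
print("even fraction, sliced data:", np.mean(selected % 2 == 0))  # ~0.75

# The "anomaly" - 75% even instead of 50% - is a property of the selection
# rule, not of the underlying data.
```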

Section 3D is more of exactly the same thing:
Given the correlation of operator intentions with the anomalous mean shifts, it is reasonable to search the data for operator-specific features that might establish some pattern of individual operator contributions to the overall results. Unfortunately, quantitative statistical assessment of these is complicated by the unavoidably wide disparity among the operator database sizes, and by the small signal-to-noise ratio of the raw data, leaving graphical and analytical representations of the distribution of individual operator effects only marginally enlightening.
This is interesting for a couple of reasons. First - this is the most direct admission thus far in the paper of how thoroughly invalid their data is. They're combining results from significantly different experimental protocols: some of the testers have done far more experiments with the apparatus than others; but the results are intermixed as if it were all one consistent data set.

Second, they're openly admitting that the data is invalid, and that any conclusions drawn from it are pretty much meaningless because of the problems in the data - but they're going to proceed to draw conclusions anyway, even though in their best assessment, the results can be only "marginally enlightening". (And even that is a ridiculous overstatement. "Meaningless" is the correct term.)

Section 4 starts with another thoroughly amusing statement:
Possible secondary correlations of effect sizes with a host of technical, psychological, and environmental factors, e.g. the type of random source; the distance of the operator from the machine; operator gender; two or more operators attempting the task together; feedback modes; the rate of bit generation; the number of bits sampled per trial; the number of trials comprising a run or series; the volitional/instructed protocol options; the degree of operator experience; and others have been explored to various extents within the course of these experiments, and in many other related studies not discussed here. Very briefly, qualitative inspection of these data, along with a comprehensive analysis of variance [40], indicates that most of these factors do not consistently alter the character or scale of the combined operator effects from those outlined above, although some may be important in certain individual operator performance patterns.
De-obfuscated, that translates to: "Pretty much every property of the experiment has been varied in all sorts of ways; the results of the experiment have been shown to be statistically insignificant; and when we tried to tease out single factors as influencing the results, we couldn't find anything statistically significant there either."

One last quote, and I'll be done with PEAR once and for all. PEAR has been frequently criticized by, among others, James Randi (aka the Amazing Randi) for the bogosity of their experiments. One of the recurring criticisms, quite appropriately, concerns the replicability of the experiments: that is, can they reproduce these kinds of supposedly anomalous results in a controlled trial set up and observed by someone outside of PEAR?

Here's their response:
From time to time, the experiments reported here have been assessed, both formally and informally, by a number of critical observers, who have generally agreed that the equipment, protocols, and data processing are sound [49]. Frequently, however, the caveat is added that such results must be “replicated” before they can be fully accepted, with the replication criteria variously defined to require strict preservation of all technical and procedural details, or to allow more flexible similarities in equipment and protocols. It is our opinion that for experiments of this sort, involving as they clearly do substantial psychological factors and therefore both individual and collective statistical behaviors, to require that any given operator, on any given day, should produce identical results, or that any given operator group should quantitatively replicate the results of any other, is clearly unreasonable. Rather more apt would be such criteria as might be applied to controlled experiments in human creativity, perception, learning, or athletic achievement, where broad statistical ranges of individual and collective performance must be anticipated, and results therefore interpreted in statistically generic terms.
"We can't reproduce our results, because that would be unfair."

Of course, people have done carefully controlled trials of perception, learning, and athletic achievement, with reproducible results. (I don't know of any in creativity, but that's because I don't know of any definition of "creativity" that's sufficiently quantifiable to be measured.) But hey, PEAR doesn't let that stand in their way. It's just unreasonable to ask them to be able to reproduce their results.

4 Comments:

  • Good for you! I've never been able to get through their gobbly-gook before. And like Sagan, many years ago I thought that maybe, just maybe, there might be something to brains affecting/effecting on a quantum scale. If I understand you correctly, what you're saying is that PEAR is doing the equivalent of flipping a coin thousands of times and then pointing to where heads came up 10 times in a row and saying, "See!! Proof!!" Please correct me if I'm wrong. I was a moron when it came to statistics.

    By Blogger L, at 1:57 PM  

  • blake, that's brilliant! Now how can we make money off this?

    By Blogger L, at 6:41 AM  

  • I wonder if Princeton might be lacking in statisticians?
    I was recently talking to a Princeton Student about an analysis he wanted to do and while I like to think I'm somewhat statistically literate, there's also the point where I start to get out of my depth. There were some issues with his analysis that were approaching that point and I suggested that he should talk to an actual statistician. His response was that Princeton didn't really have any.
    I don't know how accurate that is, but if it's true, it's a potential problem, especially for biology graduate students.

    By Blogger Darkling, at 3:54 PM  

  • "neuralgourmet said...
    blake, that's brilliant! Now how can we make money off this?"

    Lots of people are - have you never heard of the investment management industry? How do you think the mutual funds in your 401K got selected?

    By Anonymous Anonymous, at 9:56 PM  
