Good Math/Bad Math

Tuesday, April 18, 2006

The Bad Math of Paranormal Research: PEAR

While reading some stuff on Orac's site, I discovered a website run by a Princeton professor for his lab, the "Princeton Engineering Anomalies Research Center", aka PEAR. They're part of the department of Engineering and Applied Science at Princeton. As a Rutgers alumnus, I've always been loyally snooty about Princeton engineering; but I never would have dreamed that garbage like this would fly there.

What PEAR studies is how human beings can allegedly influence random number generators. Their experiments use a fairly elaborate rig that generates a random sequence; the subject has no physical access to the generator, but tries to push the "direction" of the generated numbers either up or down. I'm looking at one of their papers: a review of 12 years of experiments with this apparatus.

It's an interesting example of bad math, because it's quite subtle. There are, of course, warning signs from the very beginning that the authors are crackpots. One common sign that authors know they're peddling nonsense is opening the paper with a long list of famous people (usually dead) who allegedly support their ideas. This one starts with a list of eighteen such people, ranging from Francis Bacon to Einstein. They even include an alleged conversation between Einstein and an unnamed "important theoretical physicist" about how paranormal powers were a legitimate area of study in physics.

After a detailed description of their experimental apparatus and testing protocol, they finally get to the alleged math.

There are a couple of tricks that they pull right away as they get into the math. They have 91 different test subjects, 522 test series, and close to 2.5 million "trials". This is meant to give the appearance of a fair, balanced study; but later in the paper, it becomes clear that different operators ran different numbers of tests in somewhat different settings. They do not indicate how the trials were selected, how they were distributed among the subjects, or whether these are the only subjects who ever did the experiment. This is absolutely unacceptable: in statistics, ensuring fair samples is critical.
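
To see why selection matters so much, here is a minimal Python sketch (not anything from the paper). It assumes each of the 522 series can be summarized by a z-score that is pure noise, combines them with Stouffer's method, and then applies a hypothetical selection rule that quietly drops the worst-looking 10% of series. The seed and the 10% cutoff are arbitrary choices for illustration.

```python
import math
import random

def stouffer(zs):
    """Combine independent z-scores: their sum divided by sqrt(count)."""
    return sum(zs) / math.sqrt(len(zs))

rng = random.Random(2006)  # fixed seed so the run is reproducible

# Null hypothesis: 522 series of pure noise, each summarized by a standard normal z.
series = [rng.gauss(0.0, 1.0) for _ in range(522)]
z_all = stouffer(series)

# Hypothetical selection rule: drop the worst-looking 10% of series (52 of them).
keep = sorted(series)[len(series) // 10:]
z_trimmed = stouffer(keep)

print(f"all {len(series)} series:  combined z = {z_all:.2f}")
print(f"best {len(keep)} series: combined z = {z_trimmed:.2f}")
```

Nothing in the simulated data is anomalous, yet discarding about 52 unfavorable series out of 522 raises the combined z by roughly 4.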

Next comes a table summarizing their results. I had to flip back and forth between the page with the table and the previous page at least half a dozen times, making notes, before I could make any sense of it. While I can't prove it, I strongly suspect that this obfuscatory presentation is deliberate. It's a classic bad math maneuver: dazzle the readers with lots of complicated-looking numbers, equations, and tables, to give them the idea that there's a depth to your analysis that isn't really there. This really is a dreadful way of presenting the information: the data in this table isn't particularly complex (it's basically means and standard deviations for a fairly straightforward experimental protocol), but it manages to be astonishingly confusing.

On the next page, we get a wonderful statement:
The anomalous correlations also manifest in the fraction of experimental series in which the terminal results confirm the intended directions. For example, 57% of the series display HI-LO score separations in the intended direction (z_s = 3.15, p_s = 8 × 10^-4). In contrast, the anomaly is not statistically evident in the 52% of individual operators producing databases in the intended directions (z_0 = 0.31, p_0 = 0.38), a feature having possible structural implications, as discussed below.
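Yes: 57% of the series go the intended way, but only 52% of the operators do, which is indistinguishable from chance. The series-level result is being carried by a subset of operators who ran a lot of series: this is an admission that the sample is biased!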
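
The quoted z and p values are consistent with a simple sign test: count the series (or operators) whose results went in the intended direction, compare that count to the 50% expected by chance, and report a one-tailed normal p-value. Here is a minimal Python sketch of that reading; the counts 297 of 522 series and 47 of 91 operators are my back-calculation from the rounded 57% and 52%, not figures taken from the paper.

```python
import math

def sign_test_z(hits, n):
    """Normal approximation to a fair-coin sign test: (hits - n/2) / sqrt(n/4)."""
    return (hits - n / 2) / math.sqrt(n / 4)

def one_tailed_p(z):
    """Upper-tail area of the standard normal distribution beyond z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumed counts, back-calculated from the rounded percentages.
z_s = sign_test_z(297, 522)  # series in the intended direction (~57%)
z_o = sign_test_z(47, 91)    # operators in the intended direction (~52%)

print(f"series:    z = {z_s:.2f}, one-tailed p = {one_tailed_p(z_s):.1e}")
print(f"operators: z = {z_o:.2f}, one-tailed p = {one_tailed_p(z_o):.2f}")
```

Under these assumptions it reproduces z ≈ 3.15 (p ≈ 8 × 10^-4) for the series and z ≈ 0.31 (p ≈ 0.38) for the operators. Counting series weights each operator by how many series they ran; counting operators weights everyone equally, and that version shows nothing.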

And better still, "the anomaly is not statistically evident" is just another way of saying that the result is statistically insignificant. Yes, the fact that their results are not statistically significant is a feature "having possible structural implications". Pardon me, I need to go roll on the floor and giggle for a few minutes.

.
.
.


Ok. I'm better now. One last thing, and then I'm going to take a break from this monstrosity. Here's a copy of their graph of the "cumulative deviation over time" of their trials:
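
[Figure: PEAR's graph of cumulative deviation over time for the HI, LO, and baseline trials, with a parabolic significance envelope (image not preserved)]
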
Just take a look at that graph: each line is a running cumulative deviation from the chance expectation, and the parabolic curve marks the boundary beyond which that deviation would count as statistically significant (roughly p = 0.05). "HI" accumulates some deviation early, and then stays roughly parallel to the curve; the baseline wanders all over the place, ending up right on the upper branch of the curve; "LO" meanders around the curve, approximating it pretty well.
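
For a sense of how pure chance behaves on a plot like this, here is a minimal Python simulation. It assumes each trial is the sum of 200 random bits (expected sum 100, variance 50) and draws a two-sided 95% envelope at ±1.96 standard deviations of the cumulative sum; the paper's exact trial size and envelope may differ, and the run count, trial count, and seed are arbitrary.

```python
import math
import random

BITS_PER_TRIAL = 200            # assumption: one trial = sum of 200 random bits
TRIAL_VAR = BITS_PER_TRIAL / 4  # variance of a sum of fair bits (= 50)
N_TRIALS = 500                  # trials per simulated run (arbitrary)
N_WALKS = 1000                  # number of simulated runs (arbitrary)

rng = random.Random(42)

# Two-sided 95% bound on the cumulative deviation after n trials, n = 1..N_TRIALS.
bounds = [1.96 * math.sqrt(TRIAL_VAR * n) for n in range(1, N_TRIALS + 1)]

def cumulative_deviation():
    """Running total of (trial sum - expected sum) for chance-only trials."""
    total, path = 0, []
    for _ in range(N_TRIALS):
        hits = bin(rng.getrandbits(BITS_PER_TRIAL)).count("1")
        total += hits - BITS_PER_TRIAL // 2
        path.append(total)
    return path

ever_outside = final_outside = 0
for _ in range(N_WALKS):
    path = cumulative_deviation()
    if any(abs(d) > b for d, b in zip(path, bounds)):
        ever_outside += 1
    if abs(path[-1]) > bounds[-1]:
        final_outside += 1

print(f"runs ending outside the envelope: {final_outside / N_WALKS:.1%}")
print(f"runs outside it at some point:    {ever_outside / N_WALKS:.1%}")
```

About 5% of the chance-only runs end outside the envelope, as they should, but a noticeably larger share poke outside it somewhere along the way. An early excursion like the one in "HI" is not, by itself, evidence of anything.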

What does that tell us?

Absolutely nothing of value. The baseline, which is supposed to be pure chance, deviates about as much as "LO" does; in other words, their deviations aren't significant. In fact, if "HI" didn't have an early bump, which doesn't seem to persist very well, the curves would have stayed pretty much within the 95% confidence interval. (As a side note, if you take a look at their website, they have a whole bunch of different versions of that graph. Amazing how you can get different results by using different samples of your data, isn't it?)
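
The side note about different versions of the graph is the multiple-comparisons problem, the same one raised in the comments about height and arrival time. A minimal Python sketch: split chance-only trials into 1,000 separate subsets (hypothetical "versions" of the data, each assumed to hold 100 trials of 200 bits) and count how many clear the p < 0.05 bar on their own.

```python
import math
import random

BITS_PER_TRIAL = 200     # assumption: one trial = sum of 200 random bits
TRIALS_PER_SUBSET = 100  # hypothetical size of each "version" of the data
N_SUBSETS = 1000         # number of disjoint subsets examined

rng = random.Random(7)

def subset_z():
    """z-score of the total deviation over one subset of chance-only trials."""
    dev = sum(bin(rng.getrandbits(BITS_PER_TRIAL)).count("1") - BITS_PER_TRIAL // 2
              for _ in range(TRIALS_PER_SUBSET))
    return dev / math.sqrt(TRIALS_PER_SUBSET * BITS_PER_TRIAL / 4)

zs = [subset_z() for _ in range(N_SUBSETS)]
significant = [z for z in zs if abs(z) > 1.96]  # two-sided p < 0.05
print(f"{len(significant)} of {N_SUBSETS} chance-only subsets reach p < 0.05")
```

Roughly 50 of the 1,000 subsets, about 5%, come out "significant" despite containing no effect at all; publish enough versions of a graph and some of them will look impressive.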

What they take from this graph is that the "anomalous correlations" (you know, the anomalous correlations that they admitted weren't statistically significant?) amount to a shift of 10^-4 per bit: 1 bit in 10,000 deviating from the expected norm.
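
To put 10^-4 per bit in perspective, here is a back-of-the-envelope calculation in Python. It assumes the effect means each bit lands in the intended direction with probability 0.5 + 10^-4 instead of 0.5; under the normal approximation, N bits then give an expected z-score of 2 × 10^-4 × √N.

```python
import math

SHIFT = 1e-4  # assumption: hit probability is 0.5 + SHIFT instead of 0.5

def expected_z(shift, n_bits):
    """Expected z after n_bits: excess hits (n*shift) over the sd sqrt(n)/2."""
    return 2 * shift * math.sqrt(n_bits)

def bits_needed(shift, z):
    """Invert expected_z: number of bits for the expected z to reach z."""
    return (z / (2 * shift)) ** 2

print(f"{bits_needed(SHIFT, 1.96):,.0f} bits for z = 1.96")
print(f"{bits_needed(SHIFT, 3.0):,.0f} bits for z = 3")
print(f"expected z after 10 million bits: {expected_z(SHIFT, 10_000_000):.2f}")
```

That works out to roughly 96 million bits for z = 1.96 and 225 million for z = 3. At that scale, a bias of 1 in 10,000 from any source, including how the data was selected, would look exactly the same as a "real" effect.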

I'll have more to say about this crud later - because it's a fine example of how to use bad math to obscure bad work - but this is enough for now; I can only plow through so much slop in one session.

7 Comments:

  • If you look at Table 1, it shows that the average difference in mean across the HI, LO and BL groups is about 0.008 (0.026, -0.016 and 0.013 respectively, weighted by the number of trials). The mean of the calibration is not statistically different from 0 (mean difference of -0.002 with a std dev of 0.002). The mean of all of the experiments, HI, LO and BL combined, is not statistically different from 0 either, with a mean difference of 0.008 and a standard deviation of 0.006, which comes to 1.3 standard deviations.

    Now what are the chances that the researchers assigned trials to HI, LO or BL post hoc? They may have simply carved up a distribution of trials that conforms to what would be expected from chance in a way that supports their hypothesis.

    By Anonymous steve, at 1:43 PM  

  • Years ago, Bill Watterson wrote in Calvin and Hobbes that the purpose of bad writing is "to inflate weak ideas, obscure poor reasoning, and inhibit clarity. With a little practice, writing can be an intimidating and impenetrable fog!" The same goes for mathematics, without a doubt.

    By Blogger Blake Stacey, at 4:36 PM  

  • Bad joke I pulled, after CFLarsen on the JREF forums mentioned that they aren't letting anyone see the "Egg" random number generators' raw data, only the data after some unknown process:

    "So, the chickens won't let us see their eggs until after they've been cooked?"

    By Blogger Bronze Dog, at 3:54 PM  

  • This reminds me very much of an example from an article[1] about the hazards of analyzing very large data sets. The thought experiment in one section was to investigate the correlation between height and arrival time of employees at 10,000 companies. Lo and behold, you will find about 500 companies where there is a statistically significant correlation (at the 95% confidence level), but of course that does not mean there is any real correlation. It just means your statistical tools are properly calibrated.

    [1] Roehle, B. "Channeling the data flood", IEEE Spectrum, pp. 32-38, March 2006.

    By Anonymous Michael Poole, at 2:48 PM  

  • I have lots to say about Mark's comments, but I'll start with 3:

    1) I've met Bob Jahn several times over 25 years. He's dean emeritus of the Princeton Engineering School. I'm not one to put a lot of stock in titles, but that might give you pause before dismissing him as a crackpot. In fact, I found him to be sincere, understated, and careful.

    2) I don't think that the existence of parapsychological effects is ruled out by physical theory. I think it's a legitimate experimental question whether they are real. I welcome people like Jahn who put serious effort into experiments that aim to decide this question. There are plenty of people sitting on the sidelines and scoffing, or laughing up their sleeves.

    3) Mark makes much of the fact that Jahn's sample of human subjects is not random. Agreed - it's not a random sample. That doesn't make the result any less striking or amazing to me. The null hypothesis is that there should be no correlation between what people wish for and the level of quantum noise in a transistor. The fact that Jahn's sample of humans wasn't random doesn't detract from the jaw-dropping observation that he did find some correlation, and that it persisted over many years and millions of trials.

    -Josh Mitteldorf

    By Blogger Pantheist, at 11:19 PM  

  • Josh:

    I don't rule out the concept of parapsychological effects; I just expect that if you want to do a study that purports to explore whether or not they exist and are measurable, that you won't use deceptive math to try to bias the results to come out the way you want them to.

    I don't care who Jahn is or what he's done in the past. Brilliant people aren't exempt from self-deception or outright idiocy. William Shockley was one of the inventors of the transistor, and then dedicated his life to proving the racial superiority of European whites; Linus Pauling won a Nobel Prize, and then spent the rest of his life pushing crackpot claims about megadoses of vitamin C.

    If the work Jahn is doing *now* is good, then it will stand up to critical analysis. It doesn't.

    The non-random sample is the fundamental explanation of what's wrong with his work. If you've got a couple of million pieces of data, and you can cherry-pick samples, you can produce a set of data that shows anything you want. Look at Steve's comment at the top of this comment thread. The experimental data does not support anything anomalous; it's just that the samples are selected in a way that produces the desired result.

    By Blogger MarkCC, at 1:43 PM  

  • If it doesn't stand up to critical analysis, why delete the points I made about the problems in your analysis?

    By Anonymous Anonymous, at 11:48 AM  
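
The arithmetic in the first comment is easy to check. This minimal Python sketch takes the Table 1 figures as quoted in that comment (not re-read from the paper), assumes the quoted standard deviations can be used directly as standard errors, and converts each mean into a z-score with a two-sided normal p-value.

```python
import math

def z_and_two_sided_p(mean, sd):
    """Treat mean/sd as a z-score; return it with its two-sided normal p-value."""
    z = mean / sd
    return z, math.erfc(abs(z) / math.sqrt(2))

# Figures as quoted in the first comment.
for label, mean, sd in [("calibration", -0.002, 0.002),
                        ("HI+LO+BL", 0.008, 0.006)]:
    z, p = z_and_two_sided_p(mean, sd)
    print(f"{label:12s} z = {z:+.2f}  two-sided p = {p:.2f}")
```

The calibration comes out at z = -1.00 (p ≈ 0.32) and the combined HI, LO, and BL data at z ≈ 1.33 (p ≈ 0.18); neither is anywhere near significant.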
