ANOTHER LOOK AT HOW IP AND WONG WENT WRONG
By Mark Blaxill
DeSoto and Hitlan’s paper (see accompanying article above) does a thorough job of analyzing the corrected data, but out of professional courtesy, they are somewhat circumspect in detailing the full scope of the errors committed by the Hong Kong University team. Since the journal editor, Roger Brumback, was gracious enough to publish the raw data he obtained from Wong (assuming, of course, that she got it right this time around), an interested observer can check all of Ip et al’s revised calculations for themselves and compare them line by line with the original.
In the simple table below, I’ve gone through each of the mercury level findings from Ip et al line by line. I put the findings of the original paper (P Ip, V Wong et al. “Mercury exposure in children with autistic spectrum disorder: case-control study.” Journal of Child Neurology. November, 2004) in column A. In column B, I put both the formal corrections from Wong and the fresh calculations based on the raw data published last November (P Ip, V Wong et al, Erratum in: Journal of Child Neurology. November, 2007). To get a clearer sense of the nature of the errors and any pattern in them, I calculated the difference between ALL of the findings in columns A and B and put those differences in column C.
Mercury levels | A: Original paper, mean (standard deviation) | B: Revised data, mean (standard deviation) | C: Difference, mean (standard deviation)
Blood samples | | |
Autistic blood (nmol/L) | 19.53 (5.65) | 19.53 (15.65) | -- (10.00) [1]
Control blood (nmol/L) | 17.68 (2.48) | 14.68 (12.48) | 3.00 [1] (10.00) [1]
Two-tailed P value (reported) | 0.15 | 0.056 | 63% off [2]
Two-tailed P value (correct calculation) | 0.025 | NA | 83% off [3]
One-tailed P value (calculated) | 0.013 | 0.028 |
Hair samples | | |
Autistic hair (ppm) | 2.26 (0.21) | 1.98 (1.05) | 0.28 [2] (0.84) [2]
Control hair (ppm) | 2.07 (0.58) | 1.92 (1.58) | 0.15 [2] (1.00) [1]
Two-tailed P value (reported) | 0.793 | 0.79 | --
Two-tailed P value (correct calculation) | 0.0083 | NA | 99% off [3]

Possible error types:
[1] Typographical error (results of analysis “typed wrongly”)?
[2] Data entry error resulting in calculation error?
[3] Calculation error?
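For readers who want to check column C themselves, here is a minimal Python sketch of the arithmetic. The means and standard deviations are copied from columns A and B of the table. The `percent_off` helper is an assumption on my part: it is one reading of the "% off" entries (the relative gap from the originally reported P value) that happens to reproduce the table's figures, not a formula stated in the text.

```python
# (original, revised) pairs copied from columns A and B of the table.
TABLE = {
    "autistic blood SD": (5.65, 15.65),
    "control blood mean": (17.68, 14.68),
    "control blood SD": (2.48, 12.48),
    "autistic hair mean": (2.26, 1.98),
    "autistic hair SD": (0.21, 1.05),
    "control hair mean": (2.07, 1.92),
    "control hair SD": (0.58, 1.58),
}


def difference(original, revised):
    """Absolute change between the two versions, rounded to 2 decimals."""
    return round(abs(revised - original), 2)


def percent_off(reported, comparison):
    """ASSUMED reading of '% off': relative gap from the reported value."""
    return round(100 * abs(reported - comparison) / reported)


column_c = {name: difference(a, b) for name, (a, b) in TABLE.items()}
for name, diff in column_c.items():
    print(f"{name}: {diff:.2f}")

# Reported P value compared with the revised or recalculated P value.
print(percent_off(0.15, 0.056))     # blood: reported 2004 vs reported 2007
print(percent_off(0.15, 0.025))     # blood: reported 2004 vs recalculated
print(percent_off(0.793, 0.0083))   # hair: reported 2004 vs recalculated
```

The differences that come out to a round 10.00 or 1.00 are the ones consistent with a dropped leading digit; the others are not.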
When you lay out the changes in this way, the results are intriguing. Virginia Wong’s explanation to Brumback for the error was as follows: “The raw data and the statistical analysis had been correct. Unfortunately, we now found that quite a few results in the table were typed wrongly.” If you scan down column C, you can find some evidence for typographical errors, but the patterns revealed there suggest that the errors were not nearly as simple as Wong claimed. Indeed, there appears to be a virtual orgy of errors, including but not limited to typographical errors.
For example:
• The mercury blood levels seem to be consistent with a typographical error. The tens digit seems to be omitted in the original 2004 report of both the autistic cases and the control groups’ standard deviation, making the original standard deviation for the autistics of 5.65 actually 15.65 when corrected and the standard deviation results for controls of 2.48 actually 12.48 after correction. The mean of the mercury levels in blood for the autistic sample doesn’t change, but the mean of the control sample appears to be another typographical error, with the corrected mean of 14.68 apparently mistyped in the original as 17.68.
• The original calculation of statistical significance (the so-called P value) of blood levels of mercury that caught DeSoto’s attention remains in error, but the corrected value is different from DeSoto’s original correction. This occurs because DeSoto made her first observation based on Ip, Wong et al’s original (and supposedly typographical) errors. The new standard deviations, corrected by Ip and Wong, are far larger than the original, so while the difference between the autistic and control means is larger after correction, the combined corrections of means and standard deviations require an entirely different P value calculation. If one uses the original numbers, DeSoto and Hitlan correctly pointed out that the difference between autistic and control blood mercury levels was significant (anything with a P value below 0.05 is significant). But using Wong’s corrected numbers, the difference just barely misses the significance threshold. That said, there is simply no way in which the originally reported P value of 0.15 can be replicated, either from the original reported numbers or from the corrected numbers, and Wong provides no satisfactory explanation for the original erroneous calculation. The only thing one can be certain of from a close inspection of the error in calculating statistical significance is that it simply cannot be the result of a typographical error (or at least not a simple one).
• DeSoto and Hitlan’s paper offers a subtle side observation about the revised, and barely non-significant, P value. They are skeptical of Wong’s claim that, even though there were errors in the calculation, the conclusions (i.e. that blood mercury levels were not higher in autistics than controls) remain unchanged. DeSoto and Hitlan note that there are two ways that one can calculate P values and that “when the literature leads a researcher to propose a specific direction of the difference, a one tailed [hypothesis] test is called for” (if you’re unfamiliar with statistics, don’t even bother to ask what all this means). They caution, therefore, that declaring the revised two-tailed P value of 0.056 in blood to be not significant, which Wong attempts to do, is ill-advised. A one-tailed test would give a P value of 0.028, which would mean that the mercury levels in the autistic group would be significantly higher than in controls. Both results suggest that the difference in the blood mercury levels is quite close to the statistical significance threshold of 95% confidence.
• An even more convoluted pattern of error emerges in the mercury levels reported in the hair samples. The reported mean levels in hair, unlike the blood levels, changed substantially from the original version to the corrected version. If one recalculates the P values based on the raw data provided by Brumback (an exercise I went through myself), one finds that the corrected results provided by Ip, Wong et al replicate the P value of the original report exactly: both give p = 0.79. But they get there with inputs to the calculation that are completely different. The means for autistic and control hair levels both change and get closer to each other (this narrowing could be explained only by data entry errors in the original data table, not a simple error in typing the table). The change in standard deviations is consistent with a typographical error in one case (a standard deviation of 0.58 in control hair becomes 1.58 after correction), but in the autistic hair the correction in standard deviation could only be consistent with a data entry error, not an error in typing the table.
• This series of inconsistent errors in the hair data has the effect of changing the statistical significance of the hair findings. Whereas a correct calculation of the original reported results would have yielded a clearly significant finding, the new data, which magically converge in complex ways to yield the same insignificant P value of 0.79 reported in the original, suggest no overall difference in hair mercury content between the two groups. (DeSoto and Hitlan go on to perform a clever statistical analysis of the corrected data showing a hair finding that supports the hypothesis of reduced excretion rates in autistics, an approach that shows they are not only more careful but more diligent in extracting every possible insight from their data, but that additional analysis doesn’t speak to the errors in the original.)
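For anyone who wants to rerun the blood P values discussed in the bullets above, here is a rough Python sketch that works from the published means and standard deviations alone. Two things in it are assumptions that this article does not state: the group sizes (82 autistic children and 55 controls, taken from outside this text) and the choice of a pooled-variance Student's t-test. The function names are mine, and the tail area is obtained by simple numerical integration rather than a statistics library.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value, integrating the density with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * central_half)


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary data."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df


# ASSUMED group sizes: not given in this article.
N_AUTISTIC, N_CONTROL = 82, 55

# Blood mercury, original 2004 figures (column A).
t_orig, df = pooled_t(19.53, 5.65, N_AUTISTIC, 17.68, 2.48, N_CONTROL)
p_orig = two_tailed_p(t_orig, df)

# Blood mercury, revised 2007 figures (column B).
t_rev, _ = pooled_t(19.53, 15.65, N_AUTISTIC, 14.68, 12.48, N_CONTROL)
p_rev = two_tailed_p(t_rev, df)

# A one-tailed P value is half the two-tailed value when the observed
# difference lies in the predicted direction (autistic > control here).
one_tailed_rev = p_rev / 2

print(f"original: t = {t_orig:.2f}, two-tailed p = {p_orig:.3f}")
print(f"revised:  t = {t_rev:.2f}, two-tailed p = {p_rev:.3f}")
print(f"revised one-tailed p = {one_tailed_rev:.3f}")
```

Under those assumptions, the original figures fall below the 0.05 threshold and the revised figures land just above it, which is the flip described in the bullets. Neither set of inputs comes anywhere near the originally reported 0.15.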
The initial negative results reported by Ip, Wong et al, as Mary Webster has pointed out previously in AOA, have been widely cited by scientists seeking to dismiss any causative role for mercury in autism. The corrected results conveniently avoid the need to restate the original conclusion, i.e. the results remain statistically insignificant. But does the pattern revealed in Ip, Wong et al’s correction pass muster as a good faith correction?
You’ve seen the before and after analysis. What do you think?
Mark Blaxill is Editor at Large for Age of Autism.
Michael,
"Note From Editor-in-Chief About Erratum for Ip et al Article" by RA Brumback:
http://jcn.sagepub.com/content/22/11/1321.extract
For more insight into DeSoto and Hitlan's criticism of Ip et al's Child Neurology article see Desoto's website:
http://www.uni.edu/desoto/desoto_hitlan_autism.html
I don't know why the Pediatrics International article was retracted other than "mistakes in methodology and presentation of study cohorts."
Posted by: Carol | December 07, 2010 at 06:26 PM
I need some help. The retracted article by Ip et al is in Pediatrics International. The article published in The Journal of Child Neurology had an erratum published but was not retracted.
I stumbled across the comments made by the editor Brumback about the erratum, but lost them and haven't been able to get them back.
I have some questions
The articles in the 2 journals are not the same. The Pediatrics International article compared blood and hair mercury levels to fish consumption. The Journal of Child Neurology article by the same authors investigated blood/hair mercury levels in kids and their relation to autism.
Does anyone at Age of Autism know the reason for the retraction in Pediatrics International and how that retraction relates to the other article?
Does anyone at Age of Autism have the comments posted by Brumback about the erratum somewhere on the site?
Thanks for keeping this site up
Posted by: Michael Polidori | December 07, 2010 at 02:33 PM
There was quite a bit of fall-out from Ip et al's mistakes:
"When we last left this story in the July 2008 issue of the _Journal of Child Neurology_, we learned that Ip et al had published articles nearly simultaneously in 2004 in both the _Journal of Child Neurology_ and _Pediatrics International_ concerning mercury in children with autistic spectrum disorder. Both articles used the same data set but described the data set in 2 different ways. In 2007, the data set was provided to the _Journal of Child Neurology_ and was published along with a new statistical analysis. Now, the _Journal of Child Neurology_ has received information that the _Pediatrics International_ article has been retracted."
http://jcn.sagepub.com/content/23/12/1497.short
"The following article from Pediatrics International, ‘Environmental mercury exposure in children: South China’s experience’ by Patrick Ip, Virginia Wong, Marco Ho, Joseph Lee, and Wilfred Wong. Pediatrics International 2004; 46: 715–721 (doi: 10.1111/j.1442-200x.2004.01972.x), has been retracted by agreement between the authors, the journal Editor in Chief, Yukishige Yanagawa, and Wiley-Blackwell. All authors wish to retract this paper because of mistakes in the methodology and the presentation of study cohorts.
The retraction has also been agreed in light of its relation to the following papers:
Ip P, Wong V, Ho M, Lee J, Wong W. Mercury Exposure in Children With Autistic Spectrum Disorder: Case-Control Study. J. Child Neurol. 2004; 19: 431–434.
Erratum. J. Child Neurol. 2007; 11: 1324.
Brumback RA. Note From Editor-in-Chief About Erratum for Ip et al. Article. J. Child Neurol. 2007; 11: 1321."
http://onlinelibrary.wiley.com/doi/10.1111/j.1442-200X.2008.02709.x/full
Posted by: Carol | November 21, 2010 at 12:28 PM
Huh?
Exhibit one on why there are no letters behind my name!
I thank God for people like Mark who can figure this science stuff out so I don’t have to ;-)
Whewwww.
Posted by: Kelli Ann Davis | February 03, 2008 at 01:25 PM
"The corrected results conveniently avoid the need to restate the original conclusion, i.e. the results remain statistically insignificant. But does the pattern revealed in Ip, Wong et al’s correction pass muster as a good faith correction?"
It seems that with so many children's lives at stake here, reliance on the efforts of the givers of the data and the analyzers of the data leaves a lot to be desired. It seems to me that there is inadequate rage on our part at the cavalier attitude displayed by the federal authorities in their so-called "investigation" of the horror that has been perpetrated on our children. To be able to quote an erroneous study time and again and not publicly make a hue and cry over reliance on fudged data, *never mind* what the corrected data looks like today, reeks of something malodorous. If I were an independent observer not knowing anything of the prior history of this debacle, I would seriously wonder at the reliability of any of the studies instigated by and coming out of federal agencies and that have been quoted widely as defence for the continuance of this vaccine program. I would demand an independent agency take a relook at all the prior studies that have been bandied about as showing no association between thimerosal and autism. This, frankly, stinks too much. There's something fishy in Denmark all right.
Posted by: Crime Stopper | February 03, 2008 at 08:27 AM