By Mark Blaxill
DeSoto and Hitlan’s paper (see accompanying article above) does a thorough job of analyzing the corrected data, but out of professional courtesy, they are somewhat circumspect in detailing the full scope of the errors committed by the Hong Kong University team. Since the journal editor, Roger Brumback, was gracious enough to publish the raw data he obtained from Wong (assuming, of course, that she got it right this time around), an interested observer can check all of Ip et al.’s revised calculations and compare them line by line with the original.
In the simple table below, I’ve gone through each of the mercury level findings from Ip et al. on a line-by-line basis. Column A holds the findings of the original paper (P Ip, V Wong et al. “Mercury exposure in children with autistic spectrum disorder: case-control study.” Journal of Child Neurology. November, 2004). Column B holds both the formal corrections from Wong and the fresh calculations based on the raw data published last November (P Ip, V Wong et al. Erratum in: Journal of Child Neurology. November, 2007). To get a clearer sense of the nature of the errors and any pattern in them, I calculated the difference between ALL of the findings in columns A and B and put those differences in column C.
| Mercury levels | A: Original paper, mean (standard deviation) | B: Revised data, mean (standard deviation) | C: Difference, mean (standard deviation) |
| --- | --- | --- | --- |
| Blood samples | | | |
| Autistic blood (nmol/L) | 19.53 (5.65) | 19.53 (15.65) | (10.00) [1] |
| Control blood (nmol/L) | 17.68 (2.48) | 14.68 (12.48) | 3.00 [1] (10.00) [1] |
| Two-tailed P value (reported) | 0.15 | 0.056 | 63% off [2] |
| Two-tailed P value (correct calculation) | 0.025 | NA | 83% off [3] |
| One-tailed P value (calculated) | 0.013 | 0.028 | |
| Hair samples | | | |
| Autistic hair (ppm) | 2.26 (0.21) | 1.98 (1.05) | 0.28 [2] (0.84) [2] |
| Control hair (ppm) | 2.07 (0.58) | 1.92 (1.58) | 0.15 [2] (1.00) [1] |
| Two-tailed P value (reported) | 0.79 [3] | 0.79 | |
| Two-tailed P value (correct calculation) | 0.008 [3] | NA | 99% off [3] |

Possible error types: [1] Typographical error (results of analysis “typed wrongly”)? [2] Data entry error resulting in calculation error? [3] Calculation error?
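As a rough sketch of how the entries in column C can be reproduced, the Python snippet below recomputes a few of the differences and the "percent off" figures. It assumes (the table does not say so explicitly) that "percent off" means the gap between two P values expressed as a share of the originally reported value. The function names are illustrative only.

```python
def difference(original, revised):
    """Absolute gap between an original and a revised figure, rounded to 2 places."""
    return round(abs(original - revised), 2)


def percent_off(reported, comparison):
    """Gap between two P values as a percentage of the reported one (assumed definition)."""
    return round(abs(reported - comparison) / reported * 100)


# Figures copied from columns A and B of the table.
blood_sd_gap = difference(5.65, 15.65)          # autistic blood standard deviation
control_mean_gap = difference(17.68, 14.68)     # control blood mean
blood_p_vs_revised = percent_off(0.15, 0.056)   # reported 2004 vs. revised 2007
blood_p_vs_correct = percent_off(0.15, 0.025)   # reported 2004 vs. recalculated
hair_p_vs_correct = percent_off(0.79, 0.008)    # reported 2004 vs. recalculated

print(blood_sd_gap, control_mean_gap)
print(blood_p_vs_revised, blood_p_vs_correct, hair_p_vs_correct)
```

Under that assumed definition, the three percentages match the 63%, 83% and 99% entries in column C.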
When you lay out the changes in this way, the results are intriguing. Virginia Wong’s explanation to Brumback for the error was as follows: “The raw data and the statistical analysis had been correct. Unfortunately, we now found that quite a few results in the table were typed wrongly.” If you scan down column C, you can find some evidence for typographical errors, but the patterns revealed there suggest that the errors were not nearly as simple as Wong claimed. Indeed, there appears to be a virtual orgy of errors, including but not limited to typographical errors.
For example:
• The mercury blood levels seem to be consistent with a typographical error. The tens digit seems to have been omitted in the original 2004 report from the standard deviations of both the autistic cases and the control group: the original standard deviation of 5.65 for the autistics becomes 15.65 when corrected, and the standard deviation of 2.48 for controls becomes 12.48. The mean of the mercury levels in blood for the autistic sample doesn’t change, but the mean of the control sample appears to be another typographical error, with the corrected mean of 14.68 apparently mistyped in the original as 17.68.
• The original calculation of statistical significance (the so-called P value) of blood levels of mercury that caught DeSoto’s attention remains in error, but the corrected value is different from DeSoto’s original correction. This occurs because DeSoto made her first observation based on Ip, Wong et al.’s original (and supposedly typographical) errors. The new standard deviations, corrected by Ip and Wong, are far larger than the original, so while the difference between the autistic and control means is larger after correction, the combined corrections of means and standard deviations require an entirely different P value calculation. If one uses the original numbers, DeSoto and Hitlan correctly pointed out that the difference between autistic and control blood mercury levels was significant (a P value below 0.05 is conventionally treated as significant). But using Wong’s corrected numbers, the difference just barely misses the significance threshold. That said, there is simply no way in which the originally reported P value of 0.15 can be replicated, either from the original reported numbers or from the corrected numbers, and Wong provides no satisfactory explanation for the original erroneous calculation. The only thing one can be certain of from a close inspection of the error in calculating statistical significance is that it simply cannot be the result of a typographical error (or at least not a simple one).
• DeSoto and Hitlan’s paper offers a subtle side observation about the revised, and barely nonsignificant, P value. They express skepticism about Wong’s claim that, even though there were errors in the calculation, the conclusions (i.e. that blood mercury levels were not higher in autistics than controls) remain unchanged. They note that there are two ways that one can calculate P values and that “when the literature leads a researcher to propose a specific direction of the difference, a one tailed [hypothesis] test is called for” (if you’re unfamiliar with statistics, don’t even bother to ask what all this means). They caution, therefore, that declaring the revised two-tailed P value of 0.056 in blood to be not significant, which Wong attempts to do, is ill-advised. A one-tailed test, which here halves the two-tailed value, would give a P value of 0.028, which would mean that the mercury levels in the autistic group are significantly higher than in controls. Both results suggest that the difference in the blood mercury levels is quite close to the statistical significance threshold of 95% confidence.
• An even more convoluted pattern of error emerges in the mercury levels reported in the hair samples. The reported mean levels in hair, unlike the blood levels, changed substantially from the original version to the corrected version. If one recalculates the P values based on the raw data provided by Brumback (an exercise I went through myself), one finds that the corrected results provided by Ip, Wong et al. replicate the P value of the original report exactly: both give p = 0.79. But they get there with inputs to the calculations that are completely different. The means for autistic and control hair levels both change and move closer to each other (this narrowing could be explained only by data entry errors in the original data table, not by a simple error in typing the table). The change in standard deviations is consistent with a typographical error in one case (a standard deviation of 0.58 in control hair becomes 1.58 after correction), but in the autistic hair the correction in standard deviation (0.21 becomes 1.05) could only be consistent with a data entry error, not an error in typing the table.
• This series of inconsistent errors in the hair data has the effect of changing the statistical significance of the hair findings. Whereas a correct calculation of the original reported results would have yielded a clearly significant finding, the new data (which magically converge in complex ways to yield the same insignificant P value of 0.79 reported in the original) suggest no overall difference in hair mercury content between the two groups. (DeSoto and Hitlan go on to perform a clever statistical analysis of the corrected data showing a hair finding that supports the hypothesis of reduced excretion rates in autistics, an approach that shows they are not only more careful but more diligent in extracting every possible insight from their data, but that additional analysis doesn’t speak to the errors in the original.)
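The P value comparisons in the bullets above can be checked with a pooled-variance two-sample t-test computed from the summary statistics alone. The Python sketch below is one way to do that, not necessarily the calculation that DeSoto and Hitlan or Ip et al. ran. It assumes group sizes of 82 autistic cases and 55 controls, which are not stated in this post, and it assumes the equal-variance (pooled) form of the test. The tail probability comes from numerically integrating the t density, so only the standard library is needed.

```python
import math

# ASSUMPTION: group sizes are not given in this post; 82 cases and 55 controls
# are assumed here for illustration.
N_AUTISTIC, N_CONTROL = 82, 55


def t_density(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value: 1 minus the probability mass between -|t| and +|t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # Simpson's rule; steps must be even
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    half_mass = total * h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * half_mass)


def pooled_t_test(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, two-tailed P, one-tailed P) for an equal-variance t-test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_err
    p_two = two_tailed_p(t, df)
    # One-tailed P for the directional hypothesis "group 1 is higher than group 2".
    p_one = p_two / 2 if t > 0 else 1 - p_two / 2
    return t, p_two, p_one


# Means and standard deviations copied from columns A and B of the table
# (autistic mean, autistic SD, control mean, control SD).
cases = {
    "blood, original": (19.53, 5.65, 17.68, 2.48),
    "blood, revised": (19.53, 15.65, 14.68, 12.48),
    "hair, original": (2.26, 0.21, 2.07, 0.58),
    "hair, revised": (1.98, 1.05, 1.92, 1.58),
}
results = {}
for label, (m1, s1, m2, s2) in cases.items():
    results[label] = pooled_t_test(m1, s1, N_AUTISTIC, m2, s2, N_CONTROL)
    t, p_two, p_one = results[label]
    print(f"{label}: t = {t:.2f}, two-tailed P = {p_two:.3f}, one-tailed P = {p_one:.3f}")
```

Under these assumptions the two-tailed results should come out at roughly 0.02 and 0.06 for blood, and below 0.01 and about 0.8 for hair, in line with the table. None of the four comes near the reported blood value of 0.15.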
The initial negative results reported by Ip, Wong et al., as Mary Webster has pointed out previously in AOA, have been widely cited by scientists seeking to dismiss any causative role for mercury in autism. The corrected results conveniently avoid the need to restate the original conclusion, i.e. the results remain statistically insignificant. But does the pattern revealed in Ip, Wong et al.’s correction pass muster as a good faith correction?
You’ve seen the before and after analysis. What do you think?
Mark Blaxill is Editor at Large for Age of Autism.
