M&M Project Update, September 2004:
Our Dealings with Nature



In response to our first Energy and Environment paper (October 2003), Mann et al. issued a response on the internet, to which we planned a three-part reply. We can now report on the progress of that reply, starting with Part 3.



Two Undisclosed and Questionable Methodological Decisions (Part 3)
In January 2004, we submitted a short article to Nature arguing that the shape of the MBH98 hockey stick depended on two decisions: (1) transforming each tree ring proxy, before calculating principal components (PCs), by subtracting its 1902-1980 mean rather than its mean over the full length of the PC calculation (e.g., 1400-1980 for the AD1400 calculation step), as standard software would do; and (2) using the same Gaspé tree ring series twice in the MBH98 database and, in one of those uses, extrapolating it at the beginning so that it became available for the AD1400 step calculations, while incorrectly listing its first available year as AD1400, thereby concealing the extrapolation.
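The first decision can be illustrated with a minimal Python sketch. It is an illustration only, not MBH98 code (which has not been released): it computes the leading PCs of a years-by-series proxy matrix under either centering choice. The function name leading_pcs, its parameters and the random stand-in data are hypothetical, and only the mean subtraction described above is modeled; any further rescaling of the series is omitted.

```python
import numpy as np


def leading_pcs(proxies, years, center_start=None, center_end=None, n_pcs=2):
    """Leading PC time series of a (years x series) proxy matrix.

    Hypothetical helper. With center_start/center_end given, each series has
    its mean over that sub-period subtracted (e.g. 1902-1980); otherwise its
    mean over the full length of the matrix is subtracted, as conventional
    PCA software does. No other rescaling is applied.
    """
    X = np.asarray(proxies, dtype=float)
    years = np.asarray(years)
    if center_start is None:
        mask = np.ones(years.size, dtype=bool)
    else:
        mask = (years >= center_start) & (years <= center_end)
    Xc = X - X[mask].mean(axis=0)
    # SVD of the centered matrix; the columns of U * S are the PC time series.
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]


# Usage with random stand-in data for the AD1400 step (1400-1980 inclusive).
years = np.arange(1400, 1981)
X = np.random.default_rng(0).standard_normal((years.size, 10))
pcs_conventional = leading_pcs(X, years)
pcs_short_centered = leading_pcs(X, years, 1902, 1980)
```

The two calls differ only in the centering window, which is the whole point: conventionally centered PCs average to zero over 1400-1980, while short-centered PCs average to zero only over 1902-1980.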

Although MBH98 claimed great "robustness" for its reconstructions, even claiming robustness to the exclusion of all tree ring data, we showed that its results were not robust to seemingly slight changes in these two aspects of the methodology. We also showed that, once these issues were remedied, our results held up even when we included the NOAMER (North American network) PCs back to AD1400, which had earlier been a specific point of methodological difference between us.

Our cover letter outlined the purpose of our submission, recognizing that it did not fit neatly into Nature’s submission categories: "Letters", "Articles" and "Communications Arising". "Letters" and "Articles" report "original" work and have longer word limits than "Communications Arising", which may be criticisms. We thought that the submission could be construed as a "Letter", although of an unusual type: it was a comment on a publication, but it represented a great deal of original work that, obviously, no one else had done. Although the content was critical, the topic was of extreme international importance; we stated explicitly that we were open to guidance on editorial format and asked that the submission be judged on its merits.

In early March, we received a favorable revise-and-resubmit decision asking us to add material responding to the referee comments (our paper then stood at 1,910 words). Referee #1, an expert in principal components, had stated that he:

[found] merit in the arguments of both protagonists, though Mann et al. (MBH) is much more difficult to read than McIntyre & McKitrick (MM). Their explanations are (at least superficially) less clear and they cram too many things onto the same diagram, so I find it harder to judge whether I agree with them. ... [I am] uneasy about applying a standardisation based on a small segment of the series to the whole series, if that is what is being done. [MM note: we confirmed to him that that is exactly what is done in MBH98].

Referee #2 stated:

The technical criticisms raised by McIntyre and McKritrik (MM) concerning the temperature reconstructions by Mann et al (MBH98), and the reply to this criticism by Mann et al is quite difficult to evaluate in a short period of time, since they are aimed at particular technical points of the statistical methods used by Mann et al, or at the use of particular time series of proxy data. A proper evaluation would require to redo most of the calculations presented in both manuscripts, something which is obviously out of reach in two weeks time. Furthermore, both manuscripts seem to contradict each other in some basic facts. Therefore, my comments are based on my impression of the consistency of the results presented, but there is a wide margin of uncertainty that could be resolved only by by looking in detail into the whole data set and the whole software used by the authors. In general terms I found the criticisms raised by McIntyre and McKritik worth of being taken seriously. They have made an in depth analysis of the MBH reconstructions and they have found several technical errors that are only partially addressed in the reply by Mann et al.

We had pointed to the overwhelming weight that subtracting the 1902-1980 mean gives to one hockey stick-shaped North American tree ring series (Sheep Mountain, CA). Mann et al. replied, interestingly, that their PC1 did not depend on Sheep Mountain alone: 14 other sites contributed at least 25% as much as Sheep Mountain.
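A short derivation shows why subtracting a sub-period mean favors series like Sheep Mountain. The notation is ours, introduced only for this illustration: X is the n × p matrix of proxies (n years, p series), m_f and m_s are row vectors of each series' full-period and 1902-1980 means, and 1 is a column of n ones.

```latex
\begin{aligned}
X_f &= X - \mathbf{1}\, m_f, \qquad c = m_f - m_s,\\
X_s &= X - \mathbf{1}\, m_s = X_f + \mathbf{1}\, c,\\
X_s^\top X_s &= X_f^\top X_f + X_f^\top \mathbf{1}\, c + c^\top \mathbf{1}^\top X_f + c^\top \mathbf{1}^\top \mathbf{1}\, c
              = X_f^\top X_f + n\, c^\top c,
\end{aligned}
```

using \mathbf{1}^\top X_f = 0 and \mathbf{1}^\top \mathbf{1} = n. The PCs are eigenvectors of this matrix, so short centering adds a rank-one term n c^T c whose size grows with the square of each series' gap c_j between its full-period and 1902-1980 means. A series with a strong 20th century departure (a hockey-stick shape) has a large c_j and is pulled toward PC1, and the same term inflates the share of "explained variance" that PC1 appears to carry.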

We re-submitted in late March, adding a new paragraph showing that these 14 highly weighted sites in the PC1 were all from a specialized and controversial group of high-altitude bristlecone pine series studied by Graybill and Idso (1993), which exhibit an anomalous 20th century growth spurt and therefore yield hockey stick-shaped growth series. Graybill and Idso stated explicitly that the 20th century growth could not be explained by local or regional temperature; co-author Hughes, in Hughes and Funkhouser (2003), said that the anomalous growth was a "mystery". We also pointed out that Mann et al. had carried out a sensitivity analysis for the effect of the high-altitude bristlecone sites by calculating a NOAMER PC1 that excluded 20 such sites, a calculation evidenced in the NOAMER/BACKTO_1400-CENSORED subdirectory at Professor Mann’s FTP site. The PC1 in this subdirectory proved to be virtually identical to the one we calculated using standard PC methods on the entire North American network. Carrying these PCs forward into an NH temperature index led to a reconstruction almost the same as ours!

Nature then asked us to reduce the paper to 800 words. This was not easy, but our final version was within the 800-word limit and was submitted on April 9. We then heard nothing about the re-submission for four months. When we inquired about the delay, Nature said that they had not heard back from the reviewers. It turned out that Nature had added a third reviewer, which may have contributed to the delay.

On August 4, Nature advised us that our submission would not be published. The main reason given was that the issues raised were too technical to resolve within the 500-word space now available:

In the light of this detailed advice, we have regretfully decided that publication of this debate in our Brief Communications Arising section is not justified. This is principally because the discussion cannot be condensed into our 500-word/1 figure format (as you probably realise, supplementary information is only for review purposes because Brief Communications Arising are published online) and relies on technicalities that do not bring a clear resolution of the underlying issues.

This decision primarily reflected the views of the new reviewer, who stated:

Generally, I believe that the technical issues addressed in the comment and the reply are quite difficult to understand and not necessarily of interest to the wide readership of the Brief Communications section of Nature. I do not see a way to make this communication much clearer, particularly with the space requirements, as this comment is largely related to technical details.

This reviewer did not object to any of our findings per se. Readers may share our surprise that the matters raised are "too technical" for consideration in a science journal; additionally, whether or not the matters were of interest to a "wide readership" (and we believe that they are), potential defects in MBH98 affect Nature’s publication record and require disclosure.

Our original referees again commented on the difficulty of resolving who was right and who was wrong. Referee #2 (Referee #1 of the first round) remained sympathetic and stated:

The amount of material, often contradictory, is simply too complex and lengthy to resolve all the rights and wrongs in a realistic length of time. Only a reader with several days to spare (longer if they are unfamiliar with the area), to chase references and probably the authors, could hope to come close to a full understanding of the arguments.

I started my original review by saying that I found merit in the arguments of both MBH & MM. To rewrite this, I believe that some of the criticisms raised by each group of the other's work are valid, but not all. I am particularly unimpressed by the MBH style of 'shouting louder and longer so they must be right'.

However, Referee #3 (Referee #2 of the first round) was impressed by some of the new arguments of Mann et al. (to which we had not had an opportunity to respond). In his new comments, he expressed concern about whether the points could be made within the space limitations and stated:

I see some merit in MM04 and I would encourage them to pursue their testing of MBH98, and by the way other reconstructions. As I wrote in my first evaluation, this should be a normal and sound scientific process that should not hampered. For instance, questions that seem to be quite critical, such as the sensitivity of the MBH98 reconstructions in more remote periods to changes or omissions in the proxy network or the dependency of the final results to the rescaling of the reconstructed PCs, have become clearer to me now. At the moment, my opinion is that the present MM04 manuscript could be of interest just for the bunch of specialist working exactly in the area of statistical methods for climate reconstructions, and this only after several hours of considerable work to understand all technical details properly. Perhaps this is caused by the tight constrained imposed to the Communications Arising category.

Thus, the matter was effectively disposed of on the Procrustean bed of space limitations, rather than on the merits of our arguments.

In their second response, Mann et al. argued that they could still get MBH98-like results by increasing the number of retained PCs in the AD1400 step of the North American network from 2 to 5, and that they were justified in doing so under the PC retention policies of MBH98. In this reconstruction, the bristlecone pine hockey stick shows up as PC4 (accounting for less than 8 percent of the explained variance of the North American network, rather than 38 percent in the incorrect MBH98 calculation) and still imparts a hockey-stick shape to the whole NH temperature reconstruction. Without NOAMER PC4, their NH temperature index reverts to our shape.

They also reiterated what seems to have been their primary argument: that their reconstruction yields highly significant Reduction of Error (RE) statistics, while reconstructions using conventional PC standardization ("our" reconstruction) do not. The RE statistic does not have a theoretical distribution, so MBH98 benchmarked its significance by Monte Carlo methods. We have done new simulations, applying the MBH98 PC methodology to trendless red noise modeled to exhibit the persistence of the North American tree ring network. Although the underlying proxies contain no trend, the MBH98 method regularly produces hockey stick-shaped PC1s, which then fit neatly against the temperature data even though, in principle, they have zero explanatory power. The benchmark for RE significance is therefore much higher than reported in MBH98, and their reported RE statistic can be shown to lack significance under a more realistic test.
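The logic of such a benchmark can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the simulation code described above: it uses AR(1) red noise with a single hypothetical persistence parameter phi in place of noise fitted to the North American network, a hypothetical network size of 70 series, and a hypothetical "hockey stick index" (the gap between a PC1's 1902-1980 mean and its full-period mean, in units of its standard deviation) to score each simulated PC1.

```python
import numpy as np


def ar1_noise(n_years, n_series, phi, rng):
    """Trendless AR(1) red noise, x[t] = phi * x[t-1] + e[t], one column per series."""
    e = rng.standard_normal((n_years, n_series))
    x = np.empty_like(e)
    x[0] = e[0] / np.sqrt(1.0 - phi**2)  # draw the first year from the stationary distribution
    for t in range(1, n_years):
        x[t] = phi * x[t - 1] + e[t]
    return x


def pc1(x, center_mask):
    """First PC time series after subtracting each column's mean over center_mask."""
    xc = x - x[center_mask].mean(axis=0)
    u, s, _ = np.linalg.svd(xc, full_matrices=False)
    return u[:, 0] * s[0]


def hockey_stick_index(series, short_mask):
    """Hypothetical index: (1902-1980 mean - full-period mean) / full-period std."""
    return (series[short_mask].mean() - series.mean()) / series.std()


def simulate(n_sims=100, n_series=70, phi=0.9, seed=0):
    """Return |index| of PC1 under 1902-1980 centering and under full-period centering."""
    rng = np.random.default_rng(seed)
    years = np.arange(1400, 1981)  # AD1400 step: 1400-1980 inclusive
    short = (years >= 1902) & (years <= 1980)
    full = np.ones(years.size, dtype=bool)
    hsi_short, hsi_full = [], []
    for _ in range(n_sims):
        x = ar1_noise(years.size, n_series, phi, rng)
        # The sign of a PC is arbitrary, so compare absolute index values.
        hsi_short.append(abs(hockey_stick_index(pc1(x, short), short)))
        hsi_full.append(abs(hockey_stick_index(pc1(x, full), short)))
    return np.array(hsi_short), np.array(hsi_full)


if __name__ == "__main__":
    short_idx, full_idx = simulate()
    print("mean |index|, 1902-1980 centering:", short_idx.mean())
    print("mean |index|, full-period centering:", full_idx.mean())
```

With strongly persistent noise (phi near 0.9), one would expect the short-centered PC1s to show systematically larger index values than the conventionally centered ones, even though no simulated series contains a trend.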

In our second submission, we had pointed out that we obtained very low R-squared statistics in our emulations of MBH98 and had been unable to replicate their claimed RE statistics. Again, the response of Mann et al. was highly instructive. They wrote an extraordinary diatribe against our supposed advocacy of the "discredited" and "flawed" R-squared statistic, citing Wilks (1995) as supposed authority. Most readers will be surprised to learn that this workhorse statistic has been "discredited". Wilks (1995) discusses cases where the RE statistic is lower than the R-squared statistic, but does not stand as authority for cases (such as MBH98) with a high RE statistic and a negligible R-squared statistic. (Our new simulations will neatly illustrate what is going on here.)
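For readers unfamiliar with the two statistics, here is a minimal Python sketch, assuming the usual paleoclimate definitions: RE compares the squared verification-period errors with those of a "forecast" equal to the calibration-period mean, while R-squared is the squared Pearson correlation over the verification period. The function names and toy numbers are invented for illustration and are not MBH98 data; they show how a reconstruction that only matches the shift in level between the calibration and verification periods can score a high RE with an R-squared of essentially zero.

```python
import numpy as np


def reduction_of_error(obs, pred, calibration_mean):
    """RE = 1 - SSE / sum((obs - calibration_mean)**2) over the verification period."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    sse = np.sum((obs - pred) ** 2)
    reference = np.sum((obs - calibration_mean) ** 2)
    return 1.0 - sse / reference


def r_squared(obs, pred):
    """Squared Pearson correlation between observed and reconstructed values."""
    r = np.corrcoef(obs, pred)[0, 1]
    return r**2


# Invented verification period: observations sit well above a calibration mean of 0,
# and the "reconstruction" gets the level right but not the year-to-year changes.
obs = [1.1, 0.9, 1.1, 0.9]
pred = [1.05, 1.05, 0.95, 0.95]
print(reduction_of_error(obs, pred, calibration_mean=0.0))  # about 0.988
print(r_squared(obs, pred))  # essentially 0
```

Because RE's reference is the calibration mean, any reconstruction that tracks a change in level between the two periods earns most of its RE from that shift alone; R-squared ignores the level and asks only whether the year-to-year variations agree.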

Our discussion of the statistics in MBH98 was considerably hampered by their refusal to disclose supporting calculations for their AD1400 reconstruction step, which takes us back to the progress of Part 2 (see below).

We are obviously disappointed in Nature’s decision on our submission. We have seen nothing in the referee comments or the response materials from Mann et al. to indicate that our arguments lack merit. However, the process has been helpful in several ways. The two stages of the correspondence have enabled a much more precise dissection of MBH98. We plan to submit a revised article elsewhere.

Lacunae and additional inaccuracies in MBH98 descriptions of data and methodology (Part 2)
In their response to MM03, Mann et al. criticized us for not using 159 series, a number that appears nowhere in MBH98, and argued that the RE statistics of their reconstructions were higher than those in our calculations.

We emailed Professor Mann and asked him to identify the 159 series and to provide the individual results of his 11 calculation steps (described as "experiments"), on which his claimed RE statistics rested and which had never been archived.

After Professor Mann refused, we submitted a Materials Complaint to Nature in November 2003, referring to Nature’s policies requiring authors to make their data and methods available. We stated:


The policies of Nature rightly place a burden on authors to disclose data and methods to any interested readers. We have been systematically and deliberately stymied by Professor Mann on the most elementary requests: a proper listing of his data series and the exact computational procedures used. In the process of trying to obtain this information we have concluded that the disclosure at the Nature SI site is not merely inadequate, but in some cases it contradicts what is now revealed at the University of Virginia FTP site.
Under the circumstances, we believe that the full data set and accompanying programs for MBH98 should now be included in the Nature Supplementary Information, along with an accounting of any discrepancies between what has been listed at Nature.com to date and what was actually used in MBH98.

Nature replied in early December that:

...we have already been in touch with Professor Mann's group, who have indicated their willingness to supply us with the various materials pertaining to your complaint. Once we have these in hand, we intend to seek external independent advice on the issues that you raise; and on the basis of such advice, we will decide on any actions that need to be taken.

On December 17, we reiterated and particularized our requests, additionally asking for the disclosure of all residual series, together with the programs used to derive the residuals and confidence intervals in MBH98. These residual series were used in calculating the RE and R-squared statistics. We provided Nature with a restated list of 10 specific problems we had identified in the disclosure of data and methods in MBH98, some of which we had already discussed in MM03 and others which we had found by examining Mann’s FTP site. Nature replied that:

We are putting the points that you raise here to Professor Mann (as we did with those from your original communication) and will await his response. I hope that you will understand that, given both the seriousness of your concerns and the time of the year (our office being closed for several days over the coming fortnight), it may take us longer than normal to bring this matter to a conclusion. But we are nevertheless anxious to do so, and I hope that you will bear with us.

Based on Nature’s statement that Professor Mann had undertaken to provide Nature with the materials relating to our complaint, we decided to forgo publishing a detailed analysis of these errors and instead to work through Nature to obtain an accurate listing of data and methods.

The ensuing investigation by Nature led to the Corrigendum of July 1, 2004. Under Nature’s policies, a Corrigendum is defined as "notification of an important error made by the authors [Nature’s bold] that affects the publication record or the scientific integrity of the paper, or the reputation of the authors or the journal."

On March 16, we were shown a draft copy of the proposed Corrigendum. We immediately noticed that it was very incomplete and, despite its brevity, contained a number of errors, some readily avoidable, and many inaccuracies. We asked to see the new SI but were not permitted to. Nature advised us that they did not edit supplementary information and that it was the sole responsibility of the contributing author, a point which may not be generally known, especially since Nature’s policies claim that SI is peer-reviewed.

Our biggest concern was the incompleteness of the Corrigendum. It failed to mention an extremely important inaccuracy in the MBH98 description of methods (the significance of which we had discussed at length in our submission): the failure to describe the subtraction of the 1902-1980 mean prior to calculating tree ring principal components. In fact, the Corrigendum did not even mention the "stepwise" calculation of tree ring network principal components, a matter not mentioned in MBH98 that later formed much of the content of the new SI. The Corrigendum acknowledged the inaccurate citation of instrumental series in MBH98, but the new citation, merely to "NOAA", was totally unhelpful, since NOAA holds thousands of series. Nor did the Corrigendum acknowledge the geographical errors in the precipitation series.

The Corrigendum did comprehensively acknowledge the discrepancies between the listings of series in the original SI and the series actually used. However, the Corrigendum purported to excuse or explain the discrepancies by stating that the differences resulted from the application of additional objective quality control to series already shortlisted according to quality control tests described in Mann et al. (2000).

We had previously examined these quality control tests and found that many of the shortlisted series failed one or more of them. One series, in fact, failed so spectacularly that we contacted the originating author, who discovered that the wrong tree ring chronology had been archived all this time and promptly requested that it be removed from the World Data Center for Paleoclimatology. The well-known Polar Urals series, used in nearly all multi-proxy studies, including MBH98 and MBH99, fails the median test for segment length. We provided Nature with a detailed synopsis of the incompleteness and errors in the draft Corrigendum on March 17. With a little further work, we were subsequently able to identify many other inconsistencies in the supposed explanation. For example, it does not explain why two density series originating from Schweingruber (ak006x, cana096x) are excluded, while other series with seemingly identical methods from the identical publication are included. Similar inconsistencies abound among series from other authors.

Nature responded to our concerns about the defects in the draft Corrigendum in a very unsatisfactory way.

First, they stated that our comments about the inaccuracy of the purported explanation for the difference between the listing of series in the original SI and the series actually used "are not directly relevant to the materials complaint. Instead you question the consistency of the methods used, which is not the subject of a Corrigendum." With respect, it seems obvious to us that publishing an incorrect explanation of the discrepancy can hardly be excused merely because we had not anticipated it and objected to it in our original Materials Complaint. In any case, we had actually raised the reasons for the discrepancy in our Materials Complaint, even drawing Nature’s attention to a different explanation for dropping ARGE030, one of the excluded series: a note archived at Mann’s FTP site, which said that the deletion would be "better for our purposes" and made no mention of the quality control rules.

Secondly, Nature stated that Corrigenda had to be as "concise" as possible and that "space limitations" prevented a listing of all the matters raised in our comments. They argued that the version they would publish, "together with the Supplementary Information explicitly listing the data sets and methods used" would "clearly establish which data were used in the paper." Interestingly, the reply regarding "space limitations" on the Corrigendum came on exactly the same day that we had been asked to shorten our submission to 800 words and around the same time that the third reviewer was added for our submission.

By the criterion of "space limitations", some of the editorial choices in the Corrigendum are bewildering, with scarce space used for bizarre trivialities, while critical items are referred to deep in the fine print of the Supplementary Information or not at all. For example, scarce space is used to supposedly correct an inaccurate citation of the source for the Briffa et al. Western U.S. temperature reconstruction; amusingly, the correction is itself incorrect, citing a publication about Scandinavian temperature reconstruction. But space is wasted on this triviality, while the critical point about the incorrect calculation of tree ring principal components is not mentioned.

In addition, we had previously told Nature of our concern that Mann et al. would use the occasion of the Corrigendum to comment on outstanding issues between us, and had requested the opportunity to comment on the Corrigendum. In response, Nature had assured us:

...the Correction will contain no mention of the controversy between yourselves and Mann et al; it will be a plain correction stating the errors in the original Supplementary Information, and their correction.

Nature asked us to maintain confidentiality about the Corrigendum, which we did. The Corrigendum came out while our submission was under second review and contained the statement that:

None of these errors affect our previously published results.

This statement was not in the draft we had been shown, and it was obviously highly controversial (in breach of Nature’s undertaking). It was completely inconsistent with our submission, then under second review at Nature, which addressed the effect of the erroneous PC calculations (which, in addition to being erroneous, had been inaccurately disclosed) and the effect of the unique extrapolation of the duplicate version of the Gaspé series (which had also been inaccurately disclosed). We objected in writing to its publication. Nature’s response was as follows:

Regarding your disagreement with the last sentence of the Corrigendum by Mann et al., I have consulted with my colleagues, who have now given the matter careful consideration. However, the errors Mann et al. refer to in the last sentence of their Corrigendum are errors in the listing of the proxy data sets in the original Supplementary Information, rather than errors in either the data sets used or the computational procedures. Errors in the listing of data sets obviously do not affect the calculations or results, and we therefore feel that the sentence is appropriate and justified.

We are astounded at this hair-splitting. In March (and previously), we had pointed out important omissions in the Corrigendum, some of which are admitted (without acknowledgement) in the new SI. Leaving these inaccuracies out of the printed Corrigendum, while coopering up the record in the new online SI, creates the misleading impression that crucial calculation errors were really just trivial labeling problems.

Even within the artifice of the Corrigendum, the claim is refutable on the Gaspé series, which we discussed in detail in our submission. The Corrigendum (#6) described the extrapolation of the Gaspé series but did not explicitly acknowledge that MBH98 had stated its start year as AD1400 rather than AD1404, or that this was a duplicate use of the series. As noted above, this extrapolation was unique within the MBH98 corpus. The misrepresentation of the start date (whether or not it was intentional) concealed this unique extrapolation. The extrapolation (not to mention the duplicate usage) allowed a series of questionable 15th century quality to be added to the AD1400 roster, where it has a major effect on MBH98 results in the early 15th century. It does very little good to disclose the unique extrapolation 6 years after the fact, when so many positions are cast in stone. Had the extrapolation and duplication been properly disclosed at the time, an alert reviewer or reader might have asked about the reason for this unique treatment, which would necessarily have led to discovering the lack of robustness in MBH98. Likewise, had the PC methodology been accurately described, disclosure of the subtraction of the 1902-1980 mean would surely have caught someone’s attention before we identified the problem.
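The document does not say how the Gaspé values before AD1404 were filled in, so the following Python sketch assumes the simplest scheme, carrying the earliest available value backward; the function name pad_start_by_persistence and the sample values are hypothetical. Its point is the bookkeeping: a padded series should be reported with both its nominal start year and the year its real data begin, which is exactly the distinction lost when the start year was listed as AD1400.

```python
def pad_start_by_persistence(values, first_year, target_start):
    """Extend a series back to target_start by repeating its earliest value.

    Hypothetical helper using an assumed padding scheme, not a documented
    MBH98 routine. Returns (padded_values, nominal_start, first_real_year,
    n_padded) so that the extrapolated years can be disclosed with the series.
    """
    n_padded = max(0, first_year - target_start)
    padded = [values[0]] * n_padded + list(values)
    nominal_start = first_year - n_padded
    return padded, nominal_start, first_year, n_padded


# Invented values for a series whose real data begin in 1404.
padded, nominal, real, n = pad_start_by_persistence([0.8, 1.2, 0.9], 1404, 1400)
print(nominal, real, n)  # 1400 1404 4
```

A listing that reports only the nominal start (1400) hides the four padded years; reporting the first real year alongside it makes the extrapolation visible.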

Despite Nature’s promises that the new SI would be complete, it did not contain the results of the 11 calibration "experiments" (which are necessary for assessing the goodness-of-fit claims), and it did not list the 159 series, although it did list the series actually used, which total 139. Instead of simply providing the actual source code, which would have allowed all the outstanding deficiencies in disclosure to be remedied, it provided a verbal description that is imprecise and in places inaccurate. Given the incompleteness and inaccuracy of the original SI, in our view confidence in the results can only be assured by the availability of the actual computational code.

We wrote to Nature in early August and asked for these items to be added to the SI. In late August, Nature replied that they had already done as much as could be expected of them and that there was enough information already available.

As to the matter of the 159 series so loudly brought into controversy last fall, Nature stated:

we feel [this] is an issue quite separate from the material that we have published and over which we are in a position to demand a response. Professor Mann has given us the clear understanding that the corrected Supplementary Information now lists *all* of the series used in the paper, and this list is consistent with statements in the original publication (MBH98). The fact that he has separately emphasised to you the need for a number of series greater than those listed in the Supplementary Information is, we feel, something that you should continue to pursue directly with him (along with your other requests for clarification).

In other words, we take it that the number of 159 series is simply fictitious.

As to source code, their position was:

we do not take the view that these are something that in general should automatically be provided on request - the decision of whether or not to do so normally rests with the authors of such codes. What we do consider to be a reasonable requirement is that the authors provide a detailed description of the procedures used, and this is indeed what Professor Mann has supplied in the corrected Supplementary Information (at our instigation, following your original communication with us).

Given that Nature did not review the Supplementary Information, they obviously are in no position to know whether the corrected Supplementary Information is actually an accurate description of MBH98 methods. Since the previous SI was found to be so inaccurate, producing the source code would be appropriate in this case to verify the proffered correction.

As to the results of the "experiments", they stated:

And with regard to the additional experimental results that you request, our view is that this too goes beyond an obligation on the part of the authors, given that the full listing of the source data and documentation of the procedures used to generate the final findings are provided in the corrected Supplementary Information. (This is the most that we would normally require of any author.)

Reluctance on the part of Mann et al. and Nature to produce the results of their "experiments", and in particular of the AD1400 step, would be one thing if the source code that generated them were available; but refusing to provide either is completely unjustifiable, especially since Nature based its decision against our paper, in part, on claims about the RE statistics that can only be verified by examining the "experiment" results. Based on our implementation of the methodology, we surmise that the R-squared and Coefficient of Efficiency (as this is defined in paleoclimate studies) statistics fail to reach statistical significance for the AD1400 step. The results may also reveal problems in MBH98 beyond those we have already described. We already know that the adverse results of the bristlecone pine sensitivity study were not disclosed.
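For reference, here is a minimal Python sketch of the Coefficient of Efficiency, assuming the usual paleoclimate definition: it has the same form as RE, but the reference "forecast" is the verification-period mean of the observations rather than the calibration-period mean. Since no constant fits the verification observations better than their own mean, CE can never exceed RE for the same data, which makes it the harder test. The function name and toy values are invented for illustration.

```python
import numpy as np


def coefficient_of_efficiency(obs, pred):
    """CE = 1 - SSE / sum((obs - mean(obs))**2), all over the verification period."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    sse = np.sum((obs - pred) ** 2)
    reference = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - sse / reference


# Invented verification-period values: the reconstruction tracks the overall level
# but not the year-to-year changes, so it does worse than the verification mean itself.
obs = [1.1, 0.9, 1.1, 0.9]
pred = [1.05, 1.05, 0.95, 0.95]
print(coefficient_of_efficiency(obs, pred))  # about -0.25
```

A negative CE means the reconstruction is a worse predictor of the verification period than simply using that period's observed mean.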

In sum, we see no reason why there should be any academic or (especially) policy reliance on this article while requests for supporting calculations and source code are obdurately refused.

Future Plans
While we are frustrated that the time invested in the Nature process did not persuade Nature to correct its publication record, it did at least allow us to clarify several methodological issues, especially the crucial role of the controversial bristlecone pine series. We will submit a revised article to a peer-reviewed journal. We have also submitted an abstract for a planned presentation at the forthcoming AGU meeting in December.

We plan to follow the advice of Referee #3 and continue testing MBH98 and related papers. We will also keep trying to obtain the results of the MBH98 "experiments" and would welcome any help in this (or in obtaining source code) from readers, who might independently contact Nature, the U.S. National Science Foundation, or Professor Mann and his co-authors for this information.


