The Graph of Temperature vs. Number of Stations

This page explains the origin of a graph comparing the number of weather stations around the world with the simple mean of the temperature data. I have shown this graph in some of my own publications and it has been reproduced in a new book by Marlo Lewis. As I have been asked about its origins a number of times I thought it would be simplest to post a web page about it.

This is the graph:

You can get the Excel spreadsheet that created it below.

  • I first saw the data behind the graph at the Intellicast web site in an essay by Dr Dewpoint, aka Joe D'Aleo, their former Chief Meteorologist. The essay is also available in pdf format here.
  • I wrote to Joe, asking him for the data, which he sent to me. He explained that he obtained it using the search criterion 'All stations'.
  • The data Joe obtained were put into 3 categories: urban, suburban and rural. In the spreadsheet I used to generate the graph, I constructed the average temperature as the mean of the three group averages, weighted by the number of stations in each group.
  • A very similar station count graph is posted at the GISS website. It is taken, in turn, from a 1997 paper by Peterson and Vose. However, that station count is smaller by about two-thirds. The GISS data (or GHCN) remove duplicates, records with insufficient continuity, etc. The resulting station count graph looks almost the same, though, with the rapid fall around 1990.
  • The graph clearly shows that there is a step up in the mean coincident with the sudden loss of over half the sampling sites around 1990.
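
As a minimal sketch of the station-weighted average described above, the Python snippet below combines three group means using station counts as weights. The function name and all of the numbers are hypothetical and chosen only for illustration; none are taken from the spreadsheet. The second call shows how the combined raw mean can move when one group loses stations, even though no group's own mean has changed.

```python
def weighted_mean_temperature(groups):
    """Combine group means into one average, weighted by station count.

    groups: iterable of (mean_temperature_c, number_of_stations) pairs.
    """
    groups = list(groups)
    total_stations = sum(n for _, n in groups)
    if total_stations == 0:
        raise ValueError("at least one station is required")
    return sum(temp * n for temp, n in groups) / total_stations


# Hypothetical values for illustration only (not from the spreadsheet).
before = {"urban": (12.0, 100), "suburban": (11.0, 50), "rural": (10.0, 50)}

# Same group means, but most of the (cooler) rural stations are dropped.
after = {"urban": (12.0, 100), "suburban": (11.0, 50), "rural": (10.0, 10)}

mean_before = weighted_mean_temperature(before.values())
mean_after = weighted_mean_temperature(after.values())

print(f"before: {mean_before:.4f} C, after: {mean_after:.4f} C")
```

In this made-up case the combined mean rises purely because the mix of stations changed, which is the kind of sampling effect a raw average cannot distinguish from a climatic one.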

  • The temperature average in the above graph is unprocessed. Graphs of the 'Global Temperature' from places like GISS and CRU reflect attempts to correct for, among other things, the loss of stations within grid cells, so they don't show the same jump at 1990.
  • The graph mainly serves to illustrate one of the challenges for people who are trying to use land-based station data to construct a continuous index of the global average temperature over the 1990 boundary. Gridded data reflect processing to (hopefully) remove the influence of problems such as the loss of stations within grid cells. The point of the graph above is that a change in the raw mean occurred coincident with the big loss of stations in the early 1990s. This creates a problem of confounding. After the early 1990s the gridded series started behaving differently, i.e. going upwards so that the 1990s becomes the warmest decade, etc. Maybe the anomaly series are fully corrected for the problem of station closure and the shift in the 1990s was climatic. Or maybe the anomaly series are not fully corrected for the problem of station closure, implying that not all of the shift in the 1990s data was climatic. To accept the claim that the post-1990 anomaly index is continuous with the pre-1990 data, and only reflects a climatic change, requires the assumption, as a maintained hypothesis, that any effects of the sudden sample change around 1990 have been removed. It has puzzled me why this assumption is not more rigorously tested by people whose research depends on the optimistic interpretation of the gridded data.
  • The loss in stations was not uniform around the world. Most stations were lost in the former Soviet Union, China, Africa and South America. To see this visually, go to the University of Delaware global temperature archive. Click Available Climate Data; log in; under Global Climate Data select Time Series 1950 to 1999; then select Station Locations (MPEG file for downloading). Then sit and watch the movie. The remarkable things are, first, how bad the spatial coverage is outside the US and Europe, and second, what happens at 1990.
  • As early as 1991, there was evidence that station closure beginning in the 1970s had added a permanent upward bias to the global average temperature. Willmott, Robeson and Feddema ("Influence of Spatially Variable Instrument Networks on Climatic Averages", Geophysical Research Letters vol. 18 no. 12, pp. 2249-2251, Dec 1991) calculated a +0.2C bias in the global average due to pre-1990 station closures.
  • Researchers doing trend regressions on globally-averaged temperature data should consider including an intercept/slope break point at around 1990.
  • Pat Michaels and I published a paper that tests whether homogeneity corrections in gridded data are adequate to remove non-climatic influences. We find they are not, and that the nonclimatic effects add up to a net warm bias for the world as a whole.
  • I have not found any discussion of the sudden loss in stations around 1990 in the recent IPCC report. In the TAR there was a brief mention in the Technical Summary, to the effect that if this rate of station closure keeps up it will make it difficult to continue detecting global warming. In other words, the underlying assumption that the increase in average temperature is due to global climate change is not itself subject to question; the problem created by station closure is only that it makes it hard to measure the phenomenon they know must be there.
  • Weather satellites provide complete spatial coverage from 1979 to the present. After seeing the Delaware video, an interesting question to ask would be: in the regions where the most surface data were lost (e.g. Russia), were the temperature trends measured by satellites above or below the global average? That might give some indication of whether the regions that are still well-sampled tend to have higher-than-average warming trends. This is not a study I plan to do, but hopefully someone will.
  • I am using terms like 'global temperature' and 'average temperature' for shorthand. They are intrinsically very problematic!
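
The suggestion above about including an intercept/slope break point around 1990 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the choice of 1990 as the break year, and the synthetic series are all hypothetical, and no real temperature data are used. Fitting a separate least-squares line on each side of the break gives the same point estimates as a single regression with a break dummy and its interaction with time. Years are centred on the break, so the difference in intercepts is the estimated step at the break itself.

```python
def ols_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points on each side of the break")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_with_break(years, temps, break_year=1990):
    """Fit separate trend lines before and after break_year.

    Time is measured as (year - break_year), so each intercept is the
    fitted value at the break and their difference is the step there.
    """
    pre_x, pre_y, post_x, post_y = [], [], [], []
    for year, temp in zip(years, temps):
        if year < break_year:
            pre_x.append(year - break_year)
            pre_y.append(temp)
        else:
            post_x.append(year - break_year)
            post_y.append(temp)
    pre_intercept, pre_slope = ols_line(pre_x, pre_y)
    post_intercept, post_slope = ols_line(post_x, post_y)
    return {
        "pre_slope": pre_slope,
        "post_slope": post_slope,
        "step_at_break": post_intercept - pre_intercept,
        "slope_change": post_slope - pre_slope,
    }


# Synthetic series for illustration only: a 0.3 C step and a steeper
# trend are built in at 1990 so the fit has something to recover.
years = list(range(1970, 2010))
temps = [
    10.0 + 0.01 * (y - 1990) if y < 1990 else 10.3 + 0.02 * (y - 1990)
    for y in years
]

result = fit_with_break(years, temps)
print(result)
```

A sketch like this only estimates the size of a break; it does not say whether the break is climatic or an artefact of the change in the sample.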

Return to Ross McKitrick's home page