Errors in GHCN Inventories

UPDATE! The WMO is set to publish more accurate data on Nov 8, 2010, so the results here are SUBJECT TO CHANGE. Stay tuned.

In the debate over the accuracy of the global temperature record, nothing is more evident than the errors in the location data for stations in the GHCN inventory. That inventory is the primary source for all the temperature series. One question is “do these mistakes make a difference?” If one believes, as I do, that the record is largely correct, then it’s obvious that these mistakes cannot make a huge difference. If one believes, as some do, that the record is flawed, then it’s obvious that these mistakes could be part of the problem. Up until now that is where the two sides of the debate stand: believers convinced that the small mistakes cannot make a difference, and disbelievers holding that these mistakes could in fact contribute to bias in the record.

Before I get to the question of whether or not these mistakes make a difference, I need to establish the mistakes, show how some of them originate, correct them where I can, and then do some simple evaluations of their impact. This is not a simple process. Throughout this process I think we can say two things that are unassailable: 1. the mistakes are real; 2. we simply don’t know if they make a difference. Some believe they cannot (but they haven’t demonstrated that) and some believe they will (but they haven’t demonstrated that). The demonstration of either position requires real work. Up to now no one has done this work.

This matters primarily because, to settle the matter of UHI, stations must be categorized as urban or rural. That entails collecting some information about the character of the station, say its population or the characteristics of the land surface. So, location matters. Consider Nightlights, which Hansen2010 uses to categorize stations into urban and rural. That determination is made by looking up the value of a pixel in an image. If the pixel is bright, the site is urban. If it is dark (for example, because the station is mis-located in the ocean), the site is rural.

In the GHCN metadata the station may be reported at location xyz.xyN yzx.yxE. In reality it can be many miles from this location. That means the nightlights lookup, or ANY georeferenced data (impervious surfaces, gridded population, land cover), may be wrong. One of my readers alerted me to a project to correct the data. That project can be found here. That resource led to other resources, including a 2-year-long project to correct the data for all weather stations. It’s a huge repository. That led to the WMO documents, one of the putative sources for GHCN. This source also has errors. Luckily, back in 2009 the WMO asked all member nations to report more accurate data. That process has yet to be completed, and when it is done we should have data that is reported down to the arc second. Until then we are stuck trying to reconcile various sources.

The first problem to solve is the loss of precision. The WMO has reports that are down to the arc minute. It’s clear that when GHCN uses this data and transforms it into decimal degrees, they round and truncate. These truncations, on occasion, will move a station. I’ve documented that by examining the original WMO documents and the GHCN documents. In other cases it is hard to see the exact error in GHCN, but the coordinates clearly don’t track with WMO. First the WMO coordinates for WMO 60355 and then the GHCN coordinates:

WMO:   60355 SKIKDA 36 53N 06 54E  [36.8833333, 6.9000]

GHCN: 10160355000 SKIKDA 36.93 6.95

GHCN places the station in the ocean. WMO places it on land as seen above.
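The conversion from degrees and arc minutes to decimal degrees is simple enough to check by hand. Here is a minimal Python sketch of that arithmetic using the SKIKDA entries quoted above; the function name `dm_to_decimal` is my own for illustration and is not taken from any GHCN or WMO tool.

```python
def dm_to_decimal(degrees, minutes, hemisphere):
    """Convert degrees and arc minutes plus a hemisphere letter to signed decimal degrees."""
    value = degrees + minutes / 60.0
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value

# WMO entry for station 60355 (SKIKDA): 36 53N 06 54E
wmo_lat = dm_to_decimal(36, 53, "N")
wmo_lon = dm_to_decimal(6, 54, "E")

# GHCN entry for the same station, as listed in the inventory
ghcn_lat, ghcn_lon = 36.93, 6.95

print("WMO decimal degrees:", round(wmo_lat, 4), round(wmo_lon, 4))
print("GHCN minus WMO:     ", round(ghcn_lat - wmo_lat, 4), round(ghcn_lon - wmo_lon, 4))
```

Rounding the WMO values to two decimals would give 36.88 and 6.90, not 36.93 and 6.95, so this particular offset looks like something other than plain rounding.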

To start correcting these locations I started working through the various sources. In this post I will start the work by correcting the GHCN inventory using WMO information as the basis, aware, of course, that WMO may have its own issues. The task is complicated by the lack of any GHCN documents showing how they used WMO documents. As a first step, I compared the GHCN inventory with the WMO inventory and looked at those records where GHCN and WMO have the same station number and station name. That is difficult in itself because of the way GHCN truncates names to fit a data field. It’s also complicated by respelling, multiple names for each site, and the issue of GHCN Imod flags and WMO station index sub-numbers.

Here is what we find. If we start with the 7200 stations in the GHCN inventory and use the WMO identifier to look up the same stations in the WMO official inventory we get roughly 2500 matches. Here are the matching rules I used.

1. The WMO number must be the same.

2. The GHCN name must match the WMO name (or alternate names match).

3. The GHCNID must not have any Imod variants (no multiple stations per WMO).

4. The WMO station must not have any sub-index variants (107 WMO numbers have sub-indexes).
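The four rules above can be sketched in Python roughly as follows. This is an illustration, not the code I ran: the record layout (dicts with `id`, `name`, `number`, `names`) is hypothetical, and the 3-digit country / 5-digit WMO number / 3-digit modifier split of the GHCN ID is an assumption read off the IDs quoted in this post. Name truncation and respelling are not handled.

```python
from collections import Counter

def normalize(name):
    """Uppercase and drop everything except letters and digits."""
    return "".join(ch for ch in name.upper() if ch.isalnum())

def wmo_number_of(ghcn_id):
    """Assumed GHCN ID layout: 3-digit country, 5-digit WMO number, 3-digit modifier."""
    return ghcn_id[3:8]

def match_unique(ghcn_rows, wmo_rows):
    """Return (GHCN ID, WMO number) pairs that pass all four rules."""
    ghcn_counts = Counter(wmo_number_of(g["id"]) for g in ghcn_rows)
    wmo_counts = Counter(w["number"] for w in wmo_rows)
    wmo_by_number = {w["number"]: w for w in wmo_rows}

    pairs = []
    for g in ghcn_rows:
        number = wmo_number_of(g["id"])
        w = wmo_by_number.get(number)
        if w is None:                      # rule 1: same WMO number
            continue
        if ghcn_counts[number] != 1:       # rule 3: no Imod variants in GHCN
            continue
        if wmo_counts[number] != 1:        # rule 4: no sub-index variants in WMO
            continue
        wmo_names = {normalize(n) for n in w["names"]}
        if normalize(g["name"]) not in wmo_names:   # rule 2: name or alternate matches
            continue
        pairs.append((g["id"], number))
    return pairs

# Station names and IDs taken from the sample records in this post.
ghcn_rows = [
    {"id": "63401001000", "name": "JAN MAYEN"},
    {"id": "63401049000", "name": "ALTA LUFTHAVN"},
    {"id": "63401025000", "name": "TROMO/SKATTO"},
]
wmo_rows = [
    {"number": "01001", "names": ["JAN MAYEN"]},
    {"number": "01049", "names": ["ALTA LUFTHAVN"]},
    {"number": "01025", "names": ["TROMSO/LANGNES"]},
]
pairs = match_unique(ghcn_rows, wmo_rows)
print(pairs)
```

With no alternate-name list supplied, the TROMO/SKATTO record does not pair with TROMSO/LANGNES in this sketch, which is the kind of name-matching difficulty that makes rule 2 the hard one.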

That’s a bit hard to explain, but in short I try to match the stations that are unique in GHCN with those that are unique in the WMO records. Here is what a sample record looks like. WMO positions are translated from degrees and minutes to decimal degrees and the full precision is retained. You can check that against GHCN rounding. As we saw in previous posts, slight movements in stations can move them from bright to dark and from dark to bright pixels.

GHCN ID       GHCN name       Lat     Lon     WMO    WMO name         Lat        Lon
63401001000   JAN MAYEN       70.93   -8.67   1001   JAN MAYEN        70.93333   -8.666667
63401008000   SVALBARD LUFT   78.25   15.47   1008   SVALBARD AP      78.25000   15.466667
63401025000   TROMO/SKATTO    69.50   19.00   1025   TROMSO/LANGNES   69.68333   18.916667
63401028000   BJORNOYA        74.52   19.02   1028   BJORNOYA         74.51667   19.016667
63401049000   ALTA LUFTHAVN   69.98   23.37   1049   ALTA LUFTHAVN    69.98333   23.366667

You also see some of the name matching difficulties where the two records have the same WMO and slightly different names. If we collate all differences on lat and lon in matching stations we get the following:

And when we check the worst record we find the following

WMO:  60581  HASSI-MESSAOUD             31.66667      6.15

GHCN:  10160581000 HASSI-MESSOUD 31.7               2.9

GHCN has the station at latitude 31.7, longitude 2.9. According to GHCN the station is an airport:

The location in the WMO file

And the difference is roughly 300 km. WMO is more correct than GHCN; GHCN is off by about 300 km.
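The distance between the two reported positions can be checked with a standard great-circle (haversine) calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name is mine, and the coordinates are the WMO and GHCN values quoted above for HASSI-MESSAOUD.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# WMO position versus GHCN position for station 60581
distance = great_circle_km(31.66667, 6.15, 31.7, 2.9)
print(f"{distance:.0f} km")
```

On these inputs the result comes out a little over 300 km, almost all of it from the longitude difference.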

An old picture of the approach

And diagrams of the airfield

Now, why does this matter? GISS uses GHCN inventories to get Nightlights. Nightlights uses the location information to determine if the pixel is dark (rural) or bright (urban).

NASA thinks this site is dark. They think it is pitch dark. Of course they are looking 300 km away from the real site. Here is the entry from the inventory used in H2010:

10160581000 HASSI-MESSOUD   31.70    2.90  398  630R  HOT DESERT    A    0

    • Steven Mosher
      November 1, 2010 at 12:38 PM | #2


      Dr Steig and I and others have discussed his study over at the Airvent. We are talking about two entirely different
      things. What Eric showed (and I agree 100% with him) is this:

      1. They selected long records from CRU. The chain of custody goes GHCN to CRU. These are temperature records.
      2. They selected the same records from the UCAR source. Again these are temperature records.

      What they showed was that the data from either source gave the same answer, namely that the CRU data was not suspect.
      By CRU data he means temperature data. The units would be C.
      I am talking about entirely different data.

      This post is not about temperature data. This post is about the metadata. Two entirely different things.
      When I talk about degrees I mean degrees on the compass, not the thermometer.
      This is about the changes and improvements that have to be made to the metadata if we want to use georeferenced
      data. This really is not a discussion about global warming. It’s happening. We need to do something. This is a discussion about precision and accuracy requirements.

  1. November 1, 2010 at 6:10 PM | #3

    Thanks for your reply. Sorry I misunderstood.

    Has anyone taken the subset of cases with correct metadata, artificially jittered their positions, and run whatever codes depend upon these to determine sensitivity in output?

    • Steven Mosher
      November 1, 2010 at 10:48 PM | #4

      That’s ok.

      I am in the process of correcting the metadata to do just that. In the meantime I did some preliminary looks at it. These were just looks at how counts changed.

      1. with about 2000 corrected sites (using another source for corrections) you find that there
      wasn’t any big shift in rural/urban numbers. So in the big picture it looks like it will NOT make any difference. Which is what I expect, but proving it will be nice.

      A. I need to double check that work as I just did it at the console in R (interactively)
      B. I would need to run a temperature series analysis to really prove the point.

      2. I’ve been informed by NGDC that the nightlights that H2010 and I use has been deprecated.
      I noted an error with their file and the PI has directed me to a new file. This file is not used
      by H2010, so I’ll be forced to do a couple of comparisons. Again, I expect no great differences
      but buttoning up all the possible objections is my main approach.

      3. Jitter. Rather than Jitter I would do the following:

      A. account for the positional inaccuracy of nightlights itself. H2010 doesn’t do this. The pixel has a POSITIONAL accuracy of 1.5-2km. So I’d put a boundary around the pixel. This means there are two errors: the station location error and the pixel location error (just found out about the
      pixel location error.. hat tip to the DMSP program manager)
      B. given that 95% of the station location errors are less than 5km (TBD) I’d define rural as
      no periurban or urban lights within say 5-10km.

      C. Nightlights is horrible for detecting population density outside the US. I have some unpublished results on that. So nightlights will be one screen; there has to be a real population
      screen as well. For example, in India dark pixels can have about 200 people per sq km. In the
      US a dark pixel has 1 person per sq km. Electrification is the difference.
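The buffer idea in point 3 (call a station rural only if no lit pixel falls within some radius) can be illustrated with a toy grid. This is a sketch under loud assumptions: the grid is a plain list of lists on a flat plane, each cell is taken to be 1 km on a side, and the brightness threshold of 10 is a placeholder rather than a value from H2010 or the DMSP documentation.

```python
import math

def is_rural(grid, row, col, radius_km, cell_km=1.0, bright_threshold=10):
    """True when no pixel at or above bright_threshold lies within radius_km of (row, col)."""
    reach = int(math.ceil(radius_km / cell_km))
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[r]), col + reach + 1)):
            distance_km = math.hypot(r - row, c - col) * cell_km
            if distance_km <= radius_km and grid[r][c] >= bright_threshold:
                return False
    return True

# Toy 11 x 11 image: all dark except one bright pixel 4 km east of the station.
grid = [[0] * 11 for _ in range(11)]
grid[5][9] = 60

single_pixel = is_rural(grid, 5, 5, radius_km=0)   # lookup of the station pixel only
buffered = is_rural(grid, 5, 5, radius_km=5)       # rural only if dark within 5 km
print(single_pixel, buffered)
```

In this toy case the single-pixel lookup calls the station rural while the 5 km buffer does not, which is the kind of disagreement a location error of a few kilometres could produce.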

  2. PolyisTCOandbanned
    November 3, 2010 at 10:38 AM | #5

    1. The mislocated airport pictured looks very far “west of the Pecos”. Just looking at the open unpopulated extent, find it hard to believe true UHI could impact the long term record. (Of course microstation impacts could, but then they always can.)

    2. I’m actually not really understanding the presumption of issues with airports that always came through in Steve’s posts. He had one study and picked at it a little, but not in a comprehensive or even-handed manner. Also, get the impression a lot of Wattsians just think “airports oughta be bad” since they have lots of concrete and are in some sense near population centers. But if you’ve ever flown much and looked down, or looked at maps or whatever, should be pretty evident how open airports tend to be, how they are in far outer suburbs/exurbs, how the amount of paving within the airport fence is pretty much constant over long periods of time, etc. Really pretty far from what you think of as an urban center. There are a few downtown airports like Meigs and Lindbergh, but even here, location next to a body of water probably limits any UHI.

    • steven Mosher
      November 3, 2010 at 11:55 AM | #6

      Hi TCO

      1. I make no comment about whether this site has UHI or not. I would make these points

      A. UHI is driven by physical characteristics of the site.
      B. some of those characteristics are tied to population density.. loosely
      C. some of those characteristics are tied to the character of the surrounding rural area
      (Oke 2002)
      D. Nightlights do not characterize population density very well outside the economically
      well developed areas.

      2. Airports: I’m not as anti airport as I used to be back in the days when we discussed Parker.
      I would like to redo Parker using actual wind velocity as opposed to what Parker used.

      3. My assumption is the UHI effect is small. To prove that I’d like to have solid metadata. The hand waving over the fixable errors does not sit well with me. Neither do the assumptions that every error means we have GIGO impress me.

      4. The early results show the mislocations cause some minor changes. I’ll post when I get some more time. The warming will not simply vanish. Cause it really is warming. That doesn’t mean we shouldn’t fix the data. People should stop having cows about it and just do the work.

      For me now, this debate is boring and I’d rather just focus on the programming.

      • John Slayton
        November 8, 2010 at 5:13 AM | #7

        Mr. Mosher,
        I had a great vacation last year prowling the west and documenting USHCN stations. Occurred to me belatedly that I was driving right past GHCN stations without getting them. So I started to include them just in case Anthony ever makes good on his threat to extend the Surface Stations gallery. I only have a couple so far, but I’ll be glad to share the on-site numbers now and in the future if it would be useful.
        Oregon’s Sexton Summit shows typical creativity in coordinates. GISS has the station about a mile and a half to the north, MMS is much closer, maybe 300 feet to the east, and my Garmin (at 42.600131, -123.365567) is about right on the satellite photo.
        GISS site descriptions can be just as creative as their coordinates. I got a chuckle this afternoon to see that they put El Centro, California, in an area of ‘highland shrub.’ Locals know that El Centro is in the Salton Sink, 39 feet BELOW SEA LEVEL. Ah, well….

  3. November 3, 2010 at 5:05 PM | #8

    Because I am involved in another project which is heavily based upon digital maps, does anyone have a reference to studies done on error rates in such maps or their metadata?


    • steven Mosher
      November 3, 2010 at 10:03 PM | #9

      Do you mean:

      1. errors of “station” or sample locations?
      2. positional accuracy of satellite products?

  4. November 3, 2010 at 11:04 PM | #10


    I mean position errors in digital maps of cartographic features, such as established addresses, roads, etc. I know the old USGS topographic maps and their DEM counterparts had “quality levels” in their metadata. I wondered if this system carried over to the new digital products, or if people essentially had to survey these points themselves, case by case.

    Equivalently, has someone studied residuals from attempted registrations of land use and other cultural feature maps with cadastral depictions from counties and such?


    – Jan

  5. PolyisTCOandbanned
    November 4, 2010 at 1:17 AM | #11


    A. I know you weren’t arguing the site is bad. I was just making that point. FWIW. And yes, if a mistake existed with a truly urban site being reclassified as rural that would invalidate the UHI avoidance method.

    B. Yeah, I know your thinking has evolved. I just still have an issue with the initial presumption (the thought pattern, not seemingly objective enough, not looking for killer tests, not comparing plusses and minuses enough or fairly enough). That said, I have always thought you were willing to amend views based PARTICULARLY on code-based demonstrations (which is at least better than the Watts yahoos…and probably better than some of the passive aggressive Steve silence on new learnings). I think you could up your game and significantly shorten the OODA loop still, but…it’s OK man.

    C. Yeah…I know you like the coding. Like to play. Like in that musical where they decide to “put on a show”. It’s all good, I expect. Have fun…at least you learn R!

    • steven Mosher
      November 5, 2010 at 10:02 AM | #12

      I would say I’m probably one of the few people who has evolved in this debate.

      But “killer tests” Hmm. give me an example I love those.

      Yes, I get your plusses/minuses point, but I think it’s an easy appeal to ignorance. Still, it’s on the table.

  6. PolyisTCOandbanned
    November 4, 2010 at 1:24 AM | #13

    The airport thing (the behavior of the skeptics a few years ago, the presumptions) reminds me of the whole Wattsian thing with the surface sites. The “shocking” pictures and the trumpeting of sites that had some window shaker ACs 20 feet from an outside sensor! I remember asking why the presumption of error, why no testing, etc. I hadn’t “done the work” to analyze it, but based on HVAC training my presumption would have been the opposite of Watts. And in any case, would want to figure it out rather than spend years trumpeting pictures, then hide the data, and mothball the project. Oh…and there was another (better) HVAC engineer who also thought the throw from those units was very unlikely to affect outside sensors and did some simple ASHRAE calcs of order of magnitude.

    • steven Mosher
      November 5, 2010 at 10:06 AM | #14

      I think the HVAC thing and the jet wash stuff is junk. Unfortunately I don’t have the thermal analysis codes to prove what we both know. I do have a cool idea on dams that I want to kick around however..

  7. u.k.(us)
    November 5, 2010 at 4:53 AM | #15

    I believe there are many airports, with large thermal sinks, that go dark at night due to lack of traffic. If a pilot wants to land, they “wake up” the airport lights upon arrival. Is this a consideration in your study?
    I.E. the large unlighted thermal sink.

    Just wanted to add to your workload :)

  8. steven Mosher
    November 5, 2010 at 9:59 AM | #16

    It’s one of the limitations of nightlights as a proxy for UHI.

  9. PolyisTCOandbanned
    November 5, 2010 at 10:25 PM | #17

    Check out John Graham Cumming for a post on code, that I think you will like. He is more how I think skeptics should be.

    Oh…and I hate threaded replies. Miss replies coming in out of the order. Like your 5th post in between my 4th posts.
