
MoshTemp 101

The calculation of the global temperature anomaly has fascinated me since I first saw it. I love data, and I love writing code to process data. Over the past few months I’ve been learning R and writing my own version of the calculation. That effort is now producing some results, which I’ve shared on other blogs, and it’s time to document and publish the code. My chosen method for doing that is to blog the code. In the background, of course, I’m preparing a more formal release with enhancements, but for readers interested in borrowing bits and pieces along the way, I’ve decided to supplement that effort with this blog. The posts here assume some knowledge of R. I’ll answer basic questions as time permits, but mostly this is for people who know R or are working on their own versions.

Step 1: Install R.

Step 2: Create a directory to work in.

Step 3: Set your working directory to that directory (in R, with setwd()).

Step 4: Download the compressed file v2.mean.Z from this URL:

url_GhcnV2mean_Zipped <- "ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/v2/v2.mean.Z"

You can do that manually, or you can have R do it for you. After you download the compressed (.Z) file, you need to uncompress it. R has a package, “uncompress”, that can do this for you, but I’ve found it to be unreliable. We’ll fix that later. So for now, download and uncompress the file in your working R directory.

If you want R to do the download for you, use the download.file() function. First give the compressed file its proper name, then pass the function the URL and that file name.

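GhcnV2_Zipped <- "v2.mean.Z"
# mode = "wb" transfers the file in binary mode; on Windows the default text mode can corrupt it
download.file(url_GhcnV2mean_Zipped, GhcnV2_Zipped, mode = "wb")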
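Because a bad transfer (for example, a text-mode download on Windows) can leave you with a file that will not uncompress, it can help to confirm that the download really is in Unix compress format before going further. A minimal sketch, assuming the standard compress header: .Z files begin with the two bytes 0x1F 0x9D. The function name isCompressZ is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if the file starts with the Unix compress (.Z)
# magic number 0x1F 0x9D, FALSE otherwise (including when the file is missing).
isCompressZ <- function(path) {
  if (!file.exists(path)) return(FALSE)
  # readBin() opens the named file in binary mode and reads the first two bytes
  header <- readBin(path, what = "raw", n = 2L)
  length(header) == 2L && identical(header, as.raw(c(0x1f, 0x9d)))
}

# Usage, after the download step:
# isCompressZ("v2.mean.Z")
```

If it returns TRUE, the file can also be uncompressed outside R with the command-line tools uncompress or gzip -d (gzip can read .Z files).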

Once you have downloaded and uncompressed it, you should have a copy of v2.mean in your working directory.

The first stage of processing v2.mean is to read the file in and put it into a format you can work with in subsequent steps. The v2.mean file is written by a FORTRAN program in fixed-width columns, so reading it requires a formatted read. The appropriate R function is read.fwf().

The readV2Mean() function below does the trick. I’ve written the code as a function and set certain variables that you can play around with. The v2.mean file is updated as new temperature data come in, so if you want to stay current you should refresh your download every month or so, or at least yearly.
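To act on that refresh advice without having to remember it, you could check the age of your local copy before reusing it. A minimal sketch; the function name needsRefresh and the 30-day default are assumptions chosen to match the “every month or so” suggestion, not part of the original code.

```r
# Hypothetical helper: TRUE if the local file is missing or its modification
# time is more than maxAgeDays days in the past.
needsRefresh <- function(path, maxAgeDays = 30) {
  if (!file.exists(path)) return(TRUE)
  age <- difftime(Sys.time(), file.info(path)$mtime, units = "days")
  as.numeric(age) > maxAgeDays
}

# Usage: re-download only when the local copy is stale
# if (needsRefresh("v2.mean.Z")) {
#   download.file(url_GhcnV2mean_Zipped, "v2.mean.Z", mode = "wb")
# }
```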

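readV2Mean <- function(infile = GhcnV2_Mean_Path) {
  # v2.mean is fixed-width text: read.fwf() splits each line by column width,
  # comment.char = "" turns off comment handling, and -9999 (missing) becomes NA
  v2 <- read.fwf(infile, widths = Ghcn_Data_Widths, comment.char = "",
                 col.names = Ghcn_Data_Names, buffersize = 2000,
                 na.strings = "-9999", colClasses = GhcnColClasses)
  return(v2)
}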

Now, you’ll note that you need to define the column widths to read the data, and I’ve also chosen to define the column names and the class of each column. Those definitions are below. Defining the column names lets me write more literate code and refer to columns of the data by name. Defining the column classes is not strictly required, but I do it anyway, just to show the feature.

If you are unfamiliar with the elements in v2.mean, a quick note. The first 11 digits are a temperature station identifier. They encode a country (digits 1-3), a WMO number (digits 4-8), and an IMOD (digits 9-11). A WMO (World Meteorological Organization) identifier often covers multiple physical sites in close proximity. These sites all share the same WMO number and are distinguished by the IMOD digits.

Ghcn_Data_Widths<-c(11,1,4,5,5,5,5,5,5,5,5,5,5,5,5)

Ghcn_Data_Names <- c("Id", "Duplicate", "Year", "Jan", "Feb", "Mar", "Apr", "May",
                     "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

GhcnColClasses <- c("numeric", "integer", "integer", "integer",
                    "integer", "integer", "integer", "integer",
                    "integer", "integer", "integer", "integer",
                    "integer", "integer", "integer")
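The station ID layout described above (country in digits 1-3, WMO number in digits 4-8, IMOD in digits 9-11) can be unpacked with substr(). A minimal sketch; the function name splitGhcnId and its output column names are hypothetical. Because the Id column is read with class “numeric”, the sketch first formats it back into an 11-character, zero-padded string.

```r
# Hypothetical helper: split 11-digit GHCN v2 station IDs into their parts.
splitGhcnId <- function(id) {
  # "%011.0f" turns a numeric ID back into 11 digits, zero-padded,
  # without scientific notation
  if (is.numeric(id)) id <- sprintf("%011.0f", id)
  data.frame(Country = substr(id, 1, 3),   # digits 1-3
             Wmo     = substr(id, 4, 8),   # digits 4-8
             Imod    = substr(id, 9, 11),  # digits 9-11
             stringsAsFactors = FALSE)
}

# Country "101", Wmo "60355", Imod "000"
splitGhcnId(10160355000)
```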

Now we are ready to read the file.

First, we define the file name:

GhcnV2_Mean_Path <- "v2.mean"

Then we read the data in. Because the function’s infile argument defaults to GhcnV2_Mean_Path, I don’t have to specify it:

Data<-readV2Mean()

or

Data<-readV2Mean( GhcnV2_Mean_Path)
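If you want to try the reader before working with the full file, you can write a couple of records in the v2.mean layout to a temporary file and read them back with the same read.fwf() settings. This is a sketch: makeV2Line is a hypothetical helper, the layout is copied from the definitions in this post, and the two records are rebuilt from the 1967 and 1970 rows of the printout in this post, with -9999 standing in for the missing October 1970 value.

```r
# Same layout as Ghcn_Data_Widths / Ghcn_Data_Names / GhcnColClasses above.
widths     <- c(11, 1, 4, rep(5, 12))
colNames   <- c("Id", "Duplicate", "Year", month.abb)
colClasses <- c("numeric", rep("integer", 14))
# Each record should be 11 + 1 + 4 + 12 * 5 = 76 characters wide,
# and all three vectors must describe the same 15 columns.
stopifnot(sum(widths) == 76, length(widths) == length(colNames),
          length(widths) == length(colClasses))

# Hypothetical helper: build one fixed-width record; NA is written as -9999.
makeV2Line <- function(id, dup, year, temps) {
  temps <- as.integer(temps)
  temps[is.na(temps)] <- -9999L
  paste0(sprintf("%011.0f", id), dup, year,
         paste(sprintf("%5d", temps), collapse = ""))
}

tmp <- tempfile()
writeLines(c(
  makeV2Line(10160355000, 0, 1967,
             c(100, 112, 129, 143, 181, 193, 239, 255, 225, 201, 166, 107)),
  makeV2Line(10160355000, 0, 1970,
             c(129, 110, 122, 138, 165, 211, 232, 249, 233, NA, 148, 114))
), tmp)

v2sample <- read.fwf(tmp, widths = widths, comment.char = "",
                     col.names = colNames, na.strings = "-9999",
                     colClasses = colClasses)
print(v2sample)
```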

And to look at the data I can just type this:

Data[1:5,1:15]

           Id Duplicate Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 10160355000         0 1966  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 133 110
2 10160355000         0 1967 100 112 129 143 181 193 239 255 225 201 166 107
3 10160355000         0 1968 109 123 130 160 181 207 247 241 223 191 162 128
4 10160355000         0 1969 117 114 134 149 182 194 223 241 220 185 161 108
5 10160355000         0 1970 129 110 122 138 165 211 232 249 233  NA 148 114

This returns rows 1 through 5, columns 1 through 15. The raw data are in tenths of a degree Celsius (°C × 10); for example, 133 means 13.3 °C. I leave the data in this form for the entire analysis and convert to degrees Celsius only when displaying results.
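Converting for display is just a division by 10 on the twelve monthly columns. A minimal sketch; toCelsius is a hypothetical helper name, and it assumes the monthly columns are named Jan through Dec as in Ghcn_Data_Names.

```r
# Hypothetical display helper: convert the monthly columns from tenths of a
# degree Celsius to degrees Celsius. Id, Duplicate and Year are left as-is,
# and NA stays NA.
toCelsius <- function(v2, months = month.abb) {
  v2[months] <- v2[months] / 10
  v2
}

# Usage: toCelsius(Data[1:5, ])
```

Keeping the stored data in integer tenths and converting only at display time matches the approach described above.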

Next up: how do we handle duplicates? The second column, Duplicate, distinguishes duplicate records for the same station ID. Those records need to be combined before we proceed with any other processing. That heavy lifting, it turns out, can be done in one line of R with no explicit loops (internally, of course, it still loops).
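The method used in this series is the subject of the next post. Purely to illustrate the “one line, no explicit loops” idea, here is one possible approach, assuming the duplicates should simply be averaged month by month (an assumption; other combination rules are possible). The function name combineDuplicates is hypothetical; the single aggregate() call does the grouping by Id and Year.

```r
# Illustrative only: average duplicate records for each Id/Year, month by
# month, ignoring missing values. This is NOT necessarily the method used
# later in this series.
combineDuplicates <- function(v2, months = month.abb) {
  out <- aggregate(v2[months], by = list(Id = v2$Id, Year = v2$Year),
                   FUN = mean, na.rm = TRUE)
  # mean() of an all-NA group with na.rm = TRUE is NaN; report it as NA
  out[months] <- lapply(out[months], function(x) replace(x, is.nan(x), NA))
  out
}

# Usage: combined <- combineDuplicates(Data)
```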

Next post covers that topic and others.

To download the code and follow along, look here. Every day I’ll post more of the steps toward the final calculation, and when we are all done we will organize the code into a neat package for distribution.

Along the way, if you have suggestions for improvements, feel free to make them.

  1. Joel
    August 17, 2010 at 10:57 PM

    Your text-on-black format makes it a pain to copy and paste code into the R editor. Not to mention it is harder on the eyes as well.

    • Joel
      August 17, 2010 at 11:02 PM

      Cut and paste only seems to be a problem in my RSS app. If I c&p from a normal browser, it works fine.

      • Joel
        August 17, 2010 at 11:15 PM

        I see you provide code files to eliminate the need to c&p. I’ll shut up now.🙂

      • Steven Mosher
        August 18, 2010 at 1:13 AM

        Thanks anyway for the comment. Let me know how it works for you.
        Things are going to get more complex very quickly as we have to bring in various data sources
        from here on out. I’ll be doing dumps with zip files and defining a directory structure for
        all the data and code. If you follow all the posts it should make sense. I’m basically rebuilding an already existing system
        file by file, tuning as we go along.

    • Steven Mosher
      August 17, 2010 at 11:16 PM

      Thanks Joel,
      I didn’t really think about the template much when I selected it some months ago. I’ve also experienced problems with
      cut and paste from JeffId’s site, where line numbers get copied in. I try to cut and paste code directly out of my editor or directly from the console, and I try to make sure that I don’t write anything that hasn’t actually been run. I recently had one of those moments on the R help list
      where I typed in code by hand and of course had typos galore. Anyway, I’ll spend some time trying other backgrounds, like NORMAL white.
