The calculation of the global temperature anomaly has fascinated me since I first saw it. I love data and writing code to process data. Over the past few months I’ve been working on learning R and writing my own version of the calculation. That effort is now producing some results, which I’ve shared on other blogs, and it’s time to document and publish the code. My chosen method for doing that is by blogging the code. In the background, of course, I’m preparing a more formal release and enhancements, but for those readers interested in borrowing bits and pieces along the way, I’ve decided to supplement that effort with this blog. The posts here are going to require some knowledge of R. I’ll answer basic questions as time permits, but mostly this is for people who know R or are working on their own versions.
Step 1: Install R.
Step 2: Create a directory to work in.
Step 3: Set your working directory to that directory (in R, with setwd()).
Step 4: Download the compressed file v2.mean from here.
You can do that manually or you can have R do it for you. After you download the Zip file, you need to uncompress it. R has a package, “uncompress”, that can do this for you, but I’ve found it to be unreliable. We’ll fix that later. So for now, download and unzip the file in your working R directory.
If we want R to do the download for us, just use the following function. First give the Zip file its right name, and provide the function with the URL.
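As a minimal sketch of letting R do the download, the helper below wraps base R's download.file(). The function name downloadV2, the default archive name and the example URL in the comments are placeholders made up for illustration; substitute the real link and file name.

```r
# Hypothetical helper: fetch the compressed v2.mean archive into the
# current working directory. 'url' must be the real download link;
# the default file name is an assumption, not the archive's actual name.
downloadV2 <- function(url, destfile = "v2.mean.zip") {
  # mode = "wb" writes in binary mode so the archive is not corrupted on Windows
  status <- download.file(url, destfile = destfile, mode = "wb")
  if (status != 0) stop("download failed with status ", status)
  invisible(destfile)
}

# Usage (placeholder path and URL, replace with your own):
# setwd("path/to/your/working/directory")
# downloadV2("https://example.org/v2.mean.zip")
```

Uncompressing is left as a manual step here, as in the text above.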
Now you should have a copy of v2.mean in your working directory.
The first stage of processing that you will have to perform on v2.mean is to read the file in and put it into a format that you can work with in subsequent processing. The v2.mean file is written by a FORTRAN program, and to read it in you have to do a formatted read. The appropriate R function is read.fwf().
The code below will do the trick. I’ve written the code as a function and set certain variables that you can play around with. The v2.mean file is updated as new temperatures come in, so if you want to stay current you should refresh your download every month or so, or yearly at least.
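What follows is a sketch of such a reading function, not necessarily the exact code used for this series. The names V2_WIDTHS, V2_NAMES, V2_CLASSES and readV2Mean are made up for illustration. The column widths (11, 1, 4, then twelve fields of 5) and the missing-value code -9999 are assumptions about the v2.mean layout, so check them against the format description that ships with the data.

```r
# Assumed fixed-width layout of one v2.mean record (an assumption, verify
# against the data's format description): 11-char station Id, 1-char
# duplicate number, 4-char year, then 12 monthly values of 5 chars each.
V2_WIDTHS  <- c(11, 1, 4, rep(5, 12))
V2_NAMES   <- c("Id", "Duplicate", "Year", month.abb)  # month.abb is "Jan".."Dec"
# Id is read as character because 11 digits overflow R's 32-bit integers.
V2_CLASSES <- c("character", rep("integer", 14))

readV2Mean <- function(filename   = "v2.mean",
                       widths     = V2_WIDTHS,
                       col.names  = V2_NAMES,
                       colClasses = V2_CLASSES,
                       missing    = "-9999") {  # assumed missing-value code
  # colClasses and na.strings are passed through read.fwf() to read.table()
  read.fwf(filename,
           widths     = widths,
           col.names  = col.names,
           colClasses = colClasses,
           na.strings = missing)
}

# Usage: the file name defaults to "v2.mean" in the working directory
# v2mean <- readV2Mean()
```

Giving every setting a default keeps the call short while still letting you experiment with a different file name or missing-value code.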
Note that you need to define the column widths to read the data, and I’ve chosen to define the column names and the classes for each column as well. By defining the column names I am able to write more literate code and refer to columns of the data by name. Defining the column classes is not strictly required, but I’ll do so anyway, just to show the feature.
If you are unfamiliar with the elements in v2.mean, a quick note. The first 11 digits are a temperature station identifier. They encode a country (digits 1-3), a WMO number (digits 4-8) and an IMOD (digits 9-11). A WMO (World Meteorological Organization) identifier often has multiple physical sites in close proximity. These sites all share the same WMO number and are differentiated by the IMOD digits.
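To make the identifier layout concrete, here is a small sketch that splits an 11-character station Id into the three parts just described. The helper name splitStationId is hypothetical, and it assumes the Id is held as a character string.

```r
# Split 11-character station Ids into country, WMO number and IMOD.
splitStationId <- function(id) {
  id <- as.character(id)
  stopifnot(all(nchar(id) == 11))
  data.frame(
    Country = substr(id, 1, 3),   # digits 1-3
    WMO     = substr(id, 4, 8),   # digits 4-8
    IMOD    = substr(id, 9, 11),  # digits 9-11
    stringsAsFactors = FALSE
  )
}

parts <- splitStationId("10160355000")
parts$Country  # "101"
parts$WMO      # "60355"
parts$IMOD     # "000"
```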
Now we are ready to read the file.
First, we define the file name:
Then we read the data in. Since I defined the function to default to the defined filename, I don’t have to specify it.
And to look at the data I can just type this:
           Id Duplicate Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 10160355000         0 1966  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 133 110
2 10160355000         0 1967 100 112 129 143 181 193 239 255 225 201 166 107
3 10160355000         0 1968 109 123 130 160 181 207 247 241 223 191 162 128
4 10160355000         0 1969 117 114 134 149 182 194 223 241 220 185 161 108
5 10160355000         0 1970 129 110 122 138 165 211 232 249 233  NA 148 114
This returns the data for rows 1 through 5, columns 1 through 15. The raw data is in degrees C times 10, that is, tenths of a degree. For the entire analysis I leave the data in this form. Only for display will we rescale it to whole degrees C.
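The indexing and the display conversion can be sketched as follows. The data frame v2 is a stand-in built from the first two rows of the output shown above, and the helper name toDegreesC is hypothetical. In practice you would apply the same steps to the data frame returned by the read step.

```r
# Stand-in data: the first two rows of the output above, in tenths of a degree C.
vals <- rbind(
  c(rep(NA, 10), 133, 110),
  c(100, 112, 129, 143, 181, 193, 239, 255, 225, 201, 166, 107)
)
colnames(vals) <- month.abb
v2 <- data.frame(Id = "10160355000", Duplicate = 0L, Year = c(1966L, 1967L),
                 vals, stringsAsFactors = FALSE)

# Row and column indexing: rows 1 to 2, columns 1 to 15
# (the post uses rows 1 to 5 of the full data set).
v2[1:2, 1:15]

# Convert for display only: tenths of a degree C -> degrees C.
# NA values stay NA; the working copy v2 keeps the raw units.
toDegreesC <- function(d, cols = month.abb) {
  d[cols] <- d[cols] / 10
  d
}

display <- toDegreesC(v2)
display$Dec  # 11.0 10.7
```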
Next up: How do we handle duplicates? The second column represents duplicate records for the same ID. Those records need to be combined before we proceed with any other processing. That heavy lifting, it turns out, can be done in one line of R with no loops (well, internally it does use loops).
Next post covers that topic and others.
To download the code and follow along, look here. Every day I’ll post more of the steps to finish the final calculation, and when we are all done we will organize the code into a neat package for distribution.
Along the way, if you have suggestions for improvements, feel free to make them.