Home > Uncategorized > MoshTemp105


In this post we will finally get to a place where we can draw some graphs of temperatures.

In our first code drop we downloaded the v2mean file from NOAA and read it into a dataframe.  The next steps were developing a function to average the duplicates in that data, and then select time windows of that data and finally apply a test to the stations to see if they fit a criteria for further processing. That criteria states that a station must have a certain number of years reporting in the 1961-1990 period. It allows us to construct a series that goes from the beginning of our data in 1900 to the end.

The next step is creating an anomaly series for each and every station. Mathematically we are going to normalize every January in the entire series with the average january from the base period. For example, if the january temperature is 20C in 1910, and the average of ALL januaries in 1961-90 is 21C, then the January in 1910 is normalized to normal = 20-21 or -1C. Since we are dealing in 1/10’s of C the anomaly would be 10. This process is carried out for every month, for every year, for every station. Sounds like a massive for loop. Luckily Gabor Grothendieck on the R help list was able to provide me with a loopless method for doing that. In terms of development I would first implement all the code using loops, generating results that were then compared to CRU and others. Once I had working code that used loops I would try to reduce using R functions. The major problem with the temperature data is missing years. The source data does not provide NAs if all the months are missing. This means unfolding the V2mean data structure into a continuous vector requires some thought. You can’t just unroll the matrix into a vector. The time series is irregular, you have missing years that need to be filled with NA, preferably with elegance. Researching Zoo I saw that it had this ability, so I played with it for a while and then posted a problem on the R help list. I posted a mini version of the problem and Gabor gave me two solutions, one that used a linear model to solve the problem, the other used some features of zoo. Both were very cool. The solution using lm() isn’t quite as clear on first glance, but it works just fine. I selected the other solution he offered. On to the code.

BTW,Gabor pointed out that R has a defined vector of month abbreviations so, I’ve made this change.


Slick. In addition, you’ll have to install the zoo package. On my MAC I just do this with the package manager. The manager interrogates the Cran server gives me a list of packages and I install that way. If you don’t have this ability and want to just install using R  install.packages(“zoo”)

install.packages(pkgs, lib, repos = getOption("repos"),
                 contriburl = contrib.url(repos, type),
                 method, available = NULL, destdir = NULL,
                 dependencies = NA, type = getOption("pkgType"),
                 configure.args = getOption("configure.args"),
                 configure.vars = getOption("configure.vars"),
                 clean = FALSE, Ncpus = getOption("Ncpus"), ...)

I have yet to figure out all the in’s and outs of managing my packages, so I need to do more work here, but for now, just install zoo, either using your package manager or by installing using the R command. install.packages(). All the code from here will depend upon packages being installed. After you install the package, you have to load the library. library(“zoo”) does the trick.

library(package, help, pos = 2, lib.loc = NULL,
        character.only = FALSE, logical.return = FALSE,
        warn.conflicts = TRUE,
        keep.source = getOption("keep.source.pkgs"),
        verbose = getOption("verbose"))

From here on out the code will load the libraries for you, but if you haven’t installed them, we no JOY for you. Down the road, we’ll do some checking to see if libraries are loaded. With “zoo” installed and loaded we get all sorts of useful things. Data types for time series and other tidbits. Ideally from here on out all my objects would be zoo series. Zoo is an extension of the time series class of R and the temperatures are time series, so good design says get your data in the right kind of object. The nice thing about zoo, however, is that I can still operate on the data as if it were a matrix or dataframe, while preserving the underlying time structure. And, in the end, I can just throw a zoo series at the plot command and zoo plot has methods for zoo. That’s the beauty of it in my mind. I can also “save” a zoo object and we will make use of that as well. In the end, we will also use ‘raster’ objects from the raster package. Ideally there I would define a class of raster objects to contain a zoo series and we would apply spatial/temporal methods on that 3D structure. Enough musing, Onto my adaptation of Gabor’s work: Three functions. The first two take V2mean and turn it into a regular zoo series. This solves the problem of missing years by inserting NAs into the lacuna. If a station has data for years 1923 and 1925, then we want to insert NAs for all the months in 1924. Thats a grunt problem as you are basically just reading through an indexed list and mapping it to another vector. With zoo that is all done for you. The first bit:






This bit of code takes in a vector of data from v2mean. The first two columns ( Id and year) are dumped and the remaining data (the temps) are turned into a zoo series. The time parameter is constructed from the year variable. The type is yearmon, which is zoo speak for a sequence of monthly data. using the first year as our start, we create a sequence of months as long as our data needs. The series is then output to the caller as a regular zoo series in matrix form. The calling function is shown below:

as.Zoo<-function(v2mean)  {

g <- do.call(cbind, by(v2mean, v2mean$Id, zooSeries))




I’ve named the function as.Zoo() because that’s what it does. It takes in v2mean and returns the result as a zoo series. ‘as.Zoo’ uses cbind() and do.call. the process works by taking v2mean and doing a column bind BASED on the  $Year value applying the zooSeries() function. Finally, everything is caste as a regular zoo series. In the end, we have a function that takes in V2Mean, regularizes the gaps, and outputs a zoo series object. Next, comes the problem of normalizing this multidimensional zoo object by subtracting the anomaly period. That code is below. Again, a hat tip to Gabor for this piece:



monthly.mean <- aggregate(a , cycle(a), mean,na.rm=T)

out<-v2zoo – coredata(monthly.mean)[coredata(cycle(v2zoo)),]



Let’ explain what goes on here. The caller supplies v2mean and the base period. We apply a window() to get the anomaly period, 1961-1990. we then aggregate() the data using the cycle of the data as our aggregation parameter. The cycle is just a index of the month: jan =1, feb =2. That gives us a monthly.mean for each month in the anomaly period. Next, we extract the coredata() from the monthly.mean zoo object and subtract the corresponding values from the entire series. Then we caste the result as a zoo object, give it the same time period as the incoming data (v2mean) and we have a zoo object of all our stations normalized by the anomaly period. Whew!

And to make all this happen we just issue this call:


which means, take v2mean, turn it into a zoo series, and then normalize it according the the base period in the camCriteria.

So now. we have process from the raw v2mean file to a zoo object that goes like this:




Read the raw data, window the years, average the duplicates,select the subset that satisfies the camCriteria and then normalize the data for that CAM period. Clean and relatively literate.

For grins, plot the first station



Code at the drop. moshtemp105. After moshtemp5, we will no longer build things from scratch every time and we will just work with the V2Anomalies. Processing speed of course will improve.

Next, things get a bit more complicated, so some organization of code pieces into separate files and folders and the introduction of a new download, a land mask, and the addition of the raster package. After that, we introduce the Station inventories and place our stations in the world.

Categories: Uncategorized
  1. August 18, 2010 at 7:36 AM

    Impressive documentation of your code. Far more detailed than I have the patience to write.

    • Steven Mosher
      August 18, 2010 at 10:10 AM

      Thanks Chad,

      So many people, Nick, Ron, yourself, the guys at CCC, Zeke, have inspired me to keep at this thing, its the least I can do.

      • August 19, 2010 at 4:24 PM

        Ah splendid, direct confirmation of our Goal 2!

        Hope to enjoy looking at your code (not that I know any R, but I should clearly learn it, since I know C and J already).

      • Steven Mosher
        August 19, 2010 at 8:59 PM

        Yes David, You guys are an INSPIRATION

  2. Patagon
    August 18, 2010 at 9:35 PM

    Thanks for very useful tips in R !

    On another topic, there is a quite interesting and large dataset of temperatures for the Alps, histalp, at http://www.zamg.ac.at/histalp/ .

    It is generally claimed that the Alps have warmed twice as much as the global average, and I wonder if the homogenization process of the data may have pushed it a bit.

    Have you ever had a look at those data?



    • Steven Mosher
      August 18, 2010 at 11:01 PM


      I haven’t looked at the data. right now I’m focusing on GHCN. Provided I can coerce the Alps data into a good format there should be no issue
      putting it in the system. ron Broberg may be interested as well

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: