Brick Testing: stackApply()
Continued testing of the raster package. Some new features are being added, so as they get added I am testing them out. In today’s post I will test out the following: a new function called “subset” and stackApply(). dropLayer() had a small bug in it related to NetCDF files ( I think) anyway, since subset appears to wrap dropLayer ( see the Forge SVN) if I test subset, then I should be implicitly testing dropLayer. So subset works like the “base” package subset. You pass it an vector of layers to keep ( see the API options for variations) There were a couple things I wanted to test with subset: did it get the layernames correct, and did it return the right values and how fast was it. It’s generally slow, whether you select 100 layers or 2. I think I know why that might be, but its’ not that big of a deal. I’ll test with dropLayer as well. It’s common to get some speed hits with a more general API. layer names didnt work for me. the function returned different layer names than I expected. That’s an easy work around.
stackApply() is a hugely important function for what we are doing here and it worked perfectly.
To the code:
# set a zoo type time index (year/month)
We start by getting the whole SST file. Then I create a timeline of months and years. Some simple calculations to get the layer numbers I want to subset. And stuff the layerNames slot of the brick with the dates.
I pull out 1960 January for test purposes. Then I Subset, choosing the 120 layers between 1960 and 1969. Then I pull the first layer from that decade. Should be the same as my test layer which I pulled before the subsetting. I remove the values from each of those layers, and test. identical tests TRUE. subset appears to work. ( other R functions such as intersect, union, set difference.. could also be overloaded .. that is you could intersect() two bricks and pull out the layers that match.
The next thing I test is the layerNames: I’m not sure what the expected behavior is here, but it would be nice if you subset jan1960 to dec1969, the layer anmes associated with the layers should be returned. Not sure that happened:
 “Jan 1850” “Feb 1850” “Mar 1850” “Apr 1850” “May 1850” “Jun 1850” “Jul 1850” “Aug 1850” “Sep 1850” “Oct 1850” “Nov 1850” “Dec 1850” “Jan 1851”
 “Feb 1851” “Mar 1851″…….
UPDATE: fixed in 1.4.8
That’s not a huge problem, since I can just do that by hand. Next comes the test of stackApply(). we have a Brick with 120 layers. I want the mean of every year. To do that we create a an index that shows which layers we want combined. Like so:
All that does is create a vector 120 items long. 12 ones, 12 twos 12 threes.. I can also ( and do in the land series) just take the ‘timeline’ of months and years and return it as an index of years 1850,1850,1850…. And then substract 1849 from that and I get the same index. The index says “take the first 12 layers and apply a function, Stick that result in layer 1 of the RESULT BRICK, take the next 12 layers, and the next 12, etc etc. The function I want to apply is mean, which I do with NA remove.
And There I have a raster brick, with the mean of every year for the years 1960-69.
Take the Big brick ( 1850-2010), subset; average over years.Plot. And yes, the layer names need to be changed. stackApply doesnt have ESP.
Fix that little glitch : layerNames(Decade)<-as.character(1960:1969)
Next, we check the actual numbers
1960 1961 1962 1963 1964 1965 1966 1967 1968 1969
-0.10972923 -0.03345876 -0.04221712 -0.12404647 -0.30515093 -0.20797414 -0.16017642 -0.13705026 -0.04763684 0.04900752
And we call pull in data from Hadley. RECALL, they have area weighted the temps and that is the last step I have to test with the new capability in ‘area’. My temps are not area weighted yet, but we get a good match as you would expect. A near perfect match will come when I area weight.