## Calculating Area Averaged Temperatures with Raster

Browsing through the r repository of packages a while back I happened upon ‘raster’ At the time I just used raster for getting “grid cells” for temperature stations, working with a landmask and area calculations and plotting. I didnt use raster for any substantial calculation. With the addition of SST it seems like the perfect tool. before getting to the problem with SST, I’ll turn back to the calculations made for the land. Basically to show some raster at work.

Lets recall the flow of processing the land code. Stations ( at lat/lon) are brought into a matrix that is ROWS of stations and columns of time (months). Then the routine as.Grid(), takes the station lon/lat and returns a cellnumber for the gridsize you use. A raster object of 3degreex3degrees is used to generate these cell numbers. basically the 2D description of lat/lon is turned into a unique cellnumber. for a 5degree world there are 36*72 cells. rows for lat, columns for longitude. Continuing, we then average the stations within each cell. If there are 3 stations in cell 1020, we average them. In a 3×3 world the stations per cell range from 0 to over 30. After this averaging we have a “cell” series. A 2d matrix of cells in the rows and time in the columns.

Then we proceed to “area average” these series. For each month we calculate the total area on the globe where land measurements are taken. No measurement, no area. We then calculate a weight for each cell based on its ratio to the total area. I use raster to do this, sort of. At the time it occurred to me that I could have raster do all this work, much clear and less prone to error. There are some challenges however, so I did some of those things by hand, knowing that I would return to raster later when working on SST, AND when combining the SST with the land.

Let’s start with the obvious problem in the land data. Co-located stations. The raster layer takes one value per cell. If we have multiple stations within a cell, then we have to reduce that problem. One way, is just to introduce a finer grid. What happen if we go down to a .25degree world? I’ll show that, just as a way to introduce how raster will solve some issues and make doing the SST easier. Remember with SST we have 1 measure per 5 degree bin.

First, lets build a fine grained world: .25dgrees per side of the gridcell. Fetch the Station Inventory, and build a synthetic temperature series.

rws=180/.25

cls=360/.25

world<-raster(nrows=rws,ncols=cls,xmn=-180, xmx=180, ymn=-90, ymx=90,

crs=”+proj=longlat +datum=WGS84″)

inv<-getGhcnInventory()

invTemp<-data.frame(Lon=inv$Lon,Lat=inv$Lat, Temp=rnorm(length(inv$Lat),1,.5))

So, we have a dataframe with the longitude,latutude and fake temps. Next, lets see what gridcells those stations get assigned to

tempcell<-cellFromXY(world,invTemp[,1:2])

print(length(tempcell))

print(length(unique(tempcell)))

The point here is that we can see that the number of unique cells is smaller than the number of total cells. Look at the stations:

Cell Lon Lat Temp

5269 391763 -159.40 22.00 5.594263e-01

5268 391763 -159.35 21.98 1.525001e+00

5271 394648 -158.10 21.30 9.538807e-01

5272 394648 -158.07 21.32 -5.408068e-02

5273 394649 -157.93 21.35 9.353993e-01

5270 394649 -157.78 21.45 1.626147e+00

5279 397535 -156.47 20.83 1.428712e+00

5278 397535 -156.43 20.90 2.015386e+00

There are a couple ways to solve this, ( other suggestions welcomed) so for now, I’ll just remove all “co gridded”, cells there are only a couple hundred. The point to make here is how I can use raster to aggregate. So, lets remove the duplicates, for illustration only.

mean(invTemp$Temp)

CellInv<-data.frame(Cell=tempcell,Lon=invTemp$Lon,Lat=invTemp$Lat,Temp=invTemp$Temp)

dup<-table(CellInv$Cell)

dex<-which(dup==1)

cells<-as.numeric(names(dup[dex]))

CellInv<-CellInv[which(CellInv$Cell[] %in% cells),]

Now, I have a dataframe, with one station per .25degree grid cell. What next? stick the temps in:

world[CellInv$Cell]<-CellInv$Temp

plot(world)

cellStats(world,mean)

I apologize for the faint colors:

With raster however we can ask the package to AGGREGATE this into a larger grid.

w2<-aggregate(world,fact=4,mean)

The fact=4, tells the package to combine cells 4:1, which gives us a 1 degree world. AND to use the function ‘mean’ to do this. NAs are handled without specifying na.rm. You get this:

w3<-aggregate(world,fact=12,mean)

cellStats(w3,mean)

Then we would apply a Land AREA weight

And come up ( after some fiddling with total area sampled) with a weighted temp for the grid.

What lies ahead? When it comes to working with SST I dont have to worry about aggregating. We have a 5 degree SST data set. for 1927 months! the question is how to process all those months. The key , I hope is in the “raster brick” A “brick” is a multilayer raster object. Each layer of the brick will represent a Time, a month.

Then for each month we have to.

1. get the area sampled ( that’s easy with masking and a water area map)

2. multiply every temp by the gridweight

3. cellStats(layer,sum) = temperature average for the month.

Once all 1927 months(layers) are done, then we aggregate the layers in groups of 12, taking the mean. Thats our annual average.

How to do that with a raster brick? I’ll play around some and see what happens. Also a question to the R help list, while I busy myself learning raster better

“There are a couple ways to solve this, ( other suggestions welcomed)”

You could use (base) aggregate or tapply to compute the average temperature value for each grid cell and use that value.

Alternatively, with a raster function, you could do something like:

world <- pointsToRaster(world, xy=cbind(inv$Lon,inv$Lat), values=inv$Temp, fun=mean)

That cuts out the cell number computations for you (although that is an instructive thing to do).

Thanks, I like the pointsToRaster(world….)

I will still collect the cell number computations, because I want to know how many stations go into each cell.

In my current code I do use aggregate to combine stations before putting them in the cell, and I was hoping that

there was ( actually figured there HAD to be ) a raster function to do this.

You can also do

#as before

avg <- pointsToRaster(world, xy=inv[,c('Lon','Lat')], values=inv$Temp, fun=mean)

# number of observations

nob <- pointsToRaster(world, xy=inv[,c('Lon','Lat')], fun=length)

#Or both avg and nob in one go:

x <- pointsToRaster(world, xy=inv[,c('Lon','Lat')], values=inv$Temp, fun=c(length,mean))

That is very slick Robert, how do you reference the obs? in x?

I’m thinking that i should go back and rewrite all the land processing code I wrote

pointsToBrick also seems a logical extension.

x is a RasterBrick with two layers. The first has the number of observations, the second has the mean value of the observations. A new function pointsToBrick is not needed because pointsToRaster already does that when appropriate.