
RomanM’s Method

UPDATE: Romantest.R has been added to the box on your right. The test includes all of Roman's code and the two routines needed to feed data into a raster brick: averageByCell() and rasterizeCells(). The code should work with version 1.3+; version 1.5 is headed to CRAN. Note that there appear to be bugs with "image" in R 2.13.1, so the plots are not pretty. A fix is coming; you can probably revert to R 2.13.

I've succeeded in getting a version of RomanM and JeffId's thermal hammer working with version 1.3 of RghcnV3. This is going to be a long post because there is a lot of ground to cover. First, some errata: the "Globe" demo in V1.3 appears to have a missing line (it looks like an editor bug), so I will have to push 1.4 to CRAN. CRAN has not been building packages for a bit, so I'll post 1.4 here.

Roman's method. Roman's method is covered in a variety of posts at his site and at Jeff's. My goal here isn't to explain his method in detail but rather to show how it will fit into RghcnV3. Roman's method works by using the information in all the time series to come up with an average of all the series. The function is currently called "temp.combine()", although that name will change as we move it into RghcnV3; I will also rename the function that uses Tamino's method. As it stands, temp.combine() takes an "mts" (multiple time series) object and a vector of weights. It outputs a single combined time series, offsets for all the series, predicted values for all the series, and residuals. For this first iteration I will leave the weights in. In discussions with Roman he indicated that these weights could be used for distance weighting, but we won't be using that in today's work.
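To make the idea concrete, here is a minimal base-R sketch of offset combining. This is NOT Roman's actual temp.combine(); the function name combineSketch and the shortcut of estimating each offset against the first series are mine, purely for illustration. The real method solves for all offsets jointly.

```r
# Toy illustration of offset combining (NOT RomanM's temp.combine).
# m: matrix with time in rows and stations in columns; NAs allowed.
combineSketch <- function(m) {
  offsets <- numeric(ncol(m))
  ref <- m[, 1]                              # crude: anchor to the first station
  for (j in seq_len(ncol(m))) {
    both <- !is.na(m[, j]) & !is.na(ref)     # overlapping observations only
    offsets[j] <- mean(m[both, j] - ref[both])
  }
  aligned <- sweep(m, 2, offsets)            # remove each station's offset
  list(temps = rowMeans(aligned, na.rm = TRUE), offsets = offsets)
}

# two stations, the second running 2 degrees warm
m <- cbind(c(1, 2, 3, 4), c(3, 4, 5, 6))
res <- combineSketch(m)
res$offsets   # 0 2
res$temps     # 1 2 3 4
```

The combined series recovers the shared signal once the constant between-station offsets are removed; that is the core intuition behind the full method.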

As I plowed through the code it became apparent to me that the paradigm of holding station data in columns (the method used by me, Roman, and Tamino) actually has some drawbacks, especially when it comes to combining stations with 'aggregate()'. I'm not in the mood to change a bunch of code from column based to row based, and it's not simply a matter of doing a few transpositions. Practically, what this means is that some of the code used in Roman's method will use loops. I'm actively working to remove those, but loops work, so working code wins the day.

Plan of attack:

The goal for this exercise is to use Roman's method to calculate an average for a grid cell. In the CAM method we use a simple 'mean' to calculate the average for a grid cell. That happens in the function "rasterizeZoo()", which takes an inventory of stations and a two-dimensional zoo object and "rasterizes" stations into the grid structure of a 3D brick. A brick is lat/lon/time. Some side notes on rasterizeZoo(): it calls the raster function "rasterize", which needs the following inputs: a matrix of xy points (station lon/lat), a base raster to use for every layer, a matrix of values to rasterize into cells, and a function. The value matrix has to have time in columns and stations in rows, so we feed it a zoo object and transpose that zoo object 'under the hood' inside the function. The latest version of zoo just added a transpose function; I needed that a couple months back. But again, I'm not rewriting a bunch of code just yet.
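The orientation requirement can be seen with a toy matrix standing in for the zoo object (the station and date names here are hypothetical):

```r
# stations in columns, time in rows: the convention used throughout this package
zooLike <- matrix(1:6, nrow = 3,
                  dimnames = list(c("1900-01", "1900-02", "1900-03"),
                                  c("stationA", "stationB")))
# rasterize() wants stations in rows and time in columns, so transpose
vals <- t(zooLike)
dim(vals)   # 2 stations by 3 time steps
```

This is exactly the flip that rasterizeZoo() does 'under the hood' before handing the values to rasterize.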

Our plan, then, is to call a function like rasterizeZoo(). However, it will be slightly different. RasterizeZoo() takes multiple stations and averages them per cell. When we use Roman's method we will do this 'averaging' BEFORE we call 'rasterize'. Essentially we will call "rasterize" with "cells" rather than "points". We could also use the raster function "setValues()", but that function needs a little tweak to allow this, so I'll work with rasterize. That's our goal: get a zoo data structure and associated CELLS (not points) to feed to rasterize. Our first function must then take a multiple time series, divide the stations into the right CELLS, and then romanize those stations to come up with a cell time series.

Here we go:

averageByCell <- function(r = GLOBE5, inventory, Mts){
  # To do: check that all the inputs are the right classes
  # get the cell number associated with each station; xy maps to a cell in the raster
  myCells <- cellFromXY(r, getLonLat(inventory))
  # a small hack to get the stations associated with a cell:
  # recall that every column of Mts is a station, so we use column numbers to subset
  DF <- data.frame(Id = getStations(inventory), Cell = myCells)

  # the unique cells, in a vector for referencing and for NAMING the output columns
  uCells <- unique(myCells)
  # we loop over the unique cells
  cellcount <- length(uCells)
  # loop over cells. IF Mts were transposed we could use aggregate(),
  # but that would entail rewriting all of Roman's code.. or maybe use mapply
  for (cell in 1:cellcount){
    # dex holds the positions of this cell's stations in DF$Cell;
    # these are also the columns of Mts we need
    dex <- which(DF$Cell == uCells[cell])
    # call temp.combine. NOTE I could also call Tamino's method here, or a simple mean
    CellAve <- temp.combine(Mts[ , dex])
    if (cell == 1){
      out <- CellAve$temps
    } else {
      out <- cbind(out, CellAve$temps)
    }
  }
  out <- cbind(out)          # ensure matrix form even when there is a single cell
  # COPY the cell numbers to the column names. Critical!
  colnames(out) <- uCells
  # make it a zoo object; we want to pass a zoo object to the rasterizing step
  Zout <- as.zoo(out)
  # output the xy for every cell. This will probably be dropped in production
  xy <- xyFromCell(r, uCells)
  return(list(Points = xy, Zoo = Zout))
}

So that's it. We have a function we feed the following values: an mts object, an inventory, and a blank raster. That function figures out which stations belong in which cells. It then calls "temp.combine()" with SUBSETS of the full mts object: temp.combine(Mts[, dex]) uses "dex" to select those stations that have matching cell numbers, so temp.combine works on one grid cell at a time. Then all those individual time series (grid series) are collected into a zoo object that has CELL NUMBERS as column names. This is different from zoo objects that have station Ids as column names.
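For intuition, here is a base-R mimic of how a lon/lat pair maps to a cell number and how stations then group by cell. Raster numbers cells row-wise from the upper left, which this sketch imitates; the helper name cellFromLonLat is mine, not part of raster or RghcnV3.

```r
# illustration-only mimic of raster's row-wise cell numbering on a global grid
cellFromLonLat <- function(lon, lat, res = 5) {
  ncols <- 360 / res
  col <- floor((lon + 180) / res) + 1   # columns count eastward from -180
  row <- floor((90 - lat) / res) + 1    # rows count downward from +90
  (row - 1) * ncols + col
}

# three stations; the first two share a 5-degree cell
lon <- c(-97.2, -96.8, 2.1)
lat <- c(31.1, 31.4, 48.6)
myCells <- cellFromLonLat(lon, lat)
uCells  <- unique(myCells)
# the columns (stations) belonging to the first unique cell
dex <- which(myCells == uCells[1])
dex   # 1 2
```

This is exactly the grouping averageByCell() does with cellFromXY() before handing each cell's columns to temp.combine().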

Now that we have a zoo object of averaged grid cells, we will call a function like "rasterizeZoo", but for clarity I'll create a different function. I'll probably come up with a special class for this and have just one function, but for now we have two. Here is that function.

rasterizeCells <- function(ZooCells, r = GLOBE5){
  require("zoo")
  require("raster")
  # check the various calling parameters
  if (!is.zoo(ZooCells)){
    cat("Zoo is a ", class(ZooCells), "\n")
    stop("ZooCells must be a zoo object")
  }
  if (!inherits(r, "RasterLayer")) stop("r must be a RasterLayer object")
  resolution <- res(r)
  if (resolution[1] != resolution[2]) stop("must use a raster with square cells")

  # now check that the cells in the column names can be found in the raster.
  # What we really need is a tighter coupling so we KNOW the resolution matches: OOP someday
  cells     <- 1:ncell(r)
  zoo.cells <- as.integer(colnames(ZooCells))
  matching  <- identical(intersect(zoo.cells, cells), zoo.cells)
  if (!matching){
    print("mismatching cells between ZooCells and raster")
    stop("check your raster")
  }

  ###########  ready to process
  #  get the lon/lats of the cells
  #  pull the time out of the zoo object
  #  transform the zoo object into a matrix with the right orientation
  #  rasterize with mean. Need to see if we can change rasterize to merely copy.
  #  also replace this with zoo's transpose.. requires the LATEST zoo library, so change Depends
  xy      <- xyFromCell(r, zoo.cells)
  TIME    <- time(ZooCells)
  # coredata() whacks the row names, so restore them
  data    <- coredata(ZooCells)
  dimnames(data)[[1]] <- TIME
  tdata   <- t(data)
  gridded <- rasterize(xy, r, field = tdata, fun = mean, na.rm = TRUE)
  gridded <- setZ(gridded, as.yearmon(TIME), name = 'time')
  return(gridded)
}

That's it. We will now be able to call readV3data(), v3ToMts(), averageByCell(), and rasterizeCells(), and the result will be a brick where every grid cell has the average temperature as computed by Roman's method. Slick.

So I did a little test with Texas. Here is what a month of Texas data looks like (please note the nasty image bug with R 2.13.1; yuk). Note also that I used a 3x3 grid to do this.

And here is what you get if you compare the answer with and without grid-based averaging:

  1. July 23, 2011 at 2:26 PM

    A (not very polished) function that produces an x-axis that might be considered better for the second plot in the post is at: http://www.portfolioprobe.com/R/blog/pp.timeplot.R

    • Steven Mosher
      July 24, 2011 at 2:02 AM

Thanks. For the most part I am putting off incorporating any graphics code in the package. When I get around to it I will probably do some stuff with rasterVis and ggplot2, and Hadley Wickham is promising some cool interactive graphics. Maybe I'll create a separate package for this. I want to keep this package down to bare bones.

  2. Rob
    July 23, 2011 at 10:13 PM

Incorporating RomanM's method sure does seem to be a bit more complicated code-wise. Is there any way of controlling the size of the raster cell? I can see how increasing/decreasing the size of the cells could be of use. Do you have the code for temp.combine posted anywhere?

    • Steven Mosher
      July 24, 2011 at 2:15 AM

For now Roman's method is more complicated code-wise. But I've hidden all that away.

There will be two functions: averageByCell() and rasterizeCells()

      Behind the scenes ( upstream) I’m working ( hoping) to get support in a couple of raster functions that will make this code internally easier/faster/disappear altogether.

If I bit the bullet and rewrote everything with stations in rows, some of this would magically disappear. So much for the hard work. I think when I get Nick Stokes' stuff integrated and Muller's stuff integrated (haven't seen code yet), at that point, maybe 2.0 or 3.0, I would look to turn the whole thing into an S4 implementation. That's probably the stage at which I would switch the orientation of the data. Although switching to S4 puts a huge burden on me to write wrappers for every function people expect to use with objects.

      Controlling the size of the cell:

      1. You select the cell size by changing the raster you pass.

      Test <- averageByCell(data, r = GLOBE3)

GLOBE3 being a raster with 3-degree bins.

Or make your own, like so:

r <- GLOBE5 # a 5-degree globe is always available to toy with; it's a data object loaded
# when you attach the package

res(r) <- 1

Now it's a 1-degree globe!
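As a sanity check on what a resolution change implies, the cell counts grow quickly. This is plain arithmetic, not a raster call, and the helper name ncellGlobal is mine:

```r
# cells in a global lon/lat grid at a given (square) resolution in degrees
ncellGlobal <- function(res) (360 / res) * (180 / res)
ncellGlobal(5)   # 2592 cells, the 5-degree GLOBE5
ncellGlobal(1)   # 64800 cells after res(r) <- 1
```

Every one of those cells with stations in it gets its own temp.combine() call, so finer grids mean more (but smaller) station groups.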

      FUTURE PROJECT.

      my future project will be to rasterize to irregular clusters on the sphere. That will probably require work "upstream" in the raster package, but all the infrastructure is there. When I say work "upstream" I mean working with the maintainer of that code to support the idea. It has to make sense and be of general use in other sciences. I can of course write custom code using raster commands to do this, but the idea is to create "generalized" functions that serve spatial processing.

    • Steven Mosher
      July 24, 2011 at 2:17 AM

      I will package up the roman stuff into a test program and post that in a couple days.

Expect no fancy graphics, as always.


