## Forking Myself

I’ve spent some time forking myself. Over the past few days when I could steal away an hour here or there I decided to make a big change to the package. But it’s a good change. First some book-keeping. The Romantest.R file has a minor bug in it. Not really a bug, I just pulled out the wrong statistic from the brick ” cellStats(..,sum), should be cellStats(…mean) unless you want to errr “add” all the cells together. Too eager to post.

On with the forking. Nick Stokes and I exchanged a few emails looking at various ways of speeding up readV3Data(). During the course of that we found a small “bug” to report to NCDC. As I thought more about his data structure, a 3D array of stations, months, and years it became obvious that the whole package could be easily rewritten to support it. And I could bury forever the old 14 column v2 format. The way I’ve implemented that is an option on the readV3data() function. You specify which kind of object you want returned: a zoo object, mts object or an Array type object. Zoo and mts are just variants of each other with station data in columns and time in rows. Then I defined a series of functions asZoo(), asMts(), asArray(), which allow you to switch from one representation to another. Then I rewrote all the functions to accept any type of object. This cut down the number of actual functions I need and moved me closer to OOP. A couple of the functions, still accept only one type of data, for example Romans and Taminos, are still “mts/zoo” centric. Some routines disappeared all together: windowV3() is gone and windowArray() takes it place, although you could do test <- asArray(window(asZoo(Data),start =1920, end = 1940 +11/12).

so you can now write anomalize() and pass it a zoo, mts or Array. And passesCam() replaces the several variants. The demos have been rewritten and I added a couple test programs. Building a new package will take time as the manual will require much rewriting.

Then I had another little task. Rewriting Nick’s weights program. The function is used to calculate a series of weights for the data array. These weights are a function of the monthly count of reports in a grid cell and an area weight for that gridcell. My main goal here was to see if I could rewrite the code using raster. Well, I did, but it wasn’t entirely clear and last bits of it were particularly nasty so I decided instead to stick with Nicks code, add some clearer variable names, and change it so that you can use any size grid cell. Here is what the algorithm does: It takes a all the temperatures series and gives them a “cell” id, exactly like the cell value in a raster. Every cell receives a weight that is proportional to its area, basically sin() of Latitude so that cells at the equatoer which are larger have a weight of 1 and cells toward the poles are weighted less. Then a “count” is established for every valid temperature measure in the cell for every month of every year. So a given cell could have 5 reports one month and no reports the next month or 3 or 2, and so forth. Then the area weight is divided by the counts in that cell. And I refer to this as “inverse density” the more reports, the less each report is weighted. The smaller the area, the smaller the weight. As the function stands now, I’ve added a raster as a calling parameter. I’ll probably change that to simply a cell size, as everything I need can be calculated from a cell size. Here is the code. This function, of course, doesnt take a zoo or mts as a parameter, but that’s relatively easy to handle with the asArray() function, so I’ll probably change that as well before adding it.

inverseDensity <- function(inv, Data, r = GLOBE5){ rClass <- class(r)[1] isRL <- rClass == "RasterLayer" if (isInventory(inv) & isArray(Data) & isRL){ # check that inventory stations and Data stations match if (!identical(getStations(inv), getStations(Data))) stop("stations must match") # dimensions of Array dimensions <- dim(Data) # array of weights for output that looks like temperatures weights <- array(NA,dimensions) lonBin <- ncol(r) latBin <- nrow(r) halfLat <- latBin/2 eq <- halfLat * (lonBin +1) + (halfLat+1) cellSize <- res(r)[1] cellId <- floor(inv$Lat/cellSize) * lonBin + floor(inv$Lon/cellSize) + eq #weights for latitude changes latWeight <- sin((1:latBin-0.5)*pi/latBin) #weights for every lat/lon allWeights <- rep(latWeight, each = lonBin) for(months in 1:dimensions[2]){ for(years in 1:dimensions[3]){ # get T/F if temperature is there reports <- !is.na(Data[ , months, years]) # get the cell numbers of those cells with reports cells <- cellId[reports] # a count of reports per cell per month counts <- tabulate(c(cells, lonBin * latBin)) # Ok is the logical of counts > 0 ok <- counts > 0 # make a copy of allweights to modify density <- allWeights # note only "ok" elements are valid density[ok] <- allWeights[ok] / counts[ok] #print(density[ok]) those True cells get a density area/count weights[reports, months, years] <- density[cells] } } return(weights) } if ( !isInventory(inv) ) stop(" must be a valid inventory") if ( !isArray(Data)) stop("must be a valid data array") if ( !isRL ) stop("must be a valid raster") }

Sounds good, Steve. I think inverse density is the right concept, so I’m glad to see a routine that provides it explicitly.

I’m moving toward making the computation of CellId and AllWeights internal functions. These depend on the cell geometry, and I’m using different types. But once you’ve done that, the loop is generic. Your lonBin * latBin is just the total number of cells, and I’d call it NumCells say; then the loop is working purely on cell counts, and not their geometry.