I’ll probably update the code dump this weekend unless somebody screams. For now I’ll just go through some of the code that I’ve just added in and retested.
Recall where we are. We have V2Anomalies. Rows of time, columns of stations. We have a gridded inventory. The two have to work together. Here is the issue. When we process v2mean we loose stations. We apply a time window and we have CAM criteria. So V2Anomalies has around 5100 stations. The inventory has 7280. So, we should trim the inventory. That’s easy. However, we have another issue. The land mask. Do we want to weight cells by the area of the cell OR weight by the land area. If the land mask is TRUE we want to weight by land area, This means we might have to trim both v2mean and the gridded inventory. Also, we want be able to handle situations where we take the 5100 stations in V2Anomalies and trim them. Like, only select the airport stations. There are bunch of solutions to this. None pretty. I can just assume that the user will do this trimming on his own. Its easy. BUT if you forget, you get bad answers. So I want to enforce a reconciliation.
R has no global variables. The scoping rules are such that a function can’t modify a parameter you pass to it. Well, you CAN with a trick. I will avoid that. Instead I handle the problem this way:
Basically, call a function that intersects these 3 pieces of information in one place and returns TWO values, held in a list. That’s a basic R paradigm. Lets look inside that code:
if(is.null(gridInv$Cell)) stop(“Pass a valid grid inventory”)
if(length(common)==0) stop(“No Stations in intersection”)
InvOut<-gridInv[which(gridInv$Id %in% common),]
anomalyOut<-anomalies[,which(anomaly.stations %in% common)]
Starightforward. We check to make sure that the caller passed in a gridded inventory. Then we select stations that have a value for Land that is greater than ZERO, iff the landmask is true. This drops a few stations on small islands and ships. Then we ‘get’ the stations from the Anomalies. the function that does this just gets the column names. Then we do some set logic to find the intersection. common<-intersect(stations,anomaly.stations) Then we trim BOTH the anomalies and the inventory so they are justified with each other. If you dont do this ( trust me) you will have troubles. Lastly we set the values to two items in a list.
The process flow looks like this: read in the stored v2Anomalies from disk. Read in the entire Inventory. Do whatever you like to slice and dice the inventory. Like select airports, rural, high altitude, whatever. Build as many different inventory options as you like. Then intersect them with the BASELINE Anomalies.
The next piece of code takes us to the next level of geographical data.
if(is.null(gridinv$Cell)) stop( ” Pass a Gridded Inventory”)
if(!(nstations==anomaly.stations)) stop( “Inventory Mismatch”)
cellmatrix<-aggregate(anomalies, by=idx, mean, na.rm=T)
What is going on here? In a nutshell. We are averaging stations WITHIN a grid cell. First I do yet ANOTHER check on the issue of justification of the inventories. I cannot prevent the user from passing in the WRONG inventory, sizewise. Then we sort the inventory and the anomalies, then we apply the function aggregate. This collects all the stations in a cell and averages them. Note also that I transpose anomalies to get the rows as columns and columns as rows. We fix the NA issues that arise when there is no data. Lastly we take the first column ( which contains cell numbers) and make row names out of it. We now return a Cell Matrix that is Gridcells by row, and Time by column.
Thats it. A couple more routines and we are done.