
MoshTemp301 is all about Location. Putting stations on the map.
refreshGhcnInv()
Globe<-getLandMask()
invObj<-getGhcnInventory()
gridObj<-as.GriddedInventory(invObj,Globe)
gridInv<-gridObj$Data
stations<-table(gridInv$Cell)
hist(stations,breaks=40,labels=T)
gridInv[1:100,c(1,2,3,4,17,18,19)]
Just a short explanation here, and then more before I do the final drop of MT301. If you run MT301, you get the chart above and the output below
>gridInv[1:100,c(1,2,3,4,17,18,19)]
Id Name Lat Lon Cell Area Land
1 10160355000 SKIKDA 36.93 6.95 2103 88477.57 27428.05
2 10160360000 ANNABA 36.83 7.82 2103 88477.57 27428.05
3 10160390000 DAR-EL-BEIDA 36.72 3.25 2102 88477.57 23888.94
4 10160395001 FTNATIONAL 36.52 4.18 2102 88477.57 23888.94
5 10160400001 CAP CARBON 36.80 5.10 2102 88477.57 23888.94
6 10160402000 BEJAIA 36.72 5.07 2102 88477.57 23888.94
7 10160403000 GUELMA 36.47 7.47 2103 88477.57 27428.05
8 10160419000 CONSTANTINE 36.28 6.62 2103 88477.57 27428.05
9 10160425000 CHLEF 36.22 1.33 2101 88477.57 13959.79
10 10160425001 ORLEANSVILLE 36.17 1.50 2101 88477.57 13959.79
11 10160430000 MILIANA 36.30 2.23 2101 88477.57 13959.79
12 10160444000 BORDJ BOU ARR 36.07 4.77 2102 88477.57 23888.94
13 10160445000 SETIF 36.18 5.42 2102 88477.57 23888.94
14 10160457000 MOSTAGANEM VI 35.88 0.12 2221 91909.99 91195.14
To the code. The first function refreshGhcnInv() downloads the station inventory from NOAA and then cleans up the text fields in that file ( so they print a bit nicer..more work here) and creates and writes a Rdata object to your working directory. Then we Globe<-getLandMask() which means we load the data in the land mask. Next we read in the station inventory object:invObj<-getGhcnInventory(). The next step is critical. We take that inventory and we feed it to a function which will add the extra geographical information we need. Basically the grid cell the station lives in, the AREA of that gridcell and the actual LAND area. The GHCN inventtory has data from surface ships and some tiny islands. They will register ZERO land. There are not many and it doesnt matter how you treat these stations in the final analysis. I will drop them. But rest assured I tested it both ways before making that decision. Here is that function. Take the ghcnInv data and the Globe and return me a gridded inventory gridObj<-as.GriddedInventory(invObj,Globe). Recall that in MoshTemp202 I added some bits ( Parameters) to the output objects to make them more encapsulated self documenting objects. The DATA I want is in the $Data element. get that: gridInv<-gridObj$Data. There now I have a GHCN Inventory file that has been “gridded.” I’ve added geographical information to it that is KEYED to the gridding I selected for my land mask: Lets see how:
> gridObj$Parameter
$Title
[1] “BASELAND1900-2009″
$Cellsize
[1] 3
$Years
[1] NA
$Cam
[1] NA
$Landmask
[1] NA
$SourceData
[1] “v2.temperature.inv”
$SourceDate
[1] “2010-08-19 15:29:00 PDT”
So, whenever I am reading and writing these objects to file I have to make sure I update anything that changes. In this case when I created the gridded inventory I gave the source data, the DATE of that source data and since I took information that depends upon the cell size, I updated that as well. Recall the cell number has no INHERENT meaning. If the world is gridded in 3 degrees, then cell number 1 will cover a specific 3 degree by 3 degree grid (-180E,90N to -177E,87N) If I did a 5 degree grid cell one would cover a different grid. There are 7200 total grids in a 3degree world.
Next I do a simple summary to quick check that I ported by old code correctly stations<-table(gridInv$Cell) and I make a histogram that shows a lot of cells with one station only. hist(stations,breaks=40,labels=T). The table function gives me a frequency count of stations in every cell. If a cell has 40 stations, the count is 40. 5 stations, the count is 5. The chart for that is above
lastly I’ll output a portion of the inventory and check gridInv[1:100,c(1,2,3,4,17,18,19)] I have selected the first 100 records, and output the first 4 fields and the last three fields. Those fields get appended when you call “as.GriddedInv”. They are $Cell, $Area, $Landarea.
so the first cell has a land area that is LESS than the grid cell area. WTF? ah, that means one thing, noob.
sucker must be near the coast and only PART of the 3 degree cell is land: Check for yourself:
>gridInv[1:1,c(1,2,3,4,5,6,7,8,9,10,11,17,18,19)]
Id Name Lat Lon Altitude Grid_Alt Rural Population Topography Vegetation Coastal Cell Area Land
1 10160355000 SKIKDA 36.93 6.95 7 18 U 107 HI NA CO 2103 88477.57 27428.05
YUP! $Coastal=CO Hit Google to check that thing( later we output google earth tours wo0 hoo):
More later. I guess my inbox is filling
Like this:
Like Loading...
Where do you get your land mask from?
Its there in the code:
url_Land_Mask<-"http://islscp2.sesda.com/ISLSCP2_1/data/ancillary/land_water_masks_xdeg/land_water_masks_xdeg.zip"
There are higher resolution ones at the same location. I’m not entirely happy with it, but it was in ascii.
Brill, thanks!
David, I disovered an issue with Land masks last night while trying to replicate HAdiSSt2. Not really an issue, but something you need
to be aware of. The Mask I used has “inland water”. It’s probably better to use the masks on that site that have inland water removed if you are doing a
merge of Land and SST values. It depends upon how you do your Area weighting.
Most people do there area weighting by simply looking at the “extent’ of a grid cell ( a simple function of latitude). If you want something more fine grained that that you use the percent of land/water in a grid. That’s what I do, But in selecting a land grid with inland water, you get the
actual land area. Probably better to remove the inland water. Its a minor minor detail that doesnt impact the total averages
rather just the accuracy as you weight each cell by the cell area/total area. When I finish the SST stuff I’ll go back and update
accordingly
At the moment I’m just concerned with ccc-gistemp. For any given cell (there are 8000) GISTEMP ends up selecting either the land series (which may be empty) or the ocean series. There is no weighting or blending or anything clever. I’m merely trying to represent that land/ocean decision as a mask. Recall that the GISTEMP grid cells are equal area, that leads to a tiny tiny simplification in the ccc-gistemp code.
Concerns about lakes noted.
Hi David, if you look at the site where I got the mask from you’ll note a data product that is a BINARY mask. The percentage maps
are turned into a binary 0/1 map. So .GT. 50% gets a 1 for land. So just hunt around that URL and you’ll see the binary map.
Zeke, and chad and I all tested with both choices: percentage maps and binary maps. No difference. Nick stokes tested as well.
I actually like the equal area approach just for simplicity of downstream math.