A question from the R list
I am currently working on rectifying the GHCN station list to improve the location information. Its the kind of database work that is mind numbingly tedious and a PITA in R. not because R lacks capabilities, its just tough and not very sexy doing matching and fuzzy matching and greping and blah blah blah. Instead, I’ll try to work a problem that was posted on the R list. When you work with big data it’s sometimes hard to get help on the list because your problem requires actually loading the data.
This note appearred on the help list: So, we will see if we can figure out how to help
SEE BELOW. The file was made available and its a couple minutes to get it into raster
“I am trying to use “clim.pact” package for my work, but since this is the beginning for me to use gridded datasets in “R”, I am having some trouble.
I want to do seasonal analyses like trends, anomalies, variograms, EOF and probably kriging too to downscale my 1 degree gridded data to 0.5. So, as a first step, I compiled my entire dataset (with 25 yeears of daily dataset which were present as 25 files) into a single netcdf file.
Then, I downloaded clim.pact to do further analysis, which works but seems to change dataset’s original dimensions’ order for “retrieve.nc” function (i.e. original lon, lat, time order was changed to time, lat, lon after using this function to get a subset). I am not sure as to why this happened and not able to get any plots such as box plot (showing trend in “lon”, “lat”, “time”), variogram (or variance), correlation analysis done because of this conversion problem.
Further, basic “R” functions seem to work well with objects such as dataframe, matrix ..etc with time in a separate column, and the data values (precipitation, or temperature) in a separate coulmn with corresponding station values (lon/lat). So, now I have very little idea about what I have to do.
Can anyone suggest me a better (probably more refined way) way than what I am currently doing to analyze these data?
The first thing we will do is question the whole need to put the data into a ncdf. R can read ncdf and so can raster. But i’ll suggest here that using ncdf as an intermediate data transfer tool is probably not necessary. In the end when we want to exchange data with others we can output ncdf or maybe HCF ( something I want to try for a particular project)
So, I’ve invited the writer of this question here and we will back up to where the data stands before he output a ncdf. I’ll also try to get his ncdf working.
In the comments below, I explain how to take his matrix of values and construct a raster of values for plotting.
here is the plotRplot
If you have a matrix of values in lat/lon you do the following
Make sure that the matrix is ordered in the right way. We usually go from 90N-180E to -90N 180E. Your values should be ordered so that rows correspond with decreasing latitude and columns with increasing longitude.
Then you just read it into a raster
myraster<-raster(test,xmn=-180, xmx=180, ymn=-90, ymx=90, crs=”+proj=longlat +datum=WGS84″)
If you have a different extent, just substitute the values in the function above.
class : RasterLayer filename : nrow : 36 ncol : 72 ncell : 2592 min value : 1 max value : 2592 projection : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 xmin : -180 xmax : 180 ymin : -90 ymax : 90 xres : 5 yres : 5
if you read the whole world in to begin with, then you can just grab subsets easily
e <- extent(-60,-50,0,10)
rc <- crop(myraster,e)
> e <- extent(-60,-50,0,10)> > rc <- crop(myraster,e)> rcclass : RasterLayer filename : nrow : 2 ncol : 2 ncell : 4 min value : 1177 max value : 1250 projection : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 xmin : -60 xmax : -50 ymin : 0 ymax : 10 xres : 5 yres : 5
Then I can also pull values out ( pull out all 4 cells)
> myValues 1177 1178 1249 1250
class : Extent xmin : -60 xmax : -50 ymin : 0 ymax : 10
UPDATE NOW THAT I HAVE THE FILE ITS ALL A COUPLE MINUTES OF WORK
> nc “file RF_80-85.nc has 3 dimensions:” “Lon Size: 69″ “Lat Size: 65″ “Time Size: 2192″ “————————” “file RF_80-85.nc has 1 variables:” “double Rainfall[Lon,Lat,Time] Longname:Daily Rainfall Data, 0.5 x 0.5 reso. (India) Missval:-999.9″>
str(Rainfall) num [1:69, 1:65, 1:2192] NA NA NA NA NA NA NA NA NA NA .
str(Rainfall) num [1:69, 1:65, 1:2192] NA NA NA NA NA NA NA NA NA NA …
> Lat <-get.var.ncdf(nc,”Lat”)
> str(Lat) num [1:65(1d)] 6.5 7 7.5 8 8.5 9 9.5 10 10.5 11 …
> Lon <-get.var.ncdf(nc,”Lon”)
> str(Lon) num [1:69(1d)] 66.5 67 67.5 68 68.5 69 69.5 70 70.5 71 …
class : RasterBrick filename : RF_80-85.nc
nlayers : 2192
nrow : 65 ncol : 69 ncell : 4485 projection : +proj=longlat +datum=WGS84 min value : ? ? ? ? ? ? ? ? ? ? max value : ? ? ? ? ? ? ? ? ? ? xmin : 66.25 xmax : 100.75 ymin : 6.25 ymax : 38.75 xres : 0.5 yres : 0.5
So, we have read the ncdf file into a raster brick. with the lats and lons all set up and 2192 layers
a layer per day
And we see a bad value in the last frame
> MeanRainfall 0.1062474
I can even use raster to create monthly values per cell, annual values, and do area weighting
Just see all the work on temperature.
I can even create an animation of the daily rainfall map.
But lets find the rainest day ( remeber the value at 2192 looks bad)