Home > Uncategorized > A question from the R list

A question from the R list

I am currently working on rectifying the GHCN station list to improve the location information. Its the kind of database work that is mind numbingly tedious and a PITA in R. not because R lacks capabilities, its just tough and not very sexy doing matching and fuzzy matching and greping and blah blah blah. Instead, I’ll try to work a problem that was posted on the R list. When you work with big data it’s sometimes hard to get help on the list because your problem requires actually loading the data.

This note appearred on the help list: So, we will see if we can figure out how to help

SEE BELOW. The file was made available and its a couple minutes to get it into raster

“I  am trying to use “clim.pact” package for my work, but since this is the  beginning for me to use gridded datasets in “R”, I am having some  trouble.

I want to do seasonal analyses like  trends, anomalies, variograms, EOF and probably kriging too to  downscale my 1 degree gridded data to 0.5.  So, as a first step, I  compiled my entire dataset (with 25 yeears of daily dataset which were  present as 25 files) into a single netcdf file.

Then, I downloaded clim.pact to do further analysis, which works but seems  to change dataset’s original dimensions’ order for  “retrieve.nc”  function (i.e. original lon, lat, time order was changed  to time, lat,  lon after using this function to get a subset). I am not  sure as to why  this happened and not able to get any plots such as box  plot (showing  trend in “lon”, “lat”, “time”), variogram (or variance),  correlation  analysis done because of this conversion problem.

Further, basic “R”  functions seem to work well with objects such as  dataframe, matrix ..etc  with time in a separate column, and the data  values (precipitation, or  temperature) in a separate coulmn with  corresponding station values  (lon/lat). So, now I have very little idea  about what I have to do.

Can anyone suggest me a better (probably  more refined way) way than what I am currently doing to analyze these  data?

The first thing we will do is question the whole need to put the data into a ncdf. R can read ncdf and so can raster. But i’ll suggest here that using ncdf as an intermediate data transfer tool is probably not necessary. In the end when we want to exchange data with others we can output ncdf or maybe HCF ( something I want to try for a particular project)

So, I’ve invited the writer of this question here and we will back up to where the data stands before he output a ncdf. I’ll also try to get his ncdf working.

In the comments below, I explain how to take his matrix of values and construct a raster of values for plotting.

here is the plotRplot

If you have a matrix of values in lat/lon you do the following

Make sure that the matrix is ordered in the right way.  We usually go from 90N-180E to -90N 180E. Your values should be ordered so that rows correspond with decreasing latitude and columns with increasing longitude.

Then you just read it into a raster

myraster<-raster(test,xmn=-180, xmx=180, ymn=-90, ymx=90, crs=”+proj=longlat +datum=WGS84″)

If you have a different extent, just substitute the values in the function above.

class       : RasterLayer filename    :  nrow        : 36 ncol        : 72 ncell       : 2592 min value   : 1 max value   : 2592 projection  : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 xmin        : -180 xmax        : 180 ymin        : -90 ymax        : 90 xres        : 5 yres        : 5

if you read the whole world in to begin with, then you can just grab subsets easily

e <- extent(-60,-50,0,10)

rc <- crop(myraster,e)

> e <- extent(-60,-50,0,10)> > rc <- crop(myraster,e)> rcclass       : RasterLayer filename    :  nrow        : 2 ncol        : 2 ncell       : 4 min value   : 1177 max value   : 1250 projection  : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 xmin        : -60 xmax        : -50 ymin        : 0 ymax        : 10 xres        : 5 yres        : 5

Then I can also pull values out ( pull out all 4 cells)


> myValues[1] 1177 1178 1249 1250

other commands:


class       : Extent xmin        : -60 xmax        : -50 ymin        : 0 ymax        : 10



> nc[1] “file RF_80-85.nc has 3 dimensions:”[1] “Lon   Size: 69″[1] “Lat   Size: 65″[1] “Time   Size: 2192″[1] “————————“[1] “file RF_80-85.nc has 1 variables:”[1] “double Rainfall[Lon,Lat,Time]  Longname:Daily Rainfall Data, 0.5 x 0.5 reso. (India) Missval:-999.9″>

Rainfall <-get.var.ncdf(nc,”Rainfall”)


str(Rainfall) num [1:69, 1:65, 1:2192] NA NA NA NA NA NA NA NA NA NA .

Rainfall <-get.var.ncdf(nc,”Rainfall”)>

str(Rainfall) num [1:69, 1:65, 1:2192] NA NA NA NA NA NA NA NA NA NA …

> Lat <-get.var.ncdf(nc,”Lat”)

> str(Lat) num [1:65(1d)] 6.5 7 7.5 8 8.5 9 9.5 10 10.5 11 …

> Lon <-get.var.ncdf(nc,”Lon”)

> str(Lon) num [1:69(1d)] 66.5 67 67.5 68 68.5 69 69.5 70 70.5 71 …

> b<-brick(“RF_80-85.nc”,varname=”Rainfall”)

> b

class       : RasterBrick filename    : RF_80-85.nc

nlayers     : 2192

nrow        : 65 ncol        : 69 ncell       : 4485 projection  : +proj=longlat +datum=WGS84 min value   : ? ? ? ? ? ? ? ? ? ? max value   : ? ? ? ? ? ? ? ? ? ? xmin        : 66.25 xmax        : 100.75 ymin        : 6.25 ymax        : 38.75 xres        : 0.5 yres        : 0.5

So, we have read the ncdf file into a raster brick. with the lats and lons all set up and 2192 layers

a layer per day




And we see a bad value in the last frame



> MeanRainfall[2191]     0.1062474

> plot(x=1:2000,y=MeanRainfall[1:2000])


I can even use raster to create monthly values per cell, annual values, and do area weighting

Just see all the work on temperature.

I can even create an animation of the daily rainfall map.

But lets find the rainest day ( remeber the value at 2192 looks bad)



Rainyday <-raster(b,layer=184)

> png(“rainmap.png”)>


> map(“world”,add=T)

> dev.off()


Categories: Uncategorized
  1. AJ Smit
    October 31, 2010 at 10:41 AM

    Hi, interesting post, and one for which I hope we’ll get some useful responses! I am grappling with exactly the same problem, but different data. The thing is, I am unconvinced that R is the best software to use for these problems. For many years MATLAB was the de facto software of choice for these analyses. Maybe also some python. The NCL software was designed specifically for these problems – it has been around a while, and it’s what I use to do the processing.

    But I like R (mostly for it’s statistical prowess) and like programming with it. There is no need for me to use R to do what can easily be done in NCL. But I WANT to see how it should be done in R, and I want to learn stuff in the process. The journey sounds fun.

    But I’d like to know, is R the right software for the job, and can what needs doing be done with existing packages (I have clim.pact but am yet to try it)? Should R be used if the analyses can be done easily elsewhere?


    • Steven Mosher
      October 31, 2010 at 11:15 AM

      clim.pact was written by Rasmus Benestad. The main issue I have with it is that I am MAC based and he doesnt have a mac binary done. So it will be hard for me to run and test and help. Basically, I will have to dig into his source to help. Not an issue there.

      On R. I like it because its portable. free. open source. great graphics and packages for all your problems. The biggest trick is getting the data into the right structures for the language.

      I’ll have a look more closely at Rasmus’ code and see what I can make of it. The key will be problem definitions. What do you want to do.

  2. bargavi thulasi
    October 31, 2010 at 9:32 PM

    thanks for the thread steven…! The main reason for sticking to R is because of the course which mandates me to do a project using spatial data in R… also, its free and packages are available for free of cost which read/analyze my dataset ..! the dataset was initially in GrADS (grid analysis and display system) format which i converted into a netcdf file in R! i am planning to try out some other basic analyses first! hope, i get some results ..!

  3. bargavi thulasi
    October 31, 2010 at 10:38 PM

    i found another package – “seas” in R. here are my doubts
    1. if i subset (for a particular set of lat/lon) a region from netcdf file using either ncdf/clim.pact packages, i get a matrix (x,y dimensions), and the lat/lon information are not stored in this matrix. i would like to get a dataframe with lat, lon, actual data(which is precipitation values) in separate columns, so that i could use it for my plots.
    will this make sense?

    • Steven Mosher
      October 31, 2010 at 10:55 PM

      1. what format is your data in before you stick it in a ncdf?

      2. What is the ncdf structure

      >nc nc

      > nc$var

      3. If you subset the data and extract it, what extent are you using and what is grid size
      for example: extracting minlat,maxlat, minlon,maxlon. cell size = 1 degree
      If you have that you can build a raster brick ( see the raster package) and then
      you can just plot all the values on a map with one call. basically, you just take
      your matrix of values ( rows=lat, columns =lat) tell raster the extent
      e <- extent(-60,0,0,60), specify the projection and your data is stored in a geographic
      grid. You can then add layers for different time steps.

    • Steven Mosher
      October 31, 2010 at 11:25 PM

      Your other option is to take your matrix of lat/lon and regenerate the lat lon

      if you have a matrix of say 36*72 where 36 is lat in 5 degree bins and 72 is lon in 5 degree bins then its a simple matter to generate the lat/lon and turn the matrix into a vector

      practice with some simple examples:

      longtest <- c(t(test))

      there we took a matrix of 36*72 (row =lat, column=lon) and we turned it into a vector
      of values.
      Now you just need to add the right latitude and longitude.


      yourFrame <-data.frame(Lat=allLat,Lon=allLon,Rain=longtest)

      yourFrame yourFrame[1,]
      Lat Lon Rain
      1 87.5 -177.5 1

      But using raster is much easier cause it just does this for you.

      > myraster myraster
      class : RasterLayer
      filename :
      nrow : 36
      ncol : 72
      ncell : 2592
      min value : 1
      max value : 2592
      projection : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
      xmin : -180
      xmax : 180
      ymin : -90
      ymax : 90
      xres : 5
      yres : 5

      Lets suppose that your 36*72 matrix covers (-90,90,-45,45)

      > myraster myraster
      class : RasterLayer
      filename :
      nrow : 36
      ncol : 72
      ncell : 2592
      min value : 1
      max value : 2592
      projection : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
      xmin : -90
      xmax : 90
      ymin : -45
      ymax : 45
      xres : 2.5
      yres : 2.5

      piece of cake.

      then just plot it

      map(“world”, add=T)

      see the post for the result

  4. Ruhroh
    December 21, 2010 at 2:37 AM

    Mr Mosher;

    Here’s my thoughts on your image processing ‘request’;

    One challenge from old sat images would be ‘rectifying’ (or in another jargon, ‘Registering’) the images to that they overlay each other, and are comparable on a pixel-by-pixel basis.
    Very likely there will be some rotation as well as translation between images taken through different filters.

    I have done a fair amount of that kind of work with globalmapper , which to my knowledge is not available on MAC. It was largely developed to allow people to integrate the vast archives of old paper maps into a ‘geo-referenced’ frame. The demo version allows full functionality on 4 layers, and it gracefully handles very large files. The developer is very responsive to bug fixes as they are found to be needed. They have a nice GUI for ‘rectifying’ images by user-selected fiducial points.

    Certainly an automatic ‘re-rendering’ of overlapping images into aligned pixels would be marvelous. But in the absence of a full-auto wonderthing, I would be happy to do a finite number of re-renderings, for proof of concept activity. This might help motivate the grungy automation process that would needed to process N different ‘sites’, to truly nail down the UHI issue.

    But perhaps I am guessing up the wrong tree about the stickiest wicket from your perspective.

    BTW; I reside in the bay area, so we could speak directly at some point. Probably a good idea if we are going to have collaborative effort.


  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: