
Version 1.3 Verified

Whew,

Update: Sources and the Windows package are in the drop box. CRAN is a bit slow in getting around to things.

Update: AirVent Post source is in the drop box.

For a few hours last night I struggled with a bug, err, several bugs. In the end some of the bugs were mine, some of them were "upstream", and some of them were far "upstream": image bugs in R 2.13.1. But the math bugs (or flat-out errors on my part) have all been fixed, and the verification of the results is done. Version 1.3 should be available for download here in a bit; I just need to pretty up some code in the demos.

The process of using raster bricks  to create a global average is a bit tricky if you are not used to thinking  in simple operations.
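
For anyone following along, here is a minimal sketch of the kind of operation involved (toy data and plain cos(latitude) weights, not the package's demo code):

library(raster)

# toy brick: 12 monthly layers on a 5-degree global grid, random values
b <- brick(nrows = 36, ncols = 72, nl = 12)
values(b) <- matrix(rnorm(ncell(b) * nlayers(b)), ncol = nlayers(b))

# area weights from cell latitude
w <- cos(yFromCell(b, 1:ncell(b)) * pi / 180)

# weighted global mean for each month, skipping missing cells
globalMean <- sapply(1:nlayers(b), function(i) {
  v  <- values(b[[i]])
  ok <- !is.na(v)
  sum(v[ok] * w[ok]) / sum(w[ok])
})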

Anyway, I worked my way through my own stupidity and ended up with this

  1. Rob
    July 22, 2011 at 8:49 PM

    Hi Steve,
    Very interesting package. I've been following the method used over at JeffId's to try to analyze a subset of data.

    I was wondering about a couple of things. Firstly, I'm having trouble with the anomalize and annualize functions.

    ###regionAnomaly <- anomalize(regionAve$Zoo$x)###
    Error in anomalize(regionAve$Zoo$x) : TemperatureZoo must be a zoo object

    ###regionAnnual <- annualize(regionAnomaly)###
    Error in inherits(object, "zoo") : object 'regionAnomaly' not found

    On the same subject, the plots he did aren't working for me either.

    On another subject: I know I can copy and paste from R into Excel or a text file etc., but how would I export to .csv the things that I may need (regionAve etc.)?

    Finally,
    I'm getting some pretty low numbers for my station counts in my region of study in recent times. I know that I can manually download data from Environment Canada to supplement it if I really need to, but would it be possible to create a script for this, so that data for the regional average could be taken from other sources such as Environment Canada?

    Do you think that using the non-adjusted V3 would result in more data availability?

    • Steven Mosher
      July 22, 2011 at 9:20 PM

      Start with the basics.

      When you have an issue you should:

      >sessionInfo()

      and copy the output for me to see.

      Then, looking at your problem:

      Type names(regionAve$Zoo)

      You should see that the name "data" appears.

      You used regionAve$Zoo$x, which doesn't exist;
      regionAve$Zoo$data is the data.

      For example, type

      >regionAve$Zoo$x

      and you will see it returns NULL, because that element does not exist.

      So, post the code that you wrote and I’ll show you how to fix it.

      ###regionAnomaly <- anomalize(regionAve$Zoo$x)###
      Error in anomalize(regionAve$Zoo$x) : TemperatureZoo must be a zoo object

      This means that Zoo$x is NOT a zoo object. In fact it's missing if you used regionalAverage(), because the right name is Zoo$data.

      The second error happens because of the first error.

      So post your sessionInfo() and your code.

      Download the AirVentPost code.

      CSV: see the help manual; write.csv().
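
      For instance, something along these lines (a sketch using the names from this thread, not verified output):

      names(regionAve$Zoo)     # should list "data", not "x"
      str(regionAve$Zoo$data)  # a zoo object

      regionAnomaly <- anomalize(regionAve$Zoo$data)
      regionAnnual  <- annualize(regionAnomaly)

      # export: coerce the zoo object and use base R's write.csv()
      write.csv(as.data.frame(regionAnnual), file = "regionAnnual.csv")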

      Additional data:

      1. Point me at the source; additional data is easy to add... usually.
      2. Non-adjusted: won't add any stations.
      3. Number of data points: what kind of area are you looking at?
      4. Be aware that Tamino's method gives you a reference station (I need to change the name); you will have to translate it into anomalies. Roman's method will give you an average... a couple of days away. They have the same shape but there is sometimes an offset.

  2. Rob
    July 22, 2011 at 10:23 PM

    Hi Steven,
    Thanks for the help. That was a bit of a stupid mistake on my part. I should have noticed that your new version was using $data instead of $x like he used. I will have a look at the manual for the export-to-.csv part.

    Thanks for the tip on >sessionInfo()

    Haven't touched R in a long time, so it is taking a bit of time to get used to things. Very good platform though, I must admit. Much easier to use than Python or MATLAB.

    Regarding the additional data from Environment Canada, here is a post by the folks at Clear Climate Code on the subject:

    http://clearclimatecode.org/canada/

    And here is their analysis of it:

    http://clearclimatecode.org/analysis-of-canada-data/

    In terms of where to find the raw data to scrape, I can't figure out exactly where they were able to get it. For similar work I did in the past, I manually downloaded from each separate location.

    3. Number of data points. I'm looking at an area that is expected to be isolated. I am trying to replicate work I did previously that is being submitted for publication this week. I'm just trying to find an automated way of doing it, because I did it manually originally (i.e. manually downloaded for each site from Environment Canada and GHCN and compared).

    Here is an image showing, on the left, the number of stations I got through manually downloading and, on the right, the number using this R package. Note the left is annual averages and the right is monthly.

    4. To be honest, I'm not sure what you mean by Roman's giving an average as compared to Tamino's giving a reference station.

    The method I used for this paper was to compute an average for each year if there were 11 months' worth of data (the 12th month was interpolated if missing). I then simultaneously computed offsets for each station versus the all-station average in the overlapping periods. The stations were then adjusted using the offsets, and this process was iterated until the absolute value of the sum of the offsets was below a threshold. My understanding was that Tamino does the same thing. I was not aware he was adjusting all to a reference station; I had thought that he was using the average of all as the reference station and then computing offsets between that all-station average and the stations themselves to align them.
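
    A rough sketch of that iteration in R (stations in columns; illustrative code, not the actual analysis):

    alignStations <- function(x, tol = 1e-4, maxit = 100) {
      # x: matrix of values with stations in columns, NAs allowed
      xAdj <- x
      for (i in seq_len(maxit)) {
        allMean <- rowMeans(xAdj, na.rm = TRUE)            # all-station average this pass
        off     <- colMeans(xAdj - allMean, na.rm = TRUE)  # each station's offset vs that average
        xAdj    <- sweep(xAdj, 2, off)                     # adjust stations by their offsets
        if (abs(sum(off)) < tol) break                     # stop when the offsets have shrunk
      }
      list(adjusted = xAdj, average = rowMeans(xAdj, na.rm = TRUE))
    }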

  3. Steven Mosher
    July 22, 2011 at 11:19 PM

    "In terms of where to find the raw data to scrape I can't figure out exactly where they were able to get it. For similar work I did in the past I manually downloaded from each separate location."

    Look at their scraper.

    4. It's rather hard to explain the differences between Tamino's approach and Roman's, so I'm going to leave that for another day, as I'm pressing hard with Roman's approach and trying to wring some loops out of the control code. (My mind is elsewhere.) The easiest way to explain it is to give you a dumb example: take 5 stations. Make 4 of them a constant 1C. Make the 5th a constant 2C.

    If you feed those 5 to Tamino's approach you will get a "reference" station that is 1; the 5th station will have an "offset" of 2. That minimizes the offset (remember, the first station starts at an offset of 0). Roman's method will give you the output as 1.2C, and every station will have offsets from that mean. As long as you are working in anomalies there is no difference between these two. So if you want to combine stations at the same (very close) location, then Tamino's approach is better than the GISS approach to combining stations at the same location. (I didn't read his description carefully enough.) It will also work for anomaly calculations. If you are interested in an "average" temperature over a wide area, then I'd probably use Roman's. THIS ALL NEEDS TO BE TESTED against some synthetic data, and you have to understand the purpose of each function. So, I have some updates I have to do to the manual, but I want to make sure I understand both methods fully.

  4. Rob
    July 22, 2011 at 11:52 PM

    Thanks Steven,
    I guess my previous approach was the same as RomanM's then. Funny how things work out. Way back in the spring I tried to use Tamino's approach, but practically I found it too difficult to do, so I adapted his to work like RomanM's, apparently without noticing it.

    Either way, I have looked at the scraper, and they essentially use a scraper to pull from ScraperWiki, where they have set up an Environment Canada scraper. It looks like the Environment Canada scraper isn't working, from what I can see at the following link:

    http://scraperwiki.com/scrapers/canada-climate-data/

    Their code on the scraper itself is here:

    http://scraperwiki.com/scrapers/canada-climate-data/edit/

    and it points to the following:
    http://www.climate.weatheroffice.gc.ca/climateData/bulkdata_e.html?timeframe=3&Prov=XX&StationID=%s&Year=2010&Month=1&Day=1&format=csv&type=mly
    for the data they received.

    Thanks, and hopefully I'm not keeping you too busy.

    • Steven Mosher
      July 22, 2011 at 11:56 PM

      It may be a while till I can look at that.

  5. Rob
    July 23, 2011 at 12:12 AM

    Completely understandable. I'm up in the Arctic conducting field work for a month starting soon anyway. It is a pity that the scraper does not work. If I do get it to work, there is another program that converts the scraped data to the same format as the GHCN v2 data.

    You end up with a separate file which includes the data. Any thoughts on how easy (or difficult) it would be to actually incorporate that data if I had it? (I actually do have a downloaded set of data from November 2010.) The download is available here:

    http://code.google.com/p/ccc-gistemp/downloads/detail?name=ccc-gistemp-input-20101029-ca.zip&can=2&q=

    • Steven Mosher
      July 23, 2011 at 1:48 AM

      Incorporating new data is relatively easy. I'll look at doing a better scraper, since I know R better than Python.

      Rather than putting the data in GHCN v2 format, which is a bitch, I would do this: just do stations by column. The GHCN v2/v3 format is a bit of a pain.
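
      A rough sketch of what such a scraper could look like in R (the bulk-data URL is the one quoted above; the station IDs, the number of header rows to skip, and the column names are placeholders to check against a real download):

      baseUrl <- paste0("http://www.climate.weatheroffice.gc.ca/climateData/",
                        "bulkdata_e.html?timeframe=3&Prov=XX&StationID=%s",
                        "&Year=2010&Month=1&Day=1&format=csv&type=mly")

      getStation <- function(id, skipRows = 17) {
        # skipRows: assumed length of the header preamble in the CSV
        read.csv(sprintf(baseUrl, id), skip = skipRows, stringsAsFactors = FALSE)
      }

      ids    <- c("1234", "5678")              # placeholder station IDs
      raw    <- lapply(ids, getStation)
      # assumed column names; adjust after inspecting one downloaded file
      series <- lapply(raw, function(x) x[, c("Year", "Month", "Mean.Temp")])
      # stations in columns: merge on Year/Month, one temperature column per station
      byCol  <- Reduce(function(a, b) merge(a, b, by = c("Year", "Month"), all = TRUE), series)
      names(byCol)[-(1:2)] <- ids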

      I just broke the code on Roman's stuff, so I'm gonna do a post on it.

      Pity that I decided to do stations in columns. It makes the interface code a bitch: transposes, name copies, loops instead of aggregate... yuk. Anyway, it's working.
      Roman's method with spatial averaging. Cool.

      Steve

  6. Rob
    July 24, 2011 at 12:32 AM

    Yeah, I saw the post there. Looks interesting. What other option could you have used instead of doing the stations in columns? I mean, that to me makes the most sense for this type of work.

    It would be great to have that scraper in R. Very useful for the future. Since the data that I mentioned before (in the zip file above) is probably in a format similar to V2, it is probably less useful than maybe I thought. I know I got the analysis working using the CCC method, incorporating the additional station data, a while back, but that analysis is certainly not as straightforward as the way you've got your package prepared. What they've done over there is great, but it isn't terribly user friendly, especially for things like finding out the number of stations being used for a given month and so on.

    I was actually a little surprised how it turned out when I ran the analysis, i.e. how few stations are being used in GHCN for the region I studied for the recent period. I had about 15-20 being used in my analysis to be submitted, whereas GHCN is using 4. The station drop-off probably does not affect the trend much, but it is a little annoying how much data is lost. Much of Northern Canada is like that. No wonder Hadley performs so poorly in the north: not only do they use GHCN, but they also have the baseline to consider. I actually have a graph comparing FDM, CAM and BMM (what I call the Bias Minimization Method, similar to Roman M's) for my region; I'll show it when I get my data nearby again.

    • Steven Mosher
      July 24, 2011 at 11:52 AM

      Two alternatives (sketched below):

      1. Stations in rows: much better for R when you use aggregate(). But at the time, zoo was column-based and Roman's stuff was column-based.

      2. A 3D array, like Nick Stokes uses.
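
      To make the shapes concrete (toy dimensions, illustrative only):

      nMonths   <- 120
      nStations <- 50
      byColumn  <- matrix(NA_real_, nrow = nMonths, ncol = nStations)  # zoo-friendly: time down the rows
      byRow     <- t(byColumn)                                         # aggregate-friendly: stations down the rows
      cube      <- array(NA_real_, dim = c(72, 36, nMonths))           # lon x lat x time, Nick Stokes style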

      The GISS (CCC) methodology seems a bit weird now that I've finished up Roman's, Tamino's and CRU's, and have Nick's on the way.

      The reference station method of GISS would benefit from Tamino's work, since the decisions they make are ad hoc and untested. With Tamino's approach or Roman's approach you're using all the information in an optimal way.

      Lemme take a quick look at the scraping issue. Nick delivered working code, so I can't screw that up too terribly...


