Search Results

Keyword: ‘Nightlights’

Giss Nightlights Replicated

October 18, 2010 6 comments

UPDATE: holy open source to the rescue. I posted a question yesterday on a idea peter had. Transaprency for overlaying light maps onto google maps. reminds me of the old Quake days. Well, I know John Carmack, John is an old friend of mine, but I’m no John Carmack. Neither am I   Paul Murrell. He whipped out some code, took my sample files and produced this beauty. Expect a post once I figure out how the hell he did that! light map, (blacker is brighter) dang i feel stupid.

And you see that little patch of light at around 40.908, -4.1138

here’s what is there

Pretty good huh. Except, the station is in NAVACERRADA. which is just a little south of the “station location” Not far, just a couple decimal points away. In a peri urban/urban location that happens to be close to that 40 nightlights area, urban. Errors in station location matter. At least to the categorization…

A while back Ron Broberg and Peter O’Neill started down the path of understanding and replicating NASA’s use of nightlights. Ron’s latest effort is here. Peter’s work can be found here. My work has relied heavily on their insights and some of their trials and tribulations. I’ve had my own share of  great insights smashed upon looking at the code and data one last time and finally I’ve been able to replicate what GISS have done. This post will be long with a lot of code and charts. It assumes a familiarity with the Nightlights file and how NASA uses it.

In Ron’s last post he was trying to use “raster” to read in the “nightlights” file and he remarked that he could not match GISS results. Peter has also noted some irregularities. Ron notes that his results seemed shifted a cell in one direction or another. To me that seemed like a boundary problem so I spent time chasing down the boundary conditions ( when a station lands on a boundary) That was wrong. After a visitor noted some ‘registration’ problems and after Robert had a look at the file the source of the problem was clear: when the nightlights file was turned into a GeoTiff the metadata was not written correctly. the extents and pixel size were off by fractions. This does not impact GISS. Looking at the file they posted it was clear they were not using a *tif file, but rather a *.dat file prepared for them by Imhoff. I’ve written Imhoff to alert him to the issue with the *tif. In the meantime , with robert’s help I’ve hacked my way around the bad metadata and the results are a very close match with GISS.  To the code:

extent(hiResLights) = c(-180,180,-90,90)
Inv <- loadOBJ(getwd(),GISSNIGHT.RDATA)

First we load the “world_ave.tif” into a raster. This is a 30 arcsecond file with nightlights every 1km. Recall these are derived from 2.7km pixels so the 1km resolution is really false. Next, we force an extent command onto the raster. Raster reads it extent from the file’s metadata. In this case the files metadata indicated the wrong extent. It left off a tiny sliver of the world. The resolution was consequently figured as .00833 degrees. That little imprecision was enough to cause big differences between GISS results and raster results. With that fixed, we proceed to load the Giss inventory. In previous programs I have modified this to hold information about the entire cell surrounding the station. The reason for that is the station location uncertainty. Next we will create some variables to categorize nightlights according to Imhoffs schema. We use ‘cut’ to recode the DN numbers into his categories, and we use “cut” to turn DN into GISS categories of “urban” “periurban” and “rural.” The we create a bar plot to compare NASA’s lights value from their inventory to our independent raster version. Finally we append the new variables to out inventory data frame.

urbanType <- cut(Inv$DSMP,c(-1,10,20,35,49,81,91,254,255),
NasaLights <-cut(Inv$Nightlights,c(-1,10,35,255),right=T,
MoshLights <-cut(Inv$DSMP,c(-1,10,35,255),right=T,labels=c("Rural","Periurban","Urban"))
fname <-fullPath(dir,"Stationtypes.png")

As the chart indicates there is a very small difference between NASA’s categorization and our categorization. Still, there is a small difference, a handful of stations. And so we investigate the ‘spread’ between NASA nightlight values and the one we generate:

spread <- Inv$Nightlights-Inv$DSMP

fname <-fullPath(dir,"Spread.png")
hist(spread,breaks=50,main="Difference Nightlights-DMSP",col="blue",sub=max(spread,na.rm=TRUE))

The differences are minor and range from -67 to 31. This indicates that when each program does its lookup there are still some areas where we look up adjacent cells, but that difference does not result in changes to categorization. A brief check of the extrema in this case. First we look at the biggest positive difference and then the biggest negative difference. On second though I should probably turn this into an absolute difference.

spread <- Inv$Nightlights-Inv$DSMP
dex <- which(spread==max(spread,na.rm=T))
e <-extent(c(Inv$LonMin[dex],Inv$LonMax[dex],Inv$LatMin[dex],Inv$LatMax[dex]))
r <-crop(hiResLights,e)
M1 <- lightsGooglemap(extent(r),Inv[dex,],dir)

Looking directly in the center you can see the ‘S’. And now for the biggest negative difference

dex <- which(spread==min(spread,na.rm=T))
e <-extent(c(Inv$LonMin[dex],Inv$LonMax[dex],Inv$LatMin[dex],Inv$LatMax[dex]))
r <-crop(hiResLights,e)
M2 <- lightsGooglemap(extent(r),Inv[dex,],dir)

This is harder to make out but at the center of this city we have, apparently, two cells close together that differ by 67. Lets look at the entire inventory for that location

Id:  22223074000 ;  Name: DUDINKA ;  Lat  69.4 ;Lon 86.17 ; Altitude 19 ; GridEl 50

Rural S ;  Population 20 (20,000)

Topography FL  ;Vegetation NA ; Coastal: no ; DistanceToCoast NA ; Airport  FALSE ;  DistanceToTown  NA


Light_Code C ( this is the deprecated GHCN light code)

US_LightCode NA ( this is the Giss 1,2,3 code: deprecated )

Nightlights  65 ( this is the value NASA sees when they look up the station location)

CountryIndex  86  CountryName   RUSSIAN FEDERATION (ASIAN SECTOR)   Cell 106779141 ( the cell id in raster)

DSMP  153 ( the value raster sees )

LonMin  84.75; LonMax 87.58333 ;LatMin 68.9; LatMax 69.9  ( the extent of the contour plot)

MinLights  0 ;MaxLights 166  ;MeanLights 0.7348775;  AreaLights 12353.27 (sqkm); SumLights 29983

the minimum, max, and mean lights within the extent. also the area of the extent and sum of all lights

PeriUrban 3km  Urban 3km:  how close  is the nearest periurban and urban light

MoshLights:Urban  NasaLights: Urban  UrbanCode: Urban4

The codes for urbanity, including the more detailed Imhoff code

PopDensity 3031.  the population density implied by the nightlights 3031 people sq/km

And then the google earth:

And wikipedia

Which indicates that the population figure in the  GISS inventory is not accurate. Next we look at the frequency of different types of lights we get at the stations:

fname <-fullPath(dir,"DSMPHist.jpg")
hist(Inv$DSMP,breaks=25,main="Frequency of Nightlights at Station location", xlab="Radiance number",col="blue",xlim=c(0,max(Inv$DSMP,na.rm=T)),ylim=c(0,5000))

And then we will look through all the stations and create two categories. Stations where the surrounding area ( a 55km radius) is “rural” and those cells that have Periurban or urban lights in the vincinity

AllRural <- ifelse(Inv$MaxLights<11,T,F)
fname <-fullPath(dir,"RuralCells.jpg")
barplot(height=c(sum(AllRural==T),sum(AllRural==F)),main="Frequency of stations in Rural Cells",names.arg=c("Rural Cells","Mixed Cells"),col="blue")

And  then we select rural stations that have some peri/urban lights in the vicinity and we see how bight those nearby Lights are

MixedCells <-Inv[which(Inv$DSMP < 11),]
MixedCells <- MixedCells[which(MixedCells$MaxLights > 10),]
fname <-fullPath(dir,"MixedCells.jpg")
hist(MixedCells$MaxLights,breaks=25,main="Frequency of Maxlights in Cell of Dark Stations", xlab="Radiance number",col="blue",xlim=c(0,max(MixedCells$MaxLights,na.rm=T)),ylim=c(0,1000),labels=TRUE)

Then we pull out the following. We look for the rural station that has the brightest nighlight within a 55km radius. And we plot that stations lightmap. I also write the small raster to disk. This is for the next project which is to test putting overlays onto google map! ( R list is helping so they need code and data!)

And the google Map.. The bright lights are madrid.

And the code for that is below

StationNumber <- which(MixedCells$MaxLights==max(MixedCells$MaxLights,na.rm=T))

e <-extent(c(MixedCells$LonMin[StationNumber],

r <-crop(hiResLights,e)
M3 <- lightsGooglemap(extent(r),MixedCells[StationNumber,],dir)
fname <-fullPath(dir,"test.tif")
fname <-fullPath(dir,"test.grd")

Next we will select the Nasa stations that qualify under GISS criteria for Rural. And here we will look at the distance for the closest Periurban pixel. That is, the station has a value of 01-10. And we ask the question  How many rural stations have a peri urban pixel within 3km, within 5km, within 10km, 20km, 30km, 40km. Essentially if the station location is in error, even by a little, will changing the location put the station in a peri urban place. In short, I searched around every station and looked for bright pixels.

NasaRural <- Inv[which(Inv$NasaLights=="Rural"),]

fname <-fullPath(dir,"PeriUrbanDist.jpg")


hist(NasaRural$PeriUrban,main="Distance to Closest PeriUrban",col="blue",labels=TRUE,breaks=15,xlab="Distance(km)",xlim=c(0,50))

Simply, if you look at Rural stations (around 3800)  1100 of those stations will have a peri urban pixel within 3km. hence, the accuracy of the station lat and lon will matter. 1400 of the stations have periurban within 5km and half are with 5km of a peri urban pixel. 5km is roughly .05 degrees. hence, accuracy in lat/lon matters. And in our survey we found stations that were mislocated by more than a full degree. Next, we look for nearby Urban pixels

fname <-fullPath(dir,"UrbanDist.jpg")
hist(NasaRural$Urban,main="Distance to Closest Urban",col="blue",labels=TRUE,breaks=15,xlab="Distance(km)",xlim=c(0,50))

Again, we take the rural stations and count how many stations have an urban pixel within 3km. That would be 10. How many have an urban pixel within 5km? 41, within 10km? 154. Location accuracy matters. Next, I noted that NASA have added a test looking at “pitch black” stations. That is, stations with light =0. What do these look like. First a count

PitchDark <- ifelse(Inv$DSMP==0,T,F)
fname <-fullPath(dir,"Pitchdark.jpg")
main="Frequency of Pitch dark stations in Giss Inv",
names.arg=c("Pitch dark Stations","Some Light "),col="blue")

The issue, however, is not with the actual value at the station (0-10) the issue is station location accuracy. What is needed is accurate location data. Sorting for pitch dark does no good if the location is wrong. To understand that we will start to look at the pixel surrounding a pitch black station.  Because of the location errors we start by screening at the CELL. Here we know this: we know that for these stations there are no lit pixels within 55km. So, inthese cases location error is not a factor. The station could be anywhere in this 1 degree cell and it would still be pitch dark. This is a conservative screen which would minimize misidentification of peri urban or urban sites as rural. Again, better location data and this screen becaomes less stringent

DarkCells <- ifelse(Inv$MaxLights==0,T,F)
fname <-fullPath(dir,"Darkcells.jpg")
main="Frequency of Pitch dark cells in Giss Inv",names.arg=c("Pitch dark Cells","Some Light "),col="blue")

So, there are roughly 1000 stations where the station is pitch dark and all the surrounding area ( 55km) is dark. And next we look at “dim” cells. This approach counts the stations that are in areas where the entire cell has DN numbers less than 11. All rural cells:

DimCells <- ifelse(Inv$MaxLights<11,T,F)
fname <-fullPath(dir,"Dimcells.jpg")
main="Frequency of Dim cells in Giss Inv",names.arg=c("Dim Cells","Some Light "),col="blue")

As one might expect the number increases a bit. There are roughly 1200 stations where the  station and the surrounding area ( radius 55km) are all rural by Imhoff’s designation. So location error doesnt play here as well. Next we look, at pitch dark stations. And here we asses the pixels in the vicinity

fname <-fullPath(dir,"PeriPitchdark.jpg")
hist(Inv$PeriUrban[PitchDark],col="blue",labels=TRUE,breaks=20,xlab="Distance(km)",main="Occurance of peri urban by pitch dark stations",xlim=c(0,50))

The appraoch here is to look at pitch dark stations and then see what is within 3km, 5km etc.  Of all the pitch dark stations that NASA could select in its pitch dark test, 477 of them had periurban pixels within 3km, 224 stations had peri urban pixels within 5km. if the station location is off, then these stations would not be pitch dark.

Next we look at the occurance of urban pixels within these boundaries

fname <-fullPath(dir,"UrbanPitchdark.jpg")
hist(Inv$Urban[PitchDark],col="blue",labels=TRUE,breaks=20,xlab="Distance(km)",main="Occurance of Urban by pitch dark stations",xlim=c(0,50))

So, if you look at all pitch dark stations, 6 of them have an urban pixel within 3km, 19 have an urban pixel within 5km. 68 have an urban pixel within 10km, 113 within 20km. and so forth. Finally, we look at 3 different kinds of sites. we count “pitch black cells, pitch black stations with rural lights for 55km, and finally rural cells. We’ve done these before but here we just put them on one chart

siteType <- vector(length=3)
pdInv <- Inv[PitchDark,]
siteType[1] <-nrow(pdInv[which(pdInv$MaxLights == 0),])
siteType[2] <-nrow(pdInv[which(pdInv$MaxLights <  11),])
siteType[3] <-nrow(Inv[which(Inv$MaxLights <  11 ),])
fname <-fullPath(dir,"RuralcellType.jpg")
barplot(height=siteType,main="1deg cell types",col="blue",names.arg=c("Black Cells","Dark Station/rural cell","Rural Cells"))

Any of these screens would diminish the location  error problems. Finally, I take the data from Imhoff and I contstruct a field for population density in sq km. This gets added to the inventory. Theres more work to do here. Documenting pitch black stations with urban pixels nearby, fusion tables, google tours, overlays on google earth. Fun times ahead

PopDensity <- vector(length=nrow(Inv))

for(i in 1:nrow(Inv)){










PopDensity <- PopDensity*100


fname <-fullPath(dir,"GissInvEnhanced.inv")


Categories: Uncategorized

Accuracy in Inventories and Nightlights Part II

October 17, 2010 1 comment

In part 1 we discussed UHI in a general way and introduced NASA’s use of nightlights to identify Rural, periurban and Urban stations. Very simply, the latitude and longitude data in the station inventory is used to look up at radiance value in the nightlights file. If that value is  10 or less, the site is Rural. If it is 11-35, the site is Periurban, If the site has lights greater than 35 it is Urban. These values are then used in NASA’s adjustment process. In that process, periurban sites and Urban sites are adjusted to comport with their Rural neighbors.

Logically then we will want to know several things. First how accurate is the station latitude and longitude data, second how accurate is the nighlights data, third how accurate is the look up, and finally do nighlights really “pick out” stations that dont suffer from UHI. Or rather, can you have UHI in a lightless place?

The accuracy of the latitude and longitude data is hard to determine. Basically, we want to ask is the station in the real world at the same location that the database says it is. Well, if we knew exactly where the real stations were independently of the inventory data, then we could just compare the two sources. In some cases this is possible. Let’s start with stations in the US.

In the US station location is now recorded down to 4 decimal places, eg  43.6754. using a rule of thumb that says a degree is roughly 111 km ( at the equator) we can see that 1/10 is roughly 10km, 1/100 is roughly 1km,  and so the US stations are represented down to roughly 100 meter precision. But is it accurate?   That question can be addressed by looking at alternative  sources. In this case, In that project many sites were visited and volunteers took GPS readings at the site. A complete comparision of the field measurements and the data held by NCDC has not  been made public. So, one can only do manual spot checks.

In any case the first thing one can realize about the Giss inventory is that they appear not to use the more precise GHCN data. In their inventory for the US they continue to use the data down to 1/100 of a degree precision.  Or roughly 1km. Given the latitude of the US, however, one can say the precision is slightly better than this. Still, with more precise data available one wonders why not use it. Particularly because if a station location happens to fall on a grid line, the rounding differences between machines can cause one system to look up one cell, and another system to look up the adjoining cell. This is not an issue, if the adjoining cells are all roughly the same, however, why introduce this kind of uncertainty when you have better data.

Instead of focusing on the US stations, I will examine the ROW. Primarly because Hansen2010 has extended the use of nightlights beyond the US and because the accuracy problems are more easily illustrated there. In order to illustarte the accuracy problem in the ROW we will utilize Google map. In short, we will look up the station location used by GISS and then see if there is station there. That phenomena is clearest in places like coastlines, and deserts. It clear because at coastal locations you will see the google push pin in the water. As noted peter has done much of the work documenting this and notifying NASA, who apparently think it doesnt matter. It’s also clear in deserts because you can see a push pin located out in the middle of a pathless field usually with a small town nearby. At airports its clear when we have other supporting evidence, such as ground based photos of the station which we can then place using google maps.

A few minutes with the Inventory will demonstrate that something is wrong with NOAA’s inventory, or wrong with Google Earth. Since coastal locations show the problem most clearly, I took the Inventory data and read it into a fusion table and then filtered the data to look for coastal locations within 1 miles of the coast. Pulling that into Google earth one can tour the stations. Using the ruler tool one can see that location errors in the 1/100ths of a degree take stations and put them into the sea. Sometime the error is much larger: Consider this station, off by 2 degrees

For download you can find a google earth tour some of  the suspect coastal station locations in my drop down box.

or visit google fusion table I created for this.

The bottomline is this. The station latitudes and longitudes are precise to 1/100th but inaccurate up to 10’s of kilometers. I find no evidence to support Hansen’s contention that the station data is accurate to .01 degrees. That means when we look at the nightlights at say 34.55,45.23 we get the nightlights at that location, BUT the station may actually be at 34.876753423, 45.198763254. And that location may have different nightlights.

The problem actually gets worse than this because even with the exact same coordinates and the exact same nightlights data, I can get a different value when I do my lookup than when NASA does theirs. One reason for this is rounding. If I look up a site location that falls exactly on a boundary between two cells, I have to choose which cell to pull data from. Even with the same package, raster, on different machines i could get different results. To see how large this problem is, I repeated an experiment that Ron Broberg did a while ago. Using the GISS inventory and the same nightlights that NASA uses checked the values they got for nightlights with the values produced by using raster.

Categories: Uncategorized

Nightlights: First Principles

October 15, 2010 Leave a comment

Update: see part II here

With the publication of Hansen2010 forthcoming it is critical to examine the subject afresh. The global temperature index product from NASA is known as GISSTEMP .GISSTEMP, like the temperature index from Hadley/CRU and NCDC attempts to estimate the average temperature of the globe using historical data archived in the GHCN ( Global Climate Historical Network) as well as some other other sources, namely SCAR and USHCN. Moshtemp also compiles an average using  GHCN data and a method derived from CRU.

One issue that generates a fair amount of controversy is the issue of UHI or the Urban Heat Island. As climate science tells us Urban areas tend to be warmer than non urban areas. The physical reasons behind this are related to the geometry of urban areas, the material properties of urban areas, and human activity. Briefly, the geometry of the urban landscape, buildings, influence the amount of sun that strikes the surface, the clear view of the sky at night ( radiation back to the sky) and it alters the airflow over the surface which impacts the  transport of heat skyward. The geometry can also lessen the UHI, for example, if we consider aspect such as tree canopies which cool the urban environment. According to that study the second most important regressor after canopy cover was building height. Canopy cover created “cool parks” while building height was instrumental in creating hot zones. The importance of building height is well established in the literature for some time

And the relationship here not simply a statistical correlation but rather has a physical explanation.Buildings change the flow of air  and they change the way heat is transported
“Air traversing rough warm urban areas thus encounters changes in surface and
atmospheric characteristics that alter existing mean and turbulent wind velocity fields
(Craig 2002).  In particular, UHIs cause pressure field deformations, buildings produce
barrier effects, and large urban surface roughness length values increase surface frictional
drag.  Bornstein and Johnson (1977) showed that wind speeds over NYC were decreased
below (increased above) observed rural values during high (low) regional-speed periods.
Loose and Bornstein (1977), and Gaffen and Bornstein (1988) also showed that NYC re-
tards synoptic fronts as they pass over the city during non-UHI periods, while such fronts
were accelerated over the city during periods of well-developed UHIs; similar effects on
sea breeze movement over NYC were found by Bornstein and Thompson (1981).
Urban areas also alter wind directions, e.g., in otherwise calm synoptic flows, UHIs
produce inward directed “country breezes” that result in speed maxima and confluence
over the city.  With non-zero synoptic flows, the UHI is advected downwind and urban- induced flows are superimposed onto prevailing winds, the combination of which results
in displacement of urban effects to the downwind urban edge.
During non-UHI periods, building-barrier induced diffluence at the upwind urban
edge divides regional flows as they approach a city, especially during stable nocturnal
conditions (Bornstein and LeRoy 1990).  The diffluence is followed by downstream cyc-
lonic-turning on the right-hand urban edge (looking downstream) and anticyclonic turn-
ing on the left hand urban edge (Bornstein et al. 1993).  Barriers also produce confluence
at both lateral urban edges (where air deflected around the city converges with the undis-
turbed prevailing flow) and downwind of the city (where the flow re-unites).  These hori-
zontal convergence/divergence effects produce urban vertical velocity patterns consistent
with 3-D mass consistency and are larger during unstable daytime hours than stable
nighttime hours. “


My intention here is not to explain the phenomena of UHI, suffice it to say that the existence of UHi is generally accepted and studied in detail. The effect is not uniform as this crosssection of Budapest indicates:

The effect can be modulated by wind speed, cloudiness, and precipitation. In addition, Oke suggests that the effect is modulated by the wetness of the surrounding rural environment. Lastly, the effect is  a function of the number of people. In general, more people means more buildings, more water taken from surrounding rural areas, more economic activity, more surfaces that retain more heat. Consequently, some view population as a good proxy for the actual physical differences that cause an urban environment to be warmer. However,

“The use of urban population as a predictor of the amount of UHI bias has been studied (Oke 1973; Oke 1982; Karl et al. 1988). Oke (1973) examined relationships between the urban and rural temperature bias and population statistics of North American and European cities and identified two different relationships for the two continents. Other limitations to the use of population statistics as an estimator of the UHI bias include the lack of globally consistent statistics. Additionally, population data given for a geographically arbitrary boundary can be difficult to relate to the population in the immediate vicinity of a weather station. Thus, population alone does not appear to be a globally applicable method for evaluating and removing the UHI bias.”

(The Use of NOAA AVHRR Data for Assessment of the Urban Heat Island Effect, Gallo, Karl, et al)

Nevertheless, Gisstemp uses  a proxy of population, nightlights, to categorize temperature stations into catagories of urban and rural throughout the world. Until recently Gisstemp had just used Nightlights for the US stations and population for the ROW. That has changed, they now use nightlights for the entire globe

The Nightlights data are used as a proxy for population and population is used as a proxy for the actual physical changes that cause UHI. For present purposes we will accept this notion and our question will be instead “how accurate are nightlights and how well can they separate the urban from the rural. This is simply a data accuracy issue and an uncertainty issue. Namely, when GISS use Nightlights to classify a station as “lit” is that determination error free? and if there are errors how widespread are they and finally what if anything do these errors mean in the determination of a global temperature.

For the purposes of this exercise we will accept the data that NASA relies on and we will accept their classification scheme. The accuracy of that radiance data is not examined here. Further, NASA defines the following categories: a radiance value of 0-10 is defined as rural,11-35 as periurban, and above 35 is defined as urban. In Giss code, rural stations are used to adjust periurban and urban stations. That adjustment will not be discussed. The very narrow question we will address is this:  Given a station at latitude X and longitude Y, how accurate is the giss assessment that a station is rural or not. The first part of this question is how accurate are the station locations. Giss station locations are found in the v2.inv file, which is derived from a similar file at GHCN. There are over 7000 stations in that file with and each has an associated altitude and longitude,as seen below in an example from greenland. the station at GODTHAB NUUK

That station is listed as being at Lat: 64.17 Lon: -51.75. In actuality the site is here 64.191852,-51.675487. The error here appears small. .02 degrees in latitude and .08 degrees in longitude.

Taking a larger view we can see that the location error is on the order of several km. Roughly, at the equator a degree is 111km and so .1 degrees will be on the order of 10km ( this varies with latitude). A cursory look at the stations in this way quickly reveals that the errors can be as large as .5degrees.  Nasa have been advised or these errors and they appear to be ignoring them. As Peter details here in his exchange with Jim hansen. Hansen’s claim that the station data is accurate to within .01 degrees is clearly wrong as 5 minutes with the station data and google earth will attest. This station in the inventory is a good case.  For more cases see Peter’s post above

This story is repeated over the ROW. In the USA, however, the station data is reported down to four decimal degrees. While there are still inaccuracies with that data, I’ll focus of the ROW.

So lets set the stage by stating the problem. The nightlights data is collected so that every pixel represents 2.7km of the earths surface. This data is then projected at 1km. Generally you can think of that as 1/100th of a degree. The station location data on the other hand is far less accurate, with errors up to .5deg or more. Simply, a station that says it is at 0.0,0.0 could actually be at .5,0. Now if nightlights are uniform over big patches, we have less of a problem. If every pixel with .5 degrees of a station is lit with the same light, then station location accuracy is not that critical. But if brite pixels and dark pixels are intertwined then the location aliasing is more critical.

In the posts that follow I will examine this location question in detail. The code for examining every station in detail over the whole globe is done, and we will start to plow through the data.

To the code:

The data sources. the station data and the nightlights data.

url_radianceCalibrated <- ""
url_Giss_Inv <-""



download.Nightlights <- function(dir=DOWNLOADS_DIR,url=url_radianceCalibrated){
dzip <-file.path(DOWNLOADS_DIR,basename(url),fsep=.Platform$file.sep)

fname <- file.path(DOWNLOADS_DIR,basename(url),fsep=.Platform$file.sep)
gzname <- file.path(DOWNLOADS_DIR,NightlightsGz,fsep=.Platform$file.sep)
tifFile <-file.path(DOWNLOADS_DIR,hiResNightlights,fsep=.Platform$file.sep)
file.copy(tifFile, hiResNightlights )

Categories: Uncategorized

Nightlights, Contours, and Rgooglemap

October 14, 2010 2 comments

I am continuing the investigation of nightlights using some additional packages from Cran. Here we add Rgooglemaps to the mix. Rgooglemaps is a neat tool that gives you a simple ( needs better docs) interface to the static map server. Perhaps, I’ll modify the code to my likeing, so For now I use it as delivered. Lets recap the issue. Station positions ( Lat lon) from GHCN inventory are not precise. Consequently, the nightlights you read may not be accurate. To show that I’ve collected stations where the Lat/Lon of the station is Dark, but where the surrounding area ( radius of about 55km) contains urban stations.  First off some basic stats:

Next, we look for those stations that have no urban lights within a 55km radius. Rural cells, as I term it:

And then we look at the mixed cells. Cells where the station lat/lon is dark, but urban lights are in the “hood” Recall that “rural is defined as a DN ( radiance number) of less than 10. What we see is that within the “mixed” cells we have situations were the rural location is close to urban locations. And the question is “how close” and is the station really at the location in the inventory or is it really in a close by urban site.

At this stage the work is exploratory, getting tools together and refining an analysis approach. One issue is the registration of nightlights. More on that later, lets just look at some charts: A nightlights contour, “S” marks the station spot.

The map above is drawn by accessing the static map server. And I can Zoom
Categories: Uncategorized

Nightlights in China

October 11, 2010 Leave a comment


Some are not aware that GISS has switched to using Nightlights in the ROW. According to their updates they have moved to nightlights for the ROW.

The station inventories can be found here

The station I examine below is listed  like this in the new giss inventory

20551495001 TURPAN                          42.92   91.00   24  384R   -9MVDEno-9x-9HOT DESERT  
    A    0

That final Zero is Giss’ field for nighlights. For this station our figures match. My preliminary analysis shows no station at that location, however there is a nearby settlement that may be its location.

As most who follow the construction of a Global temperature index know the Index created by GISS uses Nightlights as a determinate of whether a station is urban or rural. To determine if a site is rural GISS apparently looks at the Night lights value and if the value is less than 10, the site is declared as rural. There is an inherent problem with this that results from the location errors in the GHCN inventory. Simply, the Nightlights image is collected  at a 30 arcsecond accuracy while the GHCN inventory is far less accurate. A GHCN site that is supposed to be at 42.0 90, may actually be at 42.3,90. The exact distribution of errors has not been quantified, but others looking at the issue have reported errors up to .5 degrees. At the equator nightlights data is roughly 1km data. .5degrees at the equator is roughly 55km. That means when GISS looks at nighlights they could actually be looking at the lights from a position 55km away from the actual site. They have not taken aliasing into account to INSURE that a station is in fact rural.

In an effort to come to some understanding of the magnitude of this problem I’ve built several tools. The first was a program to download and analyze Nightlights. Using the calibrated dataset referred to by GISS, I downloaded and worked with the HiRes version. My approach was simple and brute force. For every station in the GHCN inventory I cropped the surrounding area to approximately a 111km square with the GHCN station in the center.  It turns out after I wrote the code that raster has a command to do exactly what I wanted [ see xyValues(..buffer=)

The resulting sub raster was then processed to extract the following: min value, max value, mean value, and value at the location indicated by GHCN. These rasters can then be plotted. In the abstract we are concerned about urban sites being mislocated as rural sites. That is, a site in a city having a recorded location that is outside the city. Such a site would show up dark, but actually be in a lit area. The goal then would be to develop an approach that would allow for the rapid assessment of this possibility. More on that later.

First, we will start with an approach that eliminates that possibility. We do that by constraining the stations to those that have no lights brighter than 10 in the whole sub grid. Starting with 7280 stations if we select those stations that have a Nighlights value of less than 10, and if we require that no pixel within ~55Km of that location is also less than 10, we have a good assurance that the site is dark regardless of the location errors, provided the location error is less than ~.5degrees. Filtered thusly there are  1099 such stations in GHCN. My Data is all open and viewable in google fusion tables. Fusion tables will allow you to subset the data any way you want and geo view the results.

Rural Area by nighlights

As you can note that means the other 6000 sites are all in areas where there is some sign of urban development within ~55kmkm or so of the location. To screen for the phenomena I’m asserting exists, I filtered the data to look for sites where the  Area had “high frequency” that is, values of 0 and values over 100. The reporting site has a value of Zero according to its location in the inventory AND the surrounding area has values over 100. There are 179 entries meeting these condition. That view  is located here

<iframe width=”500px” height=”300px” scrolling=”no”  src=”’0’+and+col4%3E%3E1+%3E%3D+’100’&h=false&lat=-1.0546279422758869&lng=177.1875&z=2&t=2&l=col1%3E%3E0″></iframe&gt;

Looking at that map I was drawn to the site in China. For a couple reasons that should be obvious. Pulling up the inventory on that station we see the following: Please note the fields my analysis adds. ( again with the mountain valley sites!)

Id: 20551495001
Lat: 42.92
Lon: 91
Altitude: 2.4
GridEl: 384
Rural: R
Population: NA
Topography: MV
Vegetation: DE
Coastal: no
DistanceToCoast: NA
Airport: FALSE
DistanceToTown: NA
Light_Code: A
CountryName: CHINA
MinLights: 0
MaxLights: 160
MeanLights: 0.908130081
AreaLights: 12392.31018
SumLights: 17872

“AreaLights” is the area in square km that I perform the cropping at, roughly 111km per side. The putative station location is centered and I crop a equal area square around every station center. The figure of 111km is selected based on reports of errors as large as .5 degrees in some station locations. SumLights is the sum of all radiance values in the sub grid. DSMP is the figure at the GHCN location which matches the inventory value in GISS.

Using the lat lon from GHCN we an get the google map. with the green arrow indicating where GHCN ( And GISS) thinks the station is:

You can zoom in on the location at the green arrow. See any station?

Here is what this world look like to nightlights. the blue circle represents the location given by GHCN. its dark. Yet, a few km away we find an urban location. At approximately 43.08,90.45 we see the hot spot

And here is the GE view of the hotspot

So what do we know. We know that Nightlights is reported at a 1Km resolution. When know that GHCN has a coarser resolution. This leads to aliasing. When we read Nightlights  given the locations in GHCN we have no assurance that the lights figure we obtain is the correct one. If we try to protect against this by filtering stations more aggressively, we are left with 1099 stations. That is 1099 stations that have no light in their area. On the other hand, there are 179 sites where a bright urban source is close by. By chance I picked good site to illustrate the issue, or maybe that station is somewhere in that pathless desert. My bet is the location and the assesement of this site as “dark” is wrong. Since this is classed as an “rural” station it recieves no adjustment but would adjust other stations in its sphere of influence: here is the Giss chart of the station:

The solution is for NOAA to update its inventory with accurate location data. This is tedious work, there is a much better way to fix these simple problems

If Anybody wants a good are to study I’ll suggest this collection of sites where the max lights are less than 10 for a nice tight collection of stations. But here too be aware of the  land use change.

And for a list of US sites, start with this map.  MaxLights in the area less than 10. consider the constellation of three sites in the same vincinity.  In particular “hanksville” which Giss sees as Dark, and which my program determined  has no lights within a 55km radius. In the chinese case we have a station mislocation, and below we see than even the requirement of no lights within 55km is not a perfect filter

42572470001 HANKSVILLE                      38.37 -110.72 1313 1358R   

On the ground:

Categories: Uncategorized

Nightlights II

October 9, 2010 Leave a comment

I’ve modified some routines so that we are always grabbing a roughly equal area regardless of the latitude. Basically, you do this:

getLonScaleFactor <- function(lat){

kmAtEq <-111.3195

kmAtLat <- 111.41288*cos(lat*DEGREES.TO.RADIANS)-.09350*cos(3*lat*DEGREES.TO.RADIANS)+0.00012*cos(5*lat*DEGREES.TO.RADIANS)



# the above function returns a scale factor for km per degree @ a given latitude

getExtent <-function(x,halfLength=.5,LonLat) {


yMin <- max(ymin(x),LonLat[2] - halfLength)

yMax <- min(ymax(x),LonLat[2] + halfLength)

xMin <- max(xmin(x),LonLat[1] - lonAdjust)

xMax <- min(xmax(x),LonLat[1] + lonAdjust)

e <- extent(xMin,xMax,yMin,yMax)




Simply we feed getExtent a “raster”, a  point, and a “halfLength”  The last parameter tells you how wide and tall your plot will be. So the default of .5 means that your “point” will be bracketed by 1/2 a degree of latitude  (+- 1/2 degree) and the longitude will be scaled depending upon the latitude. At the equator, this entails 1/2 degree in each direction. Toward the pole, I adjust the width to keep the areas generally similar. Note, the underlying data is not globally complete, there are small gaps at the pole. In a better implementation of “getExtent” I would wrap at the extrema. Then I played with colors. Arrg I hate doing color by number:

And by changing the call to “getExtent() I can request a grid that only looks out .25 degrees in latitude and longitude.

Categories: Uncategorized


October 8, 2010 6 comments

Those who follow the discussions about UHI understand that “nightlights” plays a large role in defining whether or not a station is considered Rural or Urban. In the work of GISS nightlights are determined by looking at the DSMP product. The product is available in 30 arcsecond format. That’s .00833 degrees. The following issue arises. The GHCN metadata is reported out at two resolutions: For the US,  the V3 inventory reports down to the 4th digit. X.yyyy. For the ROW the precision is X.yy. Further, we know that some of the locations are wrong. Herein lies the question. If you take the reading of nightlights from an aliased location or a faulty location, how good is your estimate?

I started out looking at this with raster. Of course I didnt read the manual entirely so I spent some time re inventing wheels. In the end I will illustrate two commands and how they help up start down the path to the answer:








coords <-getInvLonLat(Inv)

cellId  <- cellFromXY(hiResLights,coords)

Lightc  <- cellValues(hiResLights,cells=cellId)

Inv <- cbind(Inv,DSMP=Lightc)

Inv <- cbind(Inv,Cell=cellId)

Very quickly. We download the lights file, untar it and then use a utility I wrote to ‘gunzip” the entire directory. Next we load the raster. It’s huge. Nightlights at 1km. Don’t even bother plotting it, you’ll see clouds of little points. Next, I pull out the cellId for every station. Then using those cells, I pull the value of nightlights at that location. Then I just add them to the inventory. To answer the question about the potential errors we need to look at the neighborhood. I tinkered around with calls to “adjacency” and “focal” but in the end settled on the following. First, extent:

e <- extent(6,8, 36.0, 38)

For this command I merely picked the coordinates of the first station: 6.95,36.95. I then bracket the lat and lon. Then we call ‘”crop”

r <-crop(hiResLights,e)

then plot:


points(36.93 6.95,col=’red’)


later I’ll apply an SST mask, but this is what you see:

Categories: Uncategorized

Terrain effects on SUHI estimates

November 10, 2012 3 comments


Zhang and Imhoff (2010 pdf here utilized NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface temperature (LST) to estimate the magnitude of UHI in large cities across the US.  Peng  employed a   similar approach in studying 419 large cities ( population greater than 1m ) around world. Peng’s work suggests a limit or threshold to the  Surface Urban Heat Island (SUHI) effect in that he found that city size and population was not an effective predictor of SUHI for very large cities. Imhoff’s study on somewhat smaller cities suggest there is a range at which city size is an effective predictor of SUHI

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SHUI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island  ( UHI  as measured by air temperature as opposed to surface temperature ) and it shares a certain weakness with Oke’s analysis with respect to the sample size at very small populations/small size.  My goal in this study is to enlarge the sample of smaller sized cities and towns and refine the estimate in much the same way that Peng focused his study on cities over 1M in population. My working hypothesis is that just as there is a  plateau of SUHI above a certain size that there may be a plataeu at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.


Sites were selected for study by utilizing MODIS ( MOD12Q1 2005) (15 arc second) Landcover data. The 17 class landcover product was reprojected from its native format into a WGS84 GeoTiff using nearest neighbor filtering and with a 15 arc second ( 500 meter) output format. The landcover data was reclassifed into urban and non urban pixels and a cluster analysis was performed to create patches of continguous urban pixels.  Various size patches, from 10 to 100sq km,  of urban areas were then selected for further processing. In addition, Census gazetter information was used to select locations of even smaller size from .5sqkm to 10 sq km.  A 1km Digital Elevation model derived from 90m SRTM data was utilized to compute elevations, slopes and aspects. Finally, 8 day 1km Day LST  and Night LST from MODIS was used. For this pilot study one 8 day period from July 4th 2006 was utilized ( MOD11A2 v 5 ) . The LST product was reprojected from its native format into a WGS 84 Geotiff using nearest neighbor filtering and a 30 arc second (~1km) output format.


Census gazetter information was used to select a sample of small towns and CDPs that had areas less than 10 sq km and no substantial water area within the areas boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights derived ISA to ensure that the small towns were at least 10km from larger urban areas.  Urban areas larger than 10 sqkm were selected by created an urban/ rural mask from MODIS landcover.  The mask was processed by finding patches of urban areas using a queens case clustering approach. The urban areas were sorting according to size and then classified according to their isolation from other urban areas. The goal of this processing was to identify urban areas that were surrounded predominately by rural pixels. A 20 km boundary zone was established around every urban patch and to qualify for selection 99% of the pixels outside the urban zone had to be classified as rural by MODIS landcover. This selection process resulted in a database of 99 sites. Given what I’ve learned in this process I will probably loosen the restriction regarding “satellite” urban pixels surrounding the urban core and include more sites in the final analysis.

The selected sites where then processed as follows. For every location a distance map was created for a 100km radius around the site center. The distance map provided the distance that every rural pixel was from the closest urban pixel. In the end, analysis showed that the distance from an urban pixel was not a significant driver in the land surface temperature of the rural pixel.  With regards to Surface Air Temperatur (SAT) and  UHI measures, it is likely that distance from the nearest urban pixel would be significant due to horizontal advection, but with Land Surface Temperture and SUHI, the physical ability of heated land surface in the urban area to influence the LST in adjacent rural cells is apparently limited. In the final analysis, nevertheless, I will probably add a screen to remove rural pixels that are within 2km of a urban pixel.

After distance maps are created out to 100km, the elevation data is processed within the same radius. In addition to retrieving the elevation for each cell the slope and aspect of the cell is generated. Zhang and Imhoff (2010) utilized elevation as screen for selecting both urban and rural pixels. Their screen required that a pixel be with a +-50 meter window of the mean urban pixel elevation. As the results below show elevation does have an impact on LST so that before comparing urban to rural pixels some form of normalization must be used. In addition to elevation, however, the slope and aspect of a grid cell can influence the LST.  In examining differences between the ASTER LST product and the MODIS LST product Liu (2009) pointed to terrain effects such as slope as part of the reason for differences between these products. The difference in slopes between urban pixels and rural pixels is evident and generally speaking urban pixels tend to have lower slopes. The difference in slope, of course, will impact the amount of direct solar radiation that a given cell “sees” and the interaction between the slope and the view angle of the sensor apparently impacts the emissions sensed. The slope effect is small but statistically significant. For this analysis no screen was employed, however a screen of less than 1 degree slope will eliminate only 15% of urban pixels and 20 percent of rural pixels.  A screen at .5 degrees eliminates 35% of all urban pixels and 40% of all rural pixels.

After terrain processing the  Day LST and Night LST is retrieved for every pixel within 20km of the site center. As this radius is less than the 100km radius used for distance map and terrain processing we are assured of having good values for both the distance map and the slope and aspect maps at the 20km boundary. The cell centers of the LST product are then used to retrieve the corresponding distance map values, landcover values, and terrain values.


Zhang and Imhoff(2010) approached the problem estimating  SUHI by classifying the urban pixels using ISA and the rural pixels using Olson Biomes. The approach I take is slightly different. The temperature product MODIS LST which we both use is a product that estimates the LST based on other MODIS products. Most importantly the split window algorithm uses estimates of emmisivity that are derived from Modis landcover product. Such that if MODIS Landcover believes the pixel is urban or croplands it does not matter what the ISA product thinks the pixel is and it doesnt matter how Olsen classified the pixel. Modis Landcover is used in the processing chain to to determine the LST and is the more appropriate classification tool. What we also see here is that Urban/Rural differences are to some extent built into the results. If Modis Landcover classifies a pixel as urban that classification will drive the emissivity estimate which in turn drives the output of the split window alogorithm employed to determine LST.

A box plot of Daytime LST by landcover type shows the range of temperature differences across the various land classes. It should be noted that this sample has not been screened for elevation, slope, or latitude and includes all pixels processed for the 99 sites

The x axis represents the IGBP code for land cover class, but its instructive to ask if the urban class is identifiable on inspection. If you think that class 7 is urban you are wrong. class 7 is open shrublands.  Class 10 is grasslands. Urban is class 13.  Below is the nighttime LST


What both figures illustrate is that the differential between Land surface temperature in the urban setting and Land surface temperature in the non urban setting is not single number. The implication for UHI studies is clear. The difference in temperature between a urban area and a rural area depends as much upon the rural setting as it does upon the urban setting. It is possible to select areas, such as grasslands and open shrubland that are both rural and warmer than the urban setting. It is also possible to amplify the difference by selecting other classes such as wetlands ( class 11 ) which are at the cooler end of rural landscapes. And its possible to get confusing answers by selecting croplands or mixed croplands ( class 12 and 14 ) which have high variability where some exhibit warmer LST and some exhibit cooler LST than urban areas.

At this stage with only 8 days of temperature there is not enough data to draw any conclusions. So the next step will be to download an entire year of LST daily data to construct seasonal averages for these sites. One thing we can do as a sort of sanity check is some simple regression on the entire dataset.  Below, we can see that  40% of the variation is explained by looking at the urban area ( which lumps all rural pixels together with and area of zero ), elevation, latitude, and slope.

lm(formula = Sites$Day ~ Sites$UrbanArea + Sites$Elevation +
Sites$y + Sites$Slope)

Min 1Q Median 3Q Max
-34.419 -3.099 -0.543 2.947 20.985

Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.196e+02 8.395e-02 3806.62 <2e-16 ***
Sites$UrbanArea 5.640e-02 2.046e-03 27.57 <2e-16 ***
Sites$Elevation 7.864e-03 1.941e-05 405.24 <2e-16 ***
Sites$y -4.809e-01 2.140e-03 -224.73 <2e-16 ***
Sites$Slope -7.718e-01 5.962e-03 -129.44 <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.202 on 272367 degrees of freedom
(12251 observations deleted due to missingness)
Multiple R-squared: 0.4448, Adjusted R-squared: 0.4448
F-statistic: 5.456e+04 on 4 and 272367 DF, p-value: < 2.2e-16

> model <- lm(Sites$Night~Sites$UrbanArea+Sites$Elevation+Sites$y+Sites$Slope)
> summary(model)

lm(formula = Sites$Night ~ Sites$UrbanArea + Sites$Elevation +
Sites$y + Sites$Slope)

Min 1Q Median 3Q Max
-23.2246 -1.3937 0.0242 1.5774 7.7279

Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.080e+02 3.670e-02 8391.708 < 2e-16 ***
Sites$UrbanArea 2.634e-02 8.862e-04 29.724 < 2e-16 ***
Sites$Elevation -1.542e-03 8.662e-06 -178.006 < 2e-16 ***
Sites$y -4.316e-01 9.388e-04 -459.731 < 2e-16 ***
Sites$Slope -1.695e-02 2.667e-03 -6.355 2.08e-10 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.324 on 281416 degrees of freedom
(3202 observations deleted due to missingness)
Multiple R-squared: 0.483, Adjusted R-squared: 0.483
F-statistic: 6.573e+04 on 4 and 281416 DF, p-value: < 2.2e-16


What we see here is that Urban Area has a positive coefficient of roughly .056 for daytime LST and a nighttime coefficient of roughly .026. This approach is different than the approach used by Zhang and Imhoff in that rather than screening for the same elevation, I’ve accounted for the elevation and slope via regression. Of course, I’ll check it both ways.  Finally,  The estimate for small town UHI provided by Imhoff was on the order of 1.75C for a town of size 1 sq km. In their study they found that UHI was fit by  3.48*log(area) + 1.75. That particular study had only two datapoints for towns at 1km. Looking at the data for the twons with urban areas less than 2km, the figure of 1.75C for Day SUHI can’t be supported. The difference in Day LST  at that size is effectively zero.



Now that I have the processing chains down to the last few bits I need to push forward on several bits.

1. More small area sites.  The restriction I imposed, that 99% of all surounding pixels in a 20 km radius limits the number of small sites that can be found. This restriction was motivated by a concern that distance from an urban pixel might influence the LST.  With a suitable buffer around each site I should be able to process more than the 30 or so small sites I have. Also, elevation screening and slope screening also reduces the number of sites that are usable in the end, so I’ll need to start with more sites.

2. Increase the sampling out to 50 sq km. Preliminary analysis indicated that there may be a break in SHUI around the 10 sq km mark. Although I had limited data ( 8 days in july when the signal should be high ) The UHI signal appeared to show up more clearly at around 10 sq km, So I should probably focus not only on the sub 10 sq km region, but out to 50.  Beyond 50 sq km, my results were pretty much in line with Imhoff, despite the differences in our methods.

3. Eliminate water pixels and wetlands. I believe Imhoff eliminates water pixels. There are a bunch of ways that the data can be cut as one can see from the land class figures.

4. Regress on SUHI directly. This entails processing each site separately. Constructing an average for the small town ( which can be 1-10 pixels) and then construct a rural average across all land classes or some reduced set of land classes. In this case one would want to screen for altitude and slope and perhaps regress with respect to latitude to see of the there is a latitude component. One difficult of course here is that you are averging over a small number of urban pixels and then comapring that to an average over a large number of rural pixels. Kinda messy.

Categories: Uncategorized

Pilot Study: Small Town Land Surface Temperature

October 11, 2012 3 comments


Zhang and Imhoff (2010)  pdf here utilized NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface temperature (LST) to estimate the magnitude of UHI in large cities across the US.  Peng  employed a   similar approach in studying 419 large cities ( population greater than 1m ) around world. Peng’s work suggests a limit or threshold to the  Surface Urban Heat Island (SUHI) effect in that he found that city size and population was not an effective predictor of SUHI for very large cities. Imhoff’s study on somewhat smaller cities suggest there is a range at which city size is an effective predictor of SUHI

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SHUI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island  ( UHI  as measured by air temperature as opposed to surface temperature ) and it shares a certain weakness with Oke’s analysis with respect to the sample size at very small populations/small size.  My goal in this study is to enlarge the sample of smaller sized cities and towns and refine the estimate in mch the same way that Peng focused his study on cities over 1M in population. My working hypothesis is that just as there is a  plateau of SUHI above a certain size that there may be a plataeu at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.




Data from a wide variety of sources was used to identify a set of candidate small cities and towns. The 2010 US census gazetter of places in the US was used to identify town and census designated places (CDP )  with land areas less than 10 sq km. NLCD 2006 ( ISA) 30 meter data and Nightlights derived ISA  (1km data ) was used to narrow the list to small towns that are relatively isolated from larger urban areas. For the final processing Landcover provided by MODIS ( MOD12Q1 2005) (500 meter data ) was used to classify urban versus non urban pixels. As noted by Imhoff a cut off at 25% ISA by NLCD standard  is consistent with Modis classification of urban pixels. A 1km Digital Elevation model derived from 90m SRTM data was utilized to compute elevations, slopes and aspects. Finally, 8 day 1km Day LST  and Night LST from MODIS was used. For this pilot study one 8 day period from July 4th 2006 was utilized ( MOD11A2 v 5 )


Census gazetter information was used to select a sample of small towns and CDPs that had areas less than 10 sq km and no substantial water area within the areas boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights derived ISA to ensure that the small towns were at least 10km from larger urban areas. As  Zhang/Imhoff Olson biomes were examined within .5 degrees of the site to ensure that the area existed in a singular biome and not an area that contained more than one biome. The final sample contained 169 sites spread out over the US as depicted below

Individual datasets were then built for every site. MOD12Q1, MODIS landcover was collected for a ~60 km radius around each site. The landcover data was used to create several masks. An urban pixel mask, a rural pixel mask, and a water pixel mask. The urban pixel mask and the water pixel mask were used to create distance maps: Distance from urban pixel maps and distance from water maps. The elevation data was processed to create a slope map and aspect map for each 60km grid. Finally MODIS11A2 LST data was downloaded and reprojected to a geographic projection ( WGS84) at 1km resolution in GeoTiff format. Data “bricks” in Rs package “raster” were created for each site. Each brick consisted of a urban mask, rural mask, water mask, distance maps, terrain maps and temperature maps.

Some fun charts

The purpose of the pilot study was to assemble the data and start to build the analysis code base for a final production run. During that process some of my ideas and approaches to the problem have defininately changed. The first change was the decision to use MOD12Q1. The reason is clear. Looking at the processing chain for MOD11A2 ( LST ) it is clear that this data product relies on MOD12Q1 for its estimate of emmissivity. To put it simply if NLCD thinks the pixel is Urban or impervious and MOD12Q1 thinks the pixel is trees, then MOD11A2 will assume an emissivity of trees to calculate the final LST. The other thing that changed was my decision to include more terrain data than Zhang Imhoff. In their approach they masked off pixels that were 50meter higher or lower than the average in the urban area. In my approach I will look at using that terrain information to explain the variance we see. Finally, I will probably change my method of looking at the 50km radius around each site. In part this was done to make the calculations of distance maps less intense. I Will probably just go ahead and calculate distance maps for the whole US, a process that might take a couple days at 500meter resolution. That will allow me to sample a wider geographic variety of sites than I currently have which tends to be concentrated in the middle of the country and tends to over sample certain areas with many small isolated towns.

Now for some charts.  First we see a sample chart from a given “brick” of data:


And next we see the Day LST and Night LST




And for grins we look at the following:  The ralationship between Temperature and Elevation for day and night,  and below that we have box plots of the temperature versus the land cover types in the RURAL pixels. That is one other difference between Zhang/Imhoff and my work. Where they looked at the “biome” which is really a historical context I will look at the current land cover. The reason is simple: MODIS LST is dependendent opon MODIS landcover in its determination of emissivity and finally temperature.



You should note that I have not looked at the Urban rural difference yet. At this stage in the pilot study I don’t want to be looking at any of those results. I just want to get the processing chain bug free and the main routines ironed out. There will be a little more code to pull out the “area” of the small towns but that is pretty trivial. Next step now I think will be finding ways to speed up the calculation of the distance maps.




Categories: Uncategorized

rasterVis to the rescue

September 26, 2012 8 comments

Programmers like Oscar Perpiñán Lamigueiro are the reason I love R!  Oscar is the maintainer of the rasterVis package and it in this post I’ll explain why it is must have package for anyone working with the raster package in R.  My latest project is focused on the NOAA’s Climate Reference Network. The details can be found here.  Over a year ago I wrote a package call CRN designed to download and organize the data from the Climate Reference Network and I wanted to update and put some finishing touches on that package. In the course of that I decided it would be a cool idea to build a detailed metadata package for the stations.

The CRN is a rather like a gold standard of climate data collection. Triple redundant sensors, 5 minute data, hourly data, daily and monthly. The sites have all been selected using rigorous standards and as the system has matured the data availability rates are close to 100%. Here are just of few of the projects I have in mind.

1. Doing a proper estimate of the US temperature using just the CRN stations. One person has already  done this incorrectly ( or sub optimally ) so, I’m planning on approaching the problem with Kriging. For this I won’t use the Berkeley Earth code because I want to do something in R, using gstat perhaps.

2. Looking at LST  around the CRN sites. There is also a really cool new package in R called Modis and I really want to do an interesting project to test that package out. The goal here would be to fetch the correct Modis tiles for Land Surface Temperature, and then look at the Air temperatures, soil temperatures ( CRN collects those ) 

3. Landcover study. I’ve been reading a bunch of papers by Marc Imhoff on  SUHI and the relationship between SUHI and impervious surfaces, tree cover, etc. and Marc’s team has done some really cool work on the relationship between SUHI and the particular biome that a city is embedded in. In some forthcoming work they also look at the effects of city size and city shape on SUHI. A a few points in their papers they have hinted at SUHI effects for even small urban developments with areas around 1 sq km. I’d like to understand that better.

 To pull off all these projects I will need some rather detailed metadata, more detailed than I’ve ever worked with before. For starters  I needed a station list for the CRN and USRCRN stations. There are any number of ways of getting this data, but I drop a note to Matt Menne of NOAA and he directed me to Michael Palecki who was kind enough to send me a csv file of stations. The thing I was looking for was the most accurate Latitude/Longitude data I could find, and Michael supplied it. 

Next task was collecting metadata for the stations and assembling a variety of tables. I won’t document that work here, but just to give you an idea of what I’m putting together For every station I have the following :  1) elevation,slope and aspect  from 30 meter DEM.  2) distance from the sea coast, distance from the nearest body of water, and percentage of the “land” that is covered in water over a broad range of radii from the site. 3  various landcover datasets. 4. Impervious Area. 5. Biome. 6 Anthrom. 7 Populations from various sources and at various levels of accuarcy. 8 Housing. 9. Nightlights ( stable and radiance calibrated ). 10 economic activity.  and I plan to get LST and probably Modis Albedo.

So where does Oscar and rasterVis come in?  Specifically in this case Oscar came to the rescue for a plotting problem I was having with landcover data.

Using the  2006 NLCD 30 meter land cover dataset the first thing I do is save off  “patches”  of landcover for  roughly 1 degree around the site. The 30 meter dataset is huge and things go much easier if I just create  small data sets for each station.

In code that looks like this


LcName  <-  “G:\\NLCD\\nlcd2006_landcover_4-20-11_se5.img”
LC           <-  raster(LcName)

Inv <- read.table(“”,stringsAsFactors =FALSE)

coordinates(Inv) <- c(“Lon”,”Lat”)
proj4string(Inv)   <- CRS(“+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0” )
Inv <-  spTransform(Inv, CRS(projection(LC)))


for(station in 1:nrow(Inv)){
     Lon <- coordinates(Inv)[station,”Lon”]
     Lat <- coordinates(Inv)[station,”Lat”]
     if(Lon >= xmin(LC) & Lon <= xmax(LC) & Lat >= ymin(LC) & Lat <= ymax(LC)){
         e <- extent(Lon-110000,Lon+110000,Lat-110000,Lat+110000)
         patch <- crop(LC,e)
         fname <-file.path(“NLCD2006″,paste(Inv$Id[station],”NLCD06.grd”,sep=””),fsep=.Platform$file.sep)
         writeRaster(patch,filename = fname)
}else {
    print(“Alaska or HI”)


Pretty simple. For 2006 Alaska and Hawaii are not in the dataset so we just skip them. The output of this is a small grid with the station in the center and 30 meter landcover data for 110,000 meters in all directions. That’s the data I will be working with.

The next step is to calculate a variety of statistics on this “patch” and smaller and smaller versions of the patch, so we get the land cover stats at various radii from the site, from 110km  down to 30 meters. Working with the 1 degree “patch” makes this work go very quickly.  raster has a function called “freq” which counts the frequency of values in the entire raster. [ As I write this I wonder if “freq” can be passed as  a function to the raster  “extract” function as a buffer option] 

In addition to creating 1 degree patches for every site I also create 1km patches which makes a nice size for display. That’s where rasterVis comes in.

Before Oscar solved my problem here is an example of what my plot looked like. Note this is at 500 meters



There are a bunch of problems here. First and foremost I could not get a consistent coloring for these plots. The issue is that plot() in raster was considering my data to be integers rather than factors and if land cover types were missing my colors scheme changed from plot to plot.

Oscar fixed all that by suggesting that I use rasterVis  and the  levelplot function.  His code dropped right in and after finding the color scheme in rgb for NLCD  the same plot now looks like this





I’ve got some work to do to switch back to the official legend descriptions but the maps now use the rgb from the standard and every map is consistently covered.  In the next installment I’ll post up Oscars code.


Here is what the Impervious surface data looks like ( Needs a grey scale of 0-100% )





To the naked eye it looks like this





Categories: Uncategorized