
Author Archive

A quick look at GHCN version 4

November 3, 2018 4 comments

GHCN version 4 beta is available. Using the GHS population dataset, the ~27,000 GHCN v4 sites were filtered to collect only rural stations. GHS combines two datasets: a 10 meter built-surface satellite dataset and a human population dataset (https://ghsl.jrc.ec.europa.eu/). Using the site locations, the population within 10 km of each site was extracted. Two cases were considered: one where rural was defined as fewer than 16 people per sq km, and one with fewer than 7 people per sq km. This filtering led to two subsets of the ~27K GHCN v4 stations: one with 15K stations and a second with 12K stations. The temperature data used was the unadjusted monthly TAVG.
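A minimal sketch of the filtering step, assuming the v4 inventory has already been joined with the GHS-derived population density extracted within 10 km of each site (the file and column names here are hypothetical):

inv <- read.csv("ghcnv4_inventory_with_ghs.csv", stringsAsFactors = FALSE)

rural16 <- inv[inv$pop_per_sqkm < 16, ]  # ~15K stations in the post
rural7  <- inv[inv$pop_per_sqkm < 7, ]   # ~12K stations in the post

nrow(rural16); nrow(rural7)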

 

 

 

 

 

Categories: Uncategorized

Modis QC Bits

December 5, 2012 46 comments

In the course of working through my MODIS LST project and reviewing the steps that Imhoff and Zhang took, as well as the data preparations other researchers have taken (Neteler), the issue of MODIS quality control bits came up. Every MODIS HDF file comes with multiple SDS, or multiple layers of data. For MOD11A1 there are layers for daytime LST, nighttime LST, the time of collection, the emissivity from channels 31 and 32, and two layers for quality bits. A listing of the various layers of the MOD11A1 file is available here. Or, if you are running the MODIS package, you can issue the getSds() call and pass it the name of an HDF file.
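For example, something like this lists the SDS layers of a downloaded tile; getSds() is from the MODIS package, though the exact return structure can differ between package versions:

library(MODIS)
sds <- getSds(HdfName = "MOD11A1.A2005182.h08v05.005.2008040085443.hdf")
sds$SDSnames   # names of the layers (Day LST, Night LST, QC, emissivity, ...)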

As Neteler notes, the final LST files contain pixels of various quality. Pixels that are obscured by clouds are not produced, of course, but that does not entail that every pixel in the map is of the same quality. The SDS layers for QC — QC_day and QC_Night — contain the QC bits for every lat/lon in the LST map. On the surface this seems pretty straightforward, but decoding the bits can be a bit of a challenge. To do that I considered several sources.

1. Neteler’s web post

2. The MODIS group’s user guide

3. The QA Tutorial

4. LP DAAC Tools, specifically the LDOPE tools provided by the folks at MODLAND

I will start with 4, the LDOPE tools. If you are a hard-core developer I think these tools would be a great addition to your bag of tricks. I won’t go into a lot of detail, but the tools allow you to do a host of processing on HDF files using a command line interface. They provide source code in C, and at some point in the future I expect I would want either to write an R interface to these various tools or to take the C code directly and wrap it in R. Since they have code for reading HDF directly and quickly, it might be a source to use to support the direct import of HDF into raster. However, I found nothing to really help with understanding the issue of decoding the QC bits. You can manipulate the QC layer and do a count of the various flags in the data set. Using the following command on all the tiles for the US I was able to poke around and figure out what flags are typically reported

comp_sds_values -sds=QC_Day C:\\Users\\steve\\Documents\\MODIS_ARC\\MODIS\\MOD11A1.005\\2005.07.01\\MOD11A1.A2005182.h08v05.005.2008040085443.hdf

However, I could do the same thing simply by loading a version of the whole US into raster and using the freq() command on the raster. Figuring out what values are in the QC file is easy, but we want to understand what they mean. Let’s begin with that list of values. By looking at the frequency count of the US raster and by looking at a large collection of HDF files I found the following values in the QC layer

values <-c(0,2,3,5,17,21,65,69,81,85,129,133,145,149,193)
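That frequency count itself can be had with something like this (a sketch; the GeoTiff name is hypothetical and assumes the mosaicked QC_Day layer has been written out):

library(raster)
DayQC <- raster("MOD11A1_2005182_QC_Day_mosaic.tif")  # hypothetical mosaicked QC_Day GeoTiff
freq(DayQC)   # tabulates each distinct QC integer and how many cells carry it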

The question is what do they mean. The easiest one to understand is 0. A zero means that the pixel is highest quality. If we want only the highest quality pixels, then we can use the QC map, turn it into a logical mask, and apply it to the LST map such that we only keep pixels that have 0 for their quality bits. There, job done!
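As a minimal sketch of that idea, using the DayLst and DayQC rasters that get loaded later in this post:

m0 <- DayQC
m0[m0 != 0] <- NA          # keep only cells whose QC value is exactly 0
best <- mask(DayLst, m0)   # everything else becomes NA in the LST map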

Not so fast. We throw away many good pixels that way. To understand the QC bits let’s start with the table provided

[Table: MOD11A1 QC bit flag definitions]

If you are not clear on how this works: every number, every integer in the QC map is an unsigned 8 bit integer. The range of numbers is 0 to 255. The integers represent bit codes, which requires you to do some base 2 math. That is not the tricky part. The number 8, for example, would be “1000” and 9 would be “1001”. If you are unclear about binary representations of integers I suppose google is your friend.

Given the binary representation of the integer value we are then in a position to understand what the quality bits  represent, but there is a minor complication which I’ll explain using an example.  Let’s take  the integer value of 65.  In R the way we turn this into a binary representation is by using the call intToBits.

intToBits(65)
[1] 01 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25] 00 00 00 00 00 00 00 00

We only need 8 bits, so we should do the following

> intToBits(65)[1:8]
[1] 01 00 00 00 00 00 01 00

and for more clarity I will turn this into 0 and 1 like so

> as.integer(intToBits(65)[1:8])
[1] 1 0 0 0 0 0 1 0

So the first bit (or bit 0) has a value of 1, and the seventh bit (bit 6) has a value of 1. We can check our math: 2^6 = 64 and 2^0 = 1, and 64 + 1 = 65, so it checks. Don’t forget that the 8 bits are numbered 0 to 7 and each represents a power of 2.

If you try to use this bit representation to understand the QC “state”, you will go horribly wrong! It’s wrong because HDF files are written in big endian format. If you are not familiar with that, what you need to understand is that the least significant bit is on the right. In the example above the zero bit is on the left, so we need to flip the bits so that bit zero is on the right. Little endian goes from right to left, 2^0 to 2^7. Big endian goes left to right, 2^7 to 2^0.

Little Endian:   1 0 0 0 0 0 1 0   = 65       2^0 + 2^6

Big Endian  0 1 0 0 0 0 0 1         = 65        2^6 + 2^0

The difference is this:

If we took 1 0 0 0 0 0 1 0 and ran it through the table, the table would indicate that the first two bits, 10, would mean the pixel was not generated, and the 1 in bit position 6 would indicate an LST error of <= 3K. Now flip those bits

> f<-as.integer(intToBits(65)[1:8])
> f[8:1]
[1] 0 1 0 0 0 0 0 1

This bit order has the first two bits as 01, which indicates that the pixel is produced but that other quality bits should be checked. Since bit 6 is 1, that indicates the pixel has an error > 2 Kelvin but <= 3 K.

In short, to understand the QC bits we first turn the integer into a bit notation, then flip the order of the bits, and then use the table to figure out the QC status. For MOD11A1 I wrote a quick little program to generate all the possible bit patterns and then add descriptive fields. I would probably do this for every MODIS project I worked on, especially since I don’t work in binary every day and I made about 100 mistakes trying to do this in my head.

QC_Data <- data.frame(Integer_Value = 0:255,
Bit7 = NA,
Bit6 = NA,
Bit5 = NA,
Bit4 = NA,
Bit3 = NA,
Bit2 = NA,
Bit1 = NA,
Bit0 = NA,
QA_word1 = NA,
QA_word2 = NA,
QA_word3 = NA,
QA_word4 = NA
)

for(i in QC_Data$Integer_Value){
AsInt <- as.integer(intToBits(i)[1:8])
QC_Data[i+1,2:9]<- AsInt[8:1]
}

QC_Data$QA_word1[QC_Data$Bit1 == 0 & QC_Data$Bit0==0] <- "LST GOOD"
QC_Data$QA_word1[QC_Data$Bit1 == 0 & QC_Data$Bit0==1] <- "LST Produced,Other Quality"
QC_Data$QA_word1[QC_Data$Bit1 == 1 & QC_Data$Bit0==0] <- "No Pixel,clouds"
QC_Data$QA_word1[QC_Data$Bit1 == 1 & QC_Data$Bit0==1] <- "No Pixel, Other QA"

QC_Data$QA_word2[QC_Data$Bit3 == 0 & QC_Data$Bit2==0] <- "Good Data"
QC_Data$QA_word2[QC_Data$Bit3 == 0 & QC_Data$Bit2==1] <- "Other Quality"
QC_Data$QA_word2[QC_Data$Bit3 == 1 & QC_Data$Bit2==0] <- "TBD"
QC_Data$QA_word2[QC_Data$Bit3 == 1 & QC_Data$Bit2==1] <- "TBD"

QC_Data$QA_word3[QC_Data$Bit5 == 0 & QC_Data$Bit4==0] <- "Emiss Error <= .01"
QC_Data$QA_word3[QC_Data$Bit5 == 0 & QC_Data$Bit4==1] <- "Emiss Err >.01 <=.02"
QC_Data$QA_word3[QC_Data$Bit5 == 1 & QC_Data$Bit4==0] <- "Emiss Err >.02 <=.04"
QC_Data$QA_word3[QC_Data$Bit5 == 1 & QC_Data$Bit4==1] <- "Emiss Err > .04"

QC_Data$QA_word4[QC_Data$Bit7 == 0 & QC_Data$Bit6==0] <- "LST Err <= 1"
QC_Data$QA_word4[QC_Data$Bit7 == 0 & QC_Data$Bit6==1] <- "LST Err > 2 LST Err <= 3"
QC_Data$QA_word4[QC_Data$Bit7 == 1 & QC_Data$Bit6==0] <- "LST Err > 1 LST Err <= 2"
QC_Data$QA_word4[QC_Data$Bit7 == 1 & QC_Data$Bit6==1] <- "LST Err > 4"

Which  looks like this

[Table: QC_Data with all bit patterns and descriptive fields]

Next, I will remove those flags that don’t matter: all good pixels, all pixels that don’t get drawn, and those where the TBD bit is set high. What I want to do is select all those flags where the pixel is produced but the pixel quality is not the highest quality.

FINAL <- QC_Data[QC_Data$Bit1 == 0 & QC_Data$Bit0 ==1 & QC_Data$Bit3 !=1,]

Below is a part of this table

[Table: FINAL subset of produced, lower-quality flags]

And then I can select those  that occur in my HDF files.

[Table: flags that actually occur in the HDF files]

Looking at LST error, which matters most to me, the table indicates that I can use pixels that have a QC value of 0, 5, 17 and 21. I want to select only pixels where LST error is less than 1 K.

Very quickly I’ll show how one uses raster functions to apply the QC bits to LST. Using MODIS R and the function getHdf() I’ve downloaded all the tiles for the US for every day in July 2005. For July 1st, I’ve used MRT to mosaic and resample all the SDS. I use nearest neighbor to ensure that I don’t average quality bits. The mosaic is then projected from a SIN projection to a geographic projection (WGS84) with a GeoTiff output format. Pixel size is set at 30 arc seconds.

The SDS are read in using R’s raster

library(raster)

##  get every layer of data in the July1 directory

SDS <- list.files(choose.dir(),full.names=T)

##  for now I just work with the Day data

DayLst <- raster(SDS[7])
DayQC <- raster(SDS[11])

##  The fill value for LST is Zero. That means Zero represents a No Data pixel

##  So, we force those pixels to NA

DayLst[DayLst==0] <- NA
## (the Night LST layer, once it is read in, would be handled the same way)

##  LST values need to be scaled to put them into Kelvin. The scale factor is 0.02. See the docs for these values

##  every SDS has a different fill value and a different scaling value.
DayLst <- DayLst * 0.02

Now, we can plot the two rasters

plot(DayLst, main = "July 1 2005 Day")

plot(DayQC, main = "Day Quality Control")

[Figure: Day LST for the US]

[Figure: Day QC values for the US]

And I note that the area around Texas has a nice variety of QC bits, so we can zoom in on that.
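A sketch of the zoom using crop(); the extent here is just an approximate box around Texas and the Gulf, not the exact window used for the figures:

gulf_lst <- crop(DayLst, extent(-100, -90, 25, 33))  # xmin, xmax, ymin, ymax in degrees
gulf_qc  <- crop(DayQC,  extent(-100, -90, 25, 33))
plot(gulf_lst)
plot(gulf_qc)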

[Figure: Day LST, Gulf/Texas region]

[Figure: Day QC, Gulf/Texas region]

Next we will use the QC layer and create a mask

m <- DayQC
m[m > 21]<-NA
m[ m == 2 | m== 3]<-NA

Good <- mask(DayLst, m)

For this mask I’ve set all those grids with QC bits greater than 21 to NA. I’ve also set QC pixels with a value of 2 or 3 to NA. This step isn’t really necessary since QC values of 2 and 3 mean that the pixel doesn’t get produced, but I want to draw a map of the mask which will show which bits are NA and which bits are not NA. The mask() function works like so: “m” is laid over DayLst. Cells that are NA in “m” will be NA in the destination (“Good”). If a pixel is not NA in “m” and has a value like 0 or 17 or 25, then the value of the source (DayLst) is written into the destination.

Here are the values in my mask

> freq(m)
value    count
[1,] 0        255179
[2,] 5         12
[3,] 17       8539
[4,] 21        3
[5,] NA   189147

So the mask will “block” 189,147 pixels in DayLst from being copied from source to destination, while allowing the source pixels that have QC bits of 0, 5, 17, 21 to be written. These QC bits are those that represent an LST error of less than 1 K.

[Figure: the QC mask]

When this mask is applied the colored pixels will be replaced by the values in the “source”

[Figure: Day LST after applying the QC mask]

And to recall, these are all the pixels PRIOR to the application of the QC bits

[Figure: Day LST prior to applying the QC mask]

As a side note, one thing I noticed when doing my SUHI analysis was that the LST data had a large variance over small areas. Even when I normalized for elevation and slope and land class, there were pixels that were 10+ Kelvin warmer than the means: essentially 6 sigma cases. That, of course, led me to have a look at the QC bits. Hopefully, this will improve some of the R^2 I was getting in my test regressions.

Categories: Uncategorized

Modis R: Package tutorial

November 25, 2012 13 comments

The MODIS package for R

For people who work in GIS in R there has been a bit of a challenge in working with data from MODIS. I’ll use an example from my own work on UHI to illustrate. A while back I decided that I wanted to look at albedo data from MODIS. If you want to get MODIS data there are basically two approaches. You can use web services like REVERB, which is basically a web interface that allows you to construct queries for a wide variety of data (or similar services shown here), or you can download files directly from the datapool ftp.

For my albedo project I decided to download directly from ftp because I could write a simple program in R to collect the entries in the ftp directories and then download them over the course of a few hours. It’s essentially the same process I use in my study of SUHI for small cities. Getting the data locally is not a big problem, except for the storage issue. You can imagine what it would take to download 90 days of LST data for the US: with 10-14 tiles of data required, that ends up being a nice chunk of data.

The next challenge is getting MODIS data out of its native format into a format that is friendly for R GIS packages. In the case of albedo that data was stored in a format known as “HDF” or HDF-EOS. I work with the raster package, and raster (along with rgdal) has no way of reading HDF directly. This meant I had to find a way to transform HDF into a more common format like GeoTiff. There was also the issue of tiling multiple HDF files into a single raster. Unlike other datasets that might give you a monolithic file of the whole world, the MODIS collections are distributed as tiles using their tiling system.

To solve this problem there are a couple of tools: MRT, the MODIS Reprojection Tool, or GDAL, the Geospatial Data Abstraction Library. For albedo I decided to use GDAL, and I used the distribution of GDAL provided by FWTools. With a few hundred albedo files I had to write a script to load every HDF file and transform it to a GeoTiff file. That worked fine, but now I had a set of R code for doing my statistics and a “bat” file for controlling GDAL. With my SUHI project I decided to use MRT. In that case I used the GUI they provide to select all the files, mosaic them, crop them, resample and reproject. I quickly learned that duplicating the same steps with a GUI is not always easy. Further, if I am trying to create work that is reproducible, it makes it hard. Plus I could not see using that tool to create 365 maps for every day in the year.

What’s needed is an R package that allows us to write R code and achieve the same results as we get with GDAL or MRT. We want to write R code that controls these external applications and tools. That’s what  MODIS R does.
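To give a flavor of what that looks like, here is a rough sketch of a scripted download with the MODIS package; the argument names and date format follow the package documentation as I understand it and may differ between versions:

library(MODIS)
# MODISoptions(localArcPath = "~/MODIS_ARC")           # where downloaded HDF tiles get archived
hdf <- getHdf(product = "MOD11A1",                     # daily 1 km LST
              begin = "2005.07.01", end = "2005.07.01",
              tileH = 8:12, tileV = 4:6)               # an approximate set of tiles covering CONUS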

The team: The MODIS R team consists of  Matteo Mattiuzzi, Jan Verbesselt, Forrest Stevens, Tomislav Hengl, Anja Klisch, Bradley Evans and Agustin Lobo. Some of the seminal work was done by Tomislav at his wonderful page

My Tutorial. I had some discussions with Matteo, who is the package maintainer, and suggested that I might do a tutorial to help people get started with the package. At this stage the package is stable, so I decided to start a tutorial. The way I work is pretty simple: as I try new things I write them up. For now the tutorial starts with the basics of R, RStudio, and preparing your system for MODIS R. That includes downloading the tools and testing them out and finally downloading and building a Windows version of MODIS R.

Then I will start to go through the API and test and explain all the calls in more depth than the manual and write some sample code.

A note: if you have questions about what I am doing, go ahead and leave comments. I can collect up questions, and those I can’t answer I can pass on to the maintainers.

The tutorial starts here

Categories: Uncategorized

Terrain effects on SUHI estimates

November 10, 2012 2 comments

Introduction

Zhang and Imhoff (2010) (pdf here) utilized NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface Temperature (LST) to estimate the magnitude of UHI in large cities across the US. Peng employed a similar approach in studying 419 large cities (population greater than 1M) around the world. Peng’s work suggests a limit or threshold to the Surface Urban Heat Island (SUHI) effect, in that he found that city size and population were not effective predictors of SUHI for very large cities. Imhoff’s study on somewhat smaller cities suggests there is a range at which city size is an effective predictor of SUHI.

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SUHI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island (UHI, as measured by air temperature as opposed to surface temperature), and it shares a certain weakness with Oke’s analysis with respect to the sample size at very small populations/small sizes. My goal in this study is to enlarge the sample of smaller sized cities and towns and refine the estimate, in much the same way that Peng focused his study on cities over 1M in population. My working hypothesis is that just as there is a plateau of SUHI above a certain size, there may be a plateau at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.

Data

Sites were selected for study by utilizing MODIS (MOD12Q1 2005) 15 arc second landcover data. The 17 class landcover product was reprojected from its native format into a WGS84 GeoTiff using nearest neighbor filtering and a 15 arc second (~500 meter) output format. The landcover data was reclassified into urban and non-urban pixels and a cluster analysis was performed to create patches of contiguous urban pixels. Various size patches of urban area, from 10 to 100 sq km, were then selected for further processing. In addition, Census gazetteer information was used to select locations of even smaller size, from 0.5 sq km to 10 sq km. A 1 km Digital Elevation Model derived from 90 m SRTM data was utilized to compute elevations, slopes and aspects. Finally, 8 day 1 km Day LST and Night LST from MODIS was used. For this pilot study one 8 day period from July 4th 2006 was utilized (MOD11A2 v5). The LST product was reprojected from its native format into a WGS84 GeoTiff using nearest neighbor filtering and a 30 arc second (~1 km) output format.

Methods

Census gazetteer information was used to select a sample of small towns and CDPs that had areas less than 10 sq km and no substantial water area within the area’s boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights derived ISA to ensure that the small towns were at least 10 km from larger urban areas. Urban areas larger than 10 sq km were selected by creating an urban/rural mask from MODIS landcover. The mask was processed by finding patches of urban areas using a queen’s case clustering approach. The urban areas were sorted according to size and then classified according to their isolation from other urban areas. The goal of this processing was to identify urban areas that were surrounded predominantly by rural pixels. A 20 km boundary zone was established around every urban patch, and to qualify for selection 99% of the pixels outside the urban zone had to be classified as rural by MODIS landcover. This selection process resulted in a database of 99 sites. Given what I’ve learned in this process I will probably loosen the restriction regarding “satellite” urban pixels surrounding the urban core and include more sites in the final analysis.
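The patch-finding step looks roughly like this in raster; a sketch assuming “lc” is the reprojected MOD12Q1 landcover grid (class 13 is urban in the IGBP scheme):

library(raster)
urban <- lc == 13                        # 1 for urban cells, 0 otherwise
urban[urban == 0] <- NA                  # keep only the urban cells
patches <- clump(urban, directions = 8)  # queen's case clustering into contiguous patches
patch_size <- freq(patches)              # cells per patch id, used to sort patches by size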

The selected sites were then processed as follows. For every location a distance map was created for a 100 km radius around the site center. The distance map provided the distance of every rural pixel from the closest urban pixel. In the end, analysis showed that the distance from an urban pixel was not a significant driver of the land surface temperature of the rural pixel. With regards to Surface Air Temperature (SAT) and UHI measures, it is likely that distance from the nearest urban pixel would be significant due to horizontal advection, but with Land Surface Temperature and SUHI, the physical ability of heated land surface in the urban area to influence the LST in adjacent rural cells is apparently limited. In the final analysis, nevertheless, I will probably add a screen to remove rural pixels that are within 2 km of an urban pixel.
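The distance maps can be built with raster’s distance(), which returns, for every NA cell, the distance to the nearest non-NA cell; a sketch re-using the urban mask from the patch sketch above:

dist_to_urban <- distance(urban)  # for each rural (NA) cell, distance in meters to the closest urban cell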

After distance maps are created out to 100 km, the elevation data is processed within the same radius. In addition to retrieving the elevation for each cell, the slope and aspect of the cell are generated. Zhang and Imhoff (2010) utilized elevation as a screen for selecting both urban and rural pixels. Their screen required that a pixel be within a +-50 meter window of the mean urban pixel elevation. As the results below show, elevation does have an impact on LST, so before comparing urban to rural pixels some form of normalization must be used. In addition to elevation, however, the slope and aspect of a grid cell can influence the LST. In examining differences between the ASTER LST product and the MODIS LST product, Liu (2009) pointed to terrain effects such as slope as part of the reason for differences between these products. The difference in slopes between urban pixels and rural pixels is evident, and generally speaking urban pixels tend to have lower slopes. The difference in slope, of course, will impact the amount of direct solar radiation that a given cell “sees”, and the interaction between the slope and the view angle of the sensor apparently impacts the emissions sensed. The slope effect is small but statistically significant. For this analysis no screen was employed; however, a screen of less than 1 degree slope would eliminate only 15% of urban pixels and 20 percent of rural pixels. A screen at 0.5 degrees eliminates 35% of all urban pixels and 40% of all rural pixels.
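Slope and aspect come straight from the DEM; a sketch assuming “dem” holds the 1 km elevation raster:

slope  <- terrain(dem, opt = "slope",  unit = "degrees")
aspect <- terrain(dem, opt = "aspect", unit = "degrees")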

After terrain processing, the Day LST and Night LST are retrieved for every pixel within 20 km of the site center. As this radius is less than the 100 km radius used for distance map and terrain processing, we are assured of having good values for both the distance map and the slope and aspect maps at the 20 km boundary. The cell centers of the LST product are then used to retrieve the corresponding distance map values, landcover values, and terrain values.
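Pulling the covariates at the LST cell centers might look like this; a sketch with the raster names assumed, re-using objects from the sketches above:

pts <- rasterToPoints(day_lst, spatial = TRUE)   # LST cell centers for the 20 km window; "day_lst" assumed
site_df <- data.frame(day_lst = pts@data[, 1],
                      dist    = extract(dist_to_urban, pts),
                      elev    = extract(dem, pts),
                      slope   = extract(slope, pts))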

Results

Zhang and Imhoff (2010) approached the problem of estimating SUHI by classifying the urban pixels using ISA and the rural pixels using Olson biomes. The approach I take is slightly different. The temperature product we both use, MODIS LST, estimates the LST based on other MODIS products. Most importantly, the split window algorithm uses estimates of emissivity that are derived from the MODIS landcover product. If MODIS landcover believes the pixel is urban or croplands, it does not matter what the ISA product thinks the pixel is, and it doesn’t matter how Olson classified the pixel. MODIS landcover is used in the processing chain to determine the LST and is the more appropriate classification tool. What we also see here is that urban/rural differences are to some extent built into the results. If MODIS landcover classifies a pixel as urban, that classification will drive the emissivity estimate, which in turn drives the output of the split window algorithm employed to determine LST.

A box plot of daytime LST by landcover type shows the range of temperature differences across the various land classes. It should be noted that this sample has not been screened for elevation, slope, or latitude and includes all pixels processed for the 99 sites.

The x axis represents the IGBP code for land cover class, but it’s instructive to ask if the urban class is identifiable on inspection. If you think that class 7 is urban you are wrong: class 7 is open shrublands. Class 10 is grasslands. Urban is class 13. Below is the nighttime LST.

 

What both figures illustrate is that the differential between land surface temperature in the urban setting and land surface temperature in the non-urban setting is not a single number. The implication for UHI studies is clear. The difference in temperature between an urban area and a rural area depends as much upon the rural setting as it does upon the urban setting. It is possible to select areas, such as grasslands and open shrublands, that are both rural and warmer than the urban setting. It is also possible to amplify the difference by selecting other classes, such as wetlands (class 11), which are at the cooler end of rural landscapes. And it’s possible to get confusing answers by selecting croplands or mixed croplands (classes 12 and 14), which have high variability, where some exhibit warmer LST and some exhibit cooler LST than urban areas.

At this stage, with only 8 days of temperature, there is not enough data to draw any conclusions. So the next step will be to download an entire year of daily LST data to construct seasonal averages for these sites. One thing we can do as a sort of sanity check is some simple regression on the entire dataset. Below, we can see that roughly 44% of the variation is explained by looking at the urban area (which lumps all rural pixels together with an area of zero), elevation, latitude, and slope.

Call:
lm(formula = Sites$Day ~ Sites$UrbanArea + Sites$Elevation +
Sites$y + Sites$Slope)

Residuals:
Min 1Q Median 3Q Max
-34.419 -3.099 -0.543 2.947 20.985

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.196e+02 8.395e-02 3806.62 <2e-16 ***
Sites$UrbanArea 5.640e-02 2.046e-03 27.57 <2e-16 ***
Sites$Elevation 7.864e-03 1.941e-05 405.24 <2e-16 ***
Sites$y -4.809e-01 2.140e-03 -224.73 <2e-16 ***
Sites$Slope -7.718e-01 5.962e-03 -129.44 <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.202 on 272367 degrees of freedom
(12251 observations deleted due to missingness)
Multiple R-squared: 0.4448, Adjusted R-squared: 0.4448
F-statistic: 5.456e+04 on 4 and 272367 DF, p-value: < 2.2e-16

> model <- lm(Sites$Night~Sites$UrbanArea+Sites$Elevation+Sites$y+Sites$Slope)
> summary(model)

Call:
lm(formula = Sites$Night ~ Sites$UrbanArea + Sites$Elevation +
Sites$y + Sites$Slope)

Residuals:
Min 1Q Median 3Q Max
-23.2246 -1.3937 0.0242 1.5774 7.7279

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.080e+02 3.670e-02 8391.708 < 2e-16 ***
Sites$UrbanArea 2.634e-02 8.862e-04 29.724 < 2e-16 ***
Sites$Elevation -1.542e-03 8.662e-06 -178.006 < 2e-16 ***
Sites$y -4.316e-01 9.388e-04 -459.731 < 2e-16 ***
Sites$Slope -1.695e-02 2.667e-03 -6.355 2.08e-10 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.324 on 281416 degrees of freedom
(3202 observations deleted due to missingness)
Multiple R-squared: 0.483, Adjusted R-squared: 0.483
F-statistic: 6.573e+04 on 4 and 281416 DF, p-value: < 2.2e-16

 

What we see here is that urban area has a positive coefficient of roughly .056 for daytime LST and a nighttime coefficient of roughly .026. This approach is different from the approach used by Zhang and Imhoff in that rather than screening for the same elevation, I’ve accounted for the elevation and slope via regression. Of course, I’ll check it both ways. Finally, the estimate for small town UHI provided by Imhoff was on the order of 1.75C for a town of size 1 sq km. In their study they found that UHI was fit by 3.48*log(area) + 1.75. That particular study had only two datapoints for towns at 1 sq km. Looking at the data for the towns with urban areas less than 2 sq km, the figure of 1.75C for Day SUHI can’t be supported. The difference in Day LST at that size is effectively zero.
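For reference, their fit evaluated at the small end (a sketch; whether the log in their fit is natural or base 10 does not matter at 1 sq km, since log(1) = 0 either way):

imhoff_suhi <- function(area_sqkm) 3.48 * log(area_sqkm) + 1.75
imhoff_suhi(1)   # 1.75 C at 1 sq km, the value the data here do not support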

 

Conclusion

Now that I have the processing chains down to the last few bits, I need to push forward on several fronts.

1. More small area sites. The restriction I imposed, that 99% of all surrounding pixels in a 20 km radius be rural, limits the number of small sites that can be found. This restriction was motivated by a concern that distance from an urban pixel might influence the LST. With a suitable buffer around each site I should be able to process more than the 30 or so small sites I have. Also, elevation screening and slope screening reduce the number of sites that are usable in the end, so I’ll need to start with more sites.

2. Increase the sampling out to 50 sq km. Preliminary analysis indicated that there may be a break in SUHI around the 10 sq km mark. Although I had limited data (8 days in July, when the signal should be high), the UHI signal appeared to show up more clearly at around 10 sq km, so I should probably focus not only on the sub 10 sq km region, but out to 50. Beyond 50 sq km, my results were pretty much in line with Imhoff, despite the differences in our methods.

3. Eliminate water pixels and wetlands. I believe Imhoff eliminates water pixels. There are a bunch of ways that the data can be cut as one can see from the land class figures.

4. Regress on SUHI directly. This entails processing each site separately: constructing an average for the small town (which can be 1-10 pixels) and then constructing a rural average across all land classes or some reduced set of land classes. In this case one would want to screen for altitude and slope, and perhaps regress with respect to latitude to see if there is a latitude component. One difficulty here, of course, is that you are averaging over a small number of urban pixels and then comparing that to an average over a large number of rural pixels. Kinda messy.

Categories: Uncategorized

Pilot Study: Small Town Land Surface Temperature

October 11, 2012 3 comments

Introduction

Zhang and Imhoff (2010) (pdf here) utilized NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface Temperature (LST) to estimate the magnitude of UHI in large cities across the US. Peng employed a similar approach in studying 419 large cities (population greater than 1M) around the world. Peng’s work suggests a limit or threshold to the Surface Urban Heat Island (SUHI) effect, in that he found that city size and population were not effective predictors of SUHI for very large cities. Imhoff’s study on somewhat smaller cities suggests there is a range at which city size is an effective predictor of SUHI.

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SUHI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island (UHI, as measured by air temperature as opposed to surface temperature), and it shares a certain weakness with Oke’s analysis with respect to the sample size at very small populations/small sizes. My goal in this study is to enlarge the sample of smaller sized cities and towns and refine the estimate, in much the same way that Peng focused his study on cities over 1M in population. My working hypothesis is that just as there is a plateau of SUHI above a certain size, there may be a plateau at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.

 

 

Data

Data from a wide variety of sources was used to identify a set of candidate small cities and towns. The 2010 US census gazetteer of places in the US was used to identify towns and census designated places (CDPs) with land areas less than 10 sq km. NLCD 2006 (ISA) 30 meter data and Nightlights derived ISA (1 km data) were used to narrow the list to small towns that are relatively isolated from larger urban areas. For the final processing, landcover provided by MODIS (MOD12Q1 2005) (500 meter data) was used to classify urban versus non-urban pixels. As noted by Imhoff, a cutoff at 25% ISA by the NLCD standard is consistent with the MODIS classification of urban pixels. A 1 km Digital Elevation Model derived from 90 m SRTM data was utilized to compute elevations, slopes and aspects. Finally, 8 day 1 km Day LST and Night LST from MODIS was used. For this pilot study one 8 day period from July 4th 2006 was utilized (MOD11A2 v5).

Methods

Census gazetteer information was used to select a sample of small towns and CDPs that had areas less than 10 sq km and no substantial water area within the area’s boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights derived ISA to ensure that the small towns were at least 10 km from larger urban areas. As in Zhang/Imhoff, Olson biomes were examined within 0.5 degrees of the site to ensure that the area existed in a single biome and not an area that contained more than one biome. The final sample contained 169 sites spread out over the US, as depicted below.

Individual datasets were then built for every site. MOD12Q1, the MODIS landcover, was collected for a ~60 km radius around each site. The landcover data was used to create several masks: an urban pixel mask, a rural pixel mask, and a water pixel mask. The urban pixel mask and the water pixel mask were used to create distance maps: distance from urban pixel maps and distance from water maps. The elevation data was processed to create a slope map and aspect map for each 60 km grid. Finally, MOD11A2 LST data was downloaded and reprojected to a geographic projection (WGS84) at 1 km resolution in GeoTiff format. Data “bricks” in R’s package “raster” were created for each site. Each brick consisted of an urban mask, rural mask, water mask, distance maps, terrain maps and temperature maps.
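Assembling a site brick is straightforward once the layers share the same grid; a sketch with assumed layer names (all layers must have the same extent and resolution):

site_brick <- brick(stack(urban_mask, rural_mask, water_mask,
                          dist_urban, dist_water, slope, aspect,
                          day_lst, night_lst))
names(site_brick) <- c("urban", "rural", "water", "dist_urban", "dist_water",
                       "slope", "aspect", "day_lst", "night_lst")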

Some fun charts

The purpose of the pilot study was to assemble the data and start to build the analysis code base for a final production run. During that process some of my ideas and approaches to the problem have definitely changed. The first change was the decision to use MOD12Q1. The reason is clear: looking at the processing chain for MOD11A2 (LST), it is clear that this data product relies on MOD12Q1 for its estimate of emissivity. To put it simply, if NLCD thinks the pixel is urban or impervious and MOD12Q1 thinks the pixel is trees, then MOD11A2 will assume an emissivity of trees to calculate the final LST. The other thing that changed was my decision to include more terrain data than Zhang and Imhoff. In their approach they masked off pixels that were 50 meters higher or lower than the average in the urban area. In my approach I will look at using that terrain information to explain the variance we see. Finally, I will probably change my method of looking at the 50 km radius around each site. In part this was done to make the calculations of distance maps less intense. I will probably just go ahead and calculate distance maps for the whole US, a process that might take a couple of days at 500 meter resolution. That will allow me to sample a wider geographic variety of sites than I currently have, which tends to be concentrated in the middle of the country and tends to over sample certain areas with many small isolated towns.

Now for some charts.  First we see a sample chart from a given “brick” of data:

 

And next we see the Day LST and Night LST

 

 

 

And for grins we look at the following: the relationship between temperature and elevation for day and night, and below that we have box plots of the temperature versus the land cover types in the RURAL pixels. That is one other difference between Zhang/Imhoff and my work: where they looked at the “biome”, which is really a historical context, I will look at the current land cover. The reason is simple: MODIS LST is dependent upon MODIS landcover in its determination of emissivity and finally temperature.

 

Conclusion

You should note that I have not looked at the urban/rural difference yet. At this stage in the pilot study I don’t want to be looking at any of those results. I just want to get the processing chain bug free and the main routines ironed out. There will be a little more code to pull out the “area” of the small towns, but that is pretty trivial. The next step now, I think, will be finding ways to speed up the calculation of the distance maps.

 

 

 

Categories: Uncategorized

Sample Input Data.

October 8, 2012 2 comments

Just a couple quick examples.

Starting with   30 meter impervious surface

Followed by MODIS Land cover  ( “red” is urban )

And finally Day  LST

 

Google earth

Categories: Uncategorized

More Fun With Modis

October 8, 2012 Leave a comment

I’ve started a tutorial on using the MODIS package in R; the first few steps are here. While I wait on a release of the package I thought I would play around a bit with the MRT tool and see how it worked. Let’s recall where we are going. I have an inventory of around 150 small cities and we are going to take a look at SUHI in those locations. I have the DEM data, the MODIS land class data, Olson biomes, NLCD 30 meter land class and impervious surface. The last few bits I will need are MODIS LST and perhaps MODIS albedo (if I get motivated) and EVI. We also have population data from a couple of sources and housing data.

The first step to getting MODIS LST data is deciding what tiles I need and then selecting dates. I had hoped that I might get away with using just a few tiles for the US, but that doesn’t look like it’s going to happen. To figure out which tiles I need there is a function in the MODIS package called getTile. But before I figured that out I tried to do it on my own. With some help from the geo sig list I have the following to figure out the tiles.

First I get the  bounding coordinates for every tile

library(geosphere)

library(sp)

gring <- "http://landweb.nascom.nasa.gov/developers/sn_tiles/sn_gring_10deg.txt"

 
GRING <- read.table(gring, skip = 7, stringsAsFactors = FALSE, nrows = 648,
na.strings = c("-999.0000", "-99.0000"))

colnames(GRING) <- c("v", "h", "ll_lon", "ll_lat",
"ul_lon", "ul_lat", "ur_lon", "ur_lat",
"lr_lon", "lr_lat")

GRING <- GRING[!is.na(GRING[,3]),]

DF <- GRING[GRING$h < 15 & GRING$h > 6 & GRING$v < 7 & GRING$v > 3,]

The table is downloaded and then I remove those rows that are NA. Next, I select those parts of the table that relate to CONUS: h (horizontal) between 6 and 15 and v (vertical) between 3 and 7.

Next, I use Robert Hijmans’ code for generating a SpatialPolygonsDataFrame.

m <- as.matrix(DF)
z <- apply(m, 1, function(x) Polygon(matrix(x[c(3:10, 3:4)], ncol=2, byrow=T)))
n <- names(z)
y <- sapply(1:length(n), function(i) Polygons(z[i], n[i]))
sp <- SpatialPolygons(y, proj4string=CRS("+proj=longlat"))
spdf <- SpatialPolygonsDataFrame(sp, DF)

Then we make some corrections for being on a sphere and we read in the lat/lon of all 150 sites:
spdfc <- makePoly(spdf)

Inv <- read.table("SiteInventory.inv", stringsAsFactors = FALSE)

The next bit of clumsy code creates a spatial points data frame that has 4 points for every site. I’m going to look +-0.5 degrees from the site center.

myExtent <- data.frame(lon=rep(NA,4*nrow(Inv)),lat=rep(NA,4*nrow(Inv)))

for( site in 1:nrow(Inv)){
dex <- ((site-1)*4)+1
myExtent$lon[dex]<- Inv$Lon[site] +.5
myExtent$lat[dex]<- Inv$Lat[site] -.5
myExtent$lon[dex+1]<- Inv$Lon[site] +.5
myExtent$lat[dex+1]<- Inv$Lat[site] +.5
myExtent$lon[dex+2]<- Inv$Lon[site] -.5
myExtent$lat[dex+2]<- Inv$Lat[site] -.5
myExtent$lon[dex+3]<- Inv$Lon[site] -.5
myExtent$lat[dex+3]<- Inv$Lat[site] +.5

}

Finally we will intersect the points with the SpatialPolygonsDataFrame and get out the tiles

pts <- SpatialPoints(myExtent)

proj4string(pts)<- proj4string(spdf)

Tiles <- over(pts,spdfc)[1:2]

Tiles <- unique(Tiles)

nrow( Tiles)

print(Tiles)

Depending on the increment I used  (.5 versus .2)  I basically had to collect 10-12 tiles to cover my bases for the next step.

With that list of tiles I downloaded the tiles I needed for July 4th 2006. In the future with the MODIS package this will happen just by writing code, but for now I did it by hand. With those 12 tiles downloaded I then fired up MRT to mosaic, resample, and reproject the tiles into a monolithic GeoTiff of daytime LST for the whole CONUS.

Reading that in we get the following

And then we can zoom in on the Midwest and then go down to Chicago

On the left, at lon -87.9 lat 41.65, is the Paw Paw Woods Nature Preserve. Pretty cool.

Categories: Uncategorized