All material here is Licensed under CC by sa

Modis QC Bits

December 5, 2012 39 comments

In the course of working through my MODIS  LST project and reviewing the steps that Imhoff and Zhang took as well has the data preparations other researchers have taken ( Neteler ) the issue of MODIS Quality control bits came up.  Every MODIS  HDF file comes with multiple SDS or multiple layers of data. For MOD11A1 there are layers for daytime LST, Night LST, the time of  collection, the emissivity from ch 31 and 32,  and two layer for Quality Bits.  The various layers of   MODIS11A1  file is available here. Or if you are running the MODIS package, you can issue the getSds()  call and pass it the name of a HDF file.

As Neteler notes the final LST files contain pixels of various quality. Pixels that are obscured by clouds are not produced, of course, but that does not entail that every pixel in the map is of the same quality. The SDS layers for QC  — QC_day and QC_Night– contain the QC bits for every lat/lon  in the LST map. On the surface this seem pretty straighforward, but deciding the bits can be a bit of a challenge. To do that I considered several source.

 1. Neteler’s web post

2. The Modis groups  user guide

3. The  QA Tutorial

4. LP  DAAC  Tools, specifically the LDope tools provided by guys at MODLAN

I will start with 4, the ldope tools, If you are a hard core developer I think these tools would be a great addition to your bag of tricks. I won’t go into a lot of detail, but the tools allow you to do a host of processing on HDF files using a command line type interface. They provide source code in c and at some point in the future I expect that I would want to either write an R interface to these various tools  or take the C code directly and wrap it in R. Since they have the code for reading HDF directly and quickly it might be a source to use to support the direct import of HDF  into raster. However, I found nothing to really help with understanding the issue of decoding the  QC bits. You can manipulate the QC layer and do a  count of the various flags  in the data set. Using the following command on all the tiles for the US I was able to poke around and figure out what flags are typically reported

comp_sds_values -sds=QC_Day C:\\Users\\steve\\Documents\\MODIS_ARC\\MODIS\\MOD11A1.005\\2005.07.01\\MOD11A1.A2005182.h08v05.005.2008040085443.hdf

However,  I could do the same thing simply by loading a version of the whole US into raster and using the freq() command on raster. Figuring about what values are in the QC file is easy, but we want to understand what they mean. Let’s begin with that list of values. By looking at the frequency count of the US raster and by looking at a large collection of HDf files I found the following values for the QC layer

values <-c(0,2,3,5,17,21,65,69,81,85,129,133,145,149,193)

The question is  what do they mean. The easiest one to understand is 0. A zero means that this pixel is highest quality. If we want only the highest quality pixels, then we can use the QC map, turn it into a logical mask and apply it to the  LST map such that we only keep pixels that  have 0 for their quality bits. There, job done!

Not so fast. We throw away many good pixels that way. To understand the QC bits lets start with the table provided

Table

If you are not clear on how  this works  every number, every integer in the QC map is an unsigned 8 bit integer. The range of numbers is 0 to 255. The integers represent bit codes which requires you to do some base 2 math. That is not the tricky part. The number 8 for example would be   “0001”  and 9 would be  1001.  If you are unclear about binary representations of integers I suppose google is your friend.

Given the binary representation of the integer value we are then in a position to understand what the quality bits  represent, but there is a minor complication which I’ll explain using an example.  Let’s take  the integer value of 65.  In R the way we turn this into a binary representation is by using the call intToBits.

intToBits(65)
[1] 01 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[25] 00 00 00 00 00 00 00 00

We only need 8 bits, so we should do the following

> intToBits(65)[1:8]
[1] 01 00 00 00 00 00 01 00

and for more clarity I will turn this into 0 and 1 like so

> as.integer(intToBits(65)[1:8])
[1] 1 0 0 0 0 0 1 0

So the first bit ( or bit 0 ) has a value of 1, and seventh bit  ( bit 6 ) has a value of 1.  We can check our math 2^6 =64  and 2^0 = 1, so it checks. Don’t forget that the 8 bits are numbered 0 to 7 and each represents a power of 2.

If you try to use this bit representation to understand the  QC “state”,  you will go horribly wrong!. It’s wrong because HDF files are written in big endian format. If you are not familiar with that, what you need to understand is that the least significant bit is on the right. In the example above the zero bit is on the left, so we need to flip the bits so that bit zero is on the right . Little Endian goes from right to left, 2^0 to 2^7. Big Endian goes left to right 2^7 to 2^0.

Little Endian:   1 0 0 0 0 0 1 0   = 65       2^0 + 2^6

Big Endian  0 1 0 0 0 0 0 1         = 65        2^6 + 2^0

The difference is this:

If we took  1 0 0 0 0 0 1 0 and ran it through the table, the table would indicate that  the first two bits 10  would mean the pixel was not generated, and  bit 1 in the 6th position would indicate an LST error of  <= 3K.  Now flip those bits

> f<-as.integer(intToBits(65)[1:8])
> f[8:1]
[1] 0 1 0 0 0 0 0 1

This bit order has the first two bits as 01 which indicates that the pixel is produced, but that other quality bits should be checked. Since bit 6 is 1 that indicates  the pixel has an error > 2Kelvin but less than 3K

In short,  to understand the QC bits we first turn the integer into a bit notation and then we flip the order of the bits and then use the table to figure out the QC status.  For MOD11A1 I wrote a quick little program to generate all the possible bits and then  add descriptive fields. I would probably do this for every Modis project I worked on especially since I dont work in binary everyday and I made about 100 mistakes trying to do this in my head.

QC_Data <- data.frame(Integer_Value = 0:255,
Bit7 = NA,
Bit6 = NA,
Bit5 = NA,
Bit4 = NA,
Bit3 = NA,
Bit2 = NA,
Bit1 = NA,
Bit0 = NA,
QA_word1 = NA,
QA_word2 = NA,
QA_word3 = NA,
QA_word4 = NA
)

for(i in QC_Data$Integer_Value){
AsInt <- as.integer(intToBits(i)[1:8])
QC_Data[i+1,2:9]<- AsInt[8:1]
}

QC_Data$QA_word1[QC_Data$Bit1 == 0 & QC_Data$Bit0==0] <- “LST GOOD”
QC_Data$QA_word1[QC_Data$Bit1 == 0 & QC_Data$Bit0==1] <- “LST Produced,Other Quality”
QC_Data$QA_word1[QC_Data$Bit1 == 1 & QC_Data$Bit0==0] <- “No Pixel,clouds”
QC_Data$QA_word1[QC_Data$Bit1 == 1 & QC_Data$Bit0==1] <- “No Pixel, Other QA”

QC_Data$QA_word2[QC_Data$Bit3 == 0 & QC_Data$Bit2==0] <- “Good Data”
QC_Data$QA_word2[QC_Data$Bit3 == 0 & QC_Data$Bit2==1] <- “Other Quality”
QC_Data$QA_word2[QC_Data$Bit3 == 1 & QC_Data$Bit2==0] <- “TBD”
QC_Data$QA_word2[QC_Data$Bit3 == 1 & QC_Data$Bit2==1] <- “TBD”

QC_Data$QA_word3[QC_Data$Bit5 == 0 & QC_Data$Bit4==0] <- “Emiss Error <= .01″
QC_Data$QA_word3[QC_Data$Bit5 == 0 & QC_Data$Bit4==1] <- “Emiss Err >.01 <=.02″
QC_Data$QA_word3[QC_Data$Bit5 == 1 & QC_Data$Bit4==0] <- “Emiss Err >.02 <=.04″
QC_Data$QA_word3[QC_Data$Bit5 == 1 & QC_Data$Bit4==1] <- “Emiss Err > .04″

QC_Data$QA_word4[QC_Data$Bit7 == 0 & QC_Data$Bit6==0] <- “LST Err <= 1″
QC_Data$QA_word4[QC_Data$Bit7 == 0 & QC_Data$Bit6==1] <- “LST Err > 2 LST Err <= 3″
QC_Data$QA_word4[QC_Data$Bit7 == 1 & QC_Data$Bit6==0] <- “LST Err > 1 LST Err <= 2″
QC_Data$QA_word4[QC_Data$Bit7 == 1 & QC_Data$Bit6==1] <- “LST Err > 4″

Which  looks like this

AllBits

Next,  I will remove those flags that don’t matter.  All good pixels, all pixels that don’t get drawn, and  those where the TBD bit is set high. What I want to do is select all those flags where the pixel is produced but the pxel quality is not the highest quality

FINAL <- QC_Data[QC_Data$Bit1 == 0 & QC_Data$Bit0 ==1 & QC_Data$Bit3 !=1,].  Below see a part of this table

Final2

And then I can select those  that occur in my HDF files.

BitsIN HDF

Looking at LST error which matters most to me, the table indicates that I can use pixels that have a  QC value of 0,5,17 and 21. I want to only select  pixels where LSt error is less than 1 K

Very quickly I’ll show how one uses raster functions to apply the QC bits to  LST.   using MODIS R and the function  getHdf() I’ve downloaded all the tiles for US for every day in July 2005. For July 1st, I’ve used  MRT to mosiac and resample  all the SDS. I use Nearest neighbor to insure that I dont average quality bits. The mosaic is the projected from a SIN projection to a Geographic projection ( WGS84) and a geotiff output format. Pixel size is set at   30 arc seconds

The SDS are read in using R’s raster

library(raster)

##  get every layer of data in the July1 directory

SDS <- list.files(choose.dir(),full.names=T)

##  for now I just work with the Day data

DayLst <- raster(SDS[7])
DayQC <- raster(SDS[11])

##  The fill value for LST is Zero. That means Zero represents a No Data pixel

##  So, we force those pixels to NA

DayLst[DayLst==0]<-NA
NightLst[NightLst==0]<-NA

##  Units in LST  need to be scaled to be put into Kelvin. The adjustment value is  .02.  See the docs for these values

##  every SDS has a different fill value and a different scaling value.
DayLst <- DayLst * .02

Now, we can plot the two raster

plot(DayLst, main = “July1 2005 Day”)

plot(DayQC, main=”Day Quality COntrol”)

USday

DauQCUs

And I note  that the area around texas has a nice variety of QC bits, so we can zoom in on that

gulf

qcgulf

Next we will use  the QC  layer and create mask

m <- DayQC
m[m > 21]<-NA
m[ m == 2 | m== 3]<-NA

Good <- mask(DayLst, m)

For this mask I’ve set all those grids with  QC bits  greater than 21 to NA.  I’ve also set  QC pixels with a value of 2 and 3 to NA.  This step isnt really necessary  since  QC  values of 2 and 3 mean that the pixel doesnt get produce, but I want to draw a map of the mask which will show which bits are NA and which bits are not NA.  The mask() function works like so. “m” is laid over DayLst.  bits that are NA in “m” will be NA in the destination (“Good”)  if a pixel is not NA in “m” and has a value like, 0 or 17 or 25, then the value of the source (DayLst) is written into the destination.

Here are the values in my mask

> freq(m)
value    count
[1,] 0        255179
[2,] 5         12
[3,] 17       8539
[4,] 21        3
[5,] NA   189147

So the mask will “block” 189147 pixels in DayLst from being copied from source to destination, while allowing the source pixels that have QC bits of 0,5,17,21 to be written. These QC bits  are those that represent an LST error of less than 1K

mask

When this mask is applied the colored pixels will be replaced by the values in the “source”

Good Lst

And to recall, these are all the pixels PRIOR to the application of the QC bits

gulf

As a side note, one thing I noticed when doing my SUHI analysis was that the LST data had a large variance over small areas. Even when I normalized for elevation and slope and land class there were  pixels that were 10+ Kelvin warmer than the means. Essentially 6 sigma cases. That, of course lead me to have a look at the QC bits.  Hopefully, this will improve some of the R^2 I was getting in my test regressions.

Categories: Uncategorized

Modis R: Package tutorial

November 25, 2012 13 comments

The MODIS package for R

For people who work in GIS in R there has been a bit of challenge in working with data from  Modis. I’ll use an example from my own work on UHI to illustrate. A while back I decided that I wanted to look at albedo data from MODIS.  If you want to get MODIS data there are basically two approaches. You can use  the web services  like REVERB which is basically a web interface that allows you to construct queries for a wide variety of data ( or similar services shown here ) or you can download files directly from the datapool ftp.

For my Albedo project I decided to download directly from ftp because I could write a simple program in R to collect the entries in the ftp directories and then download them over the course of a few hours.  It’s essentially the same process I use in my study of SUHI for small cities. Getting the data locally is not a big problem, except for the storage issue.  You can imagine what it would take to download 90 days of LST data for the US.  with 10-14 tiles of data required that ends up being a nice chunk of data.

The next challenge is getting MODIS data out of its native format into a format that is friend for R GIS packages. In the case of Albedo that data was stored in a format known as “HDF” or HDF-EOS.  I work with  the raster package and raster ( along with rgdal)  have no way of reading HDF directly. This meant I had to find a way to transform HDF into a more common format like “geoTiff”. There was also the issue of tiling multiple hdf files into a single raster. Unlike other datasets that might give you a monolithic file of the whole world, the MODIS collections are distributed as tiles using their tiling system.

To solve this problem there are a couple tools: MRT or the modis reprojection tool. or GDAL, the Geospatial Data abstraction Library. For Albedo I decided to use GDAL and I used the distribution of GDAL provided by FWTools.  With a few hundred albedo files I had to write a script to load every Hdf file and transform it to a ‘geotiff’ file. That worked fine, but now I had a set of R code for doing my statistics and a “bat” file for controlling Gdal.  With my SUHI project I decided to use MRT. In that case I used the GUI they provide to select all the files, mosaic them, crop them, resample and reproject. I quickly learned that duplicating the same steps with a GUI is not always easy. Further, if I am trying to create work that is reproduceable, it makes it hard. Plus I could not see using that tool to create 365 maps for every day in the year.

What’s needed is an R package that allows us to write R code and achieve the same results as we get with GDAL or MRT. We want to write R code that controls these external applications and tools. That’s what  MODIS R does.

The team: The MODIS R team consists of  Matteo Mattiuzzi, Jan Verbesselt, Forrest Stevens, Tomislav Hengl, Anja Klisch, Bradley Evans and Agustin Lobo. Some of the seminal work was done by Tomislav at his wonderful page

My Tutorial.  I had some discussions with Matteo who is the package maintainer and  suggested that I might do a tutorial to help people get started with the package.  At this stage the package is stable, so I decided to start a tutorial. The way I work is pretty simple. As I try  new things I write them up.    For now the tutorial  starts with the basics of R, R studio and preparing your system for MODIS R. That includes downloading a the tools and testing them out and finally downloading and building a WINDOWS version of MODIS R.

Then I will start to go through the API and test and explain all the calls in more depth than the manual and write some sample code.

A note. If you have questions about what I am doing go ahead and leave comments. I can collect up questions and those I cant answer I can pass on to the maintainers.

The tutorial starts here

Categories: Uncategorized

Terrain effects on SUHI estimates

November 10, 2012 2 comments

Introduction

Zhang and Imhoff (2010 pdf here utilized NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface temperature (LST) to estimate the magnitude of UHI in large cities across the US.  Peng  employed a   similar approach in studying 419 large cities ( population greater than 1m ) around world. Peng’s work suggests a limit or threshold to the  Surface Urban Heat Island (SUHI) effect in that he found that city size and population was not an effective predictor of SUHI for very large cities. Imhoff’s study on somewhat smaller cities suggest there is a range at which city size is an effective predictor of SUHI

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SHUI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island  ( UHI  as measured by air temperature as opposed to surface temperature ) and it shares a certain weakness with Oke’s analysis with respect to the sample size at very small populations/small size.  My goal in this study is to enlarge the sample of smaller sized cities and towns and refine the estimate in much the same way that Peng focused his study on cities over 1M in population. My working hypothesis is that just as there is a  plateau of SUHI above a certain size that there may be a plataeu at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.

Data

Sites were selected for study by utilizing MODIS ( MOD12Q1 2005) (15 arc second) Landcover data. The 17 class landcover product was reprojected from its native format into a WGS84 GeoTiff using nearest neighbor filtering and with a 15 arc second ( 500 meter) output format. The landcover data was reclassifed into urban and non urban pixels and a cluster analysis was performed to create patches of continguous urban pixels.  Various size patches, from 10 to 100sq km,  of urban areas were then selected for further processing. In addition, Census gazetter information was used to select locations of even smaller size from .5sqkm to 10 sq km.  A 1km Digital Elevation model derived from 90m SRTM data was utilized to compute elevations, slopes and aspects. Finally, 8 day 1km Day LST  and Night LST from MODIS was used. For this pilot study one 8 day period from July 4th 2006 was utilized ( MOD11A2 v 5 ) . The LST product was reprojected from its native format into a WGS 84 Geotiff using nearest neighbor filtering and a 30 arc second (~1km) output format.

Methods

Census gazetter information was used to select a sample of small towns and CDPs that had areas less than 10 sq km and no substantial water area within the areas boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights derived ISA to ensure that the small towns were at least 10km from larger urban areas.  Urban areas larger than 10 sqkm were selected by created an urban/ rural mask from MODIS landcover.  The mask was processed by finding patches of urban areas using a queens case clustering approach. The urban areas were sorting according to size and then classified according to their isolation from other urban areas. The goal of this processing was to identify urban areas that were surrounded predominately by rural pixels. A 20 km boundary zone was established around every urban patch and to qualify for selection 99% of the pixels outside the urban zone had to be classified as rural by MODIS landcover. This selection process resulted in a database of 99 sites. Given what I’ve learned in this process I will probably loosen the restriction regarding “satellite” urban pixels surrounding the urban core and include more sites in the final analysis.

The selected sites where then processed as follows. For every location a distance map was created for a 100km radius around the site center. The distance map provided the distance that every rural pixel was from the closest urban pixel. In the end, analysis showed that the distance from an urban pixel was not a significant driver in the land surface temperature of the rural pixel.  With regards to Surface Air Temperatur (SAT) and  UHI measures, it is likely that distance from the nearest urban pixel would be significant due to horizontal advection, but with Land Surface Temperture and SUHI, the physical ability of heated land surface in the urban area to influence the LST in adjacent rural cells is apparently limited. In the final analysis, nevertheless, I will probably add a screen to remove rural pixels that are within 2km of a urban pixel.

After distance maps are created out to 100km, the elevation data is processed within the same radius. In addition to retrieving the elevation for each cell the slope and aspect of the cell is generated. Zhang and Imhoff (2010) utilized elevation as screen for selecting both urban and rural pixels. Their screen required that a pixel be with a +-50 meter window of the mean urban pixel elevation. As the results below show elevation does have an impact on LST so that before comparing urban to rural pixels some form of normalization must be used. In addition to elevation, however, the slope and aspect of a grid cell can influence the LST.  In examining differences between the ASTER LST product and the MODIS LST product Liu (2009) pointed to terrain effects such as slope as part of the reason for differences between these products. The difference in slopes between urban pixels and rural pixels is evident and generally speaking urban pixels tend to have lower slopes. The difference in slope, of course, will impact the amount of direct solar radiation that a given cell “sees” and the interaction between the slope and the view angle of the sensor apparently impacts the emissions sensed. The slope effect is small but statistically significant. For this analysis no screen was employed, however a screen of less than 1 degree slope will eliminate only 15% of urban pixels and 20 percent of rural pixels.  A screen at .5 degrees eliminates 35% of all urban pixels and 40% of all rural pixels.

After terrain processing the  Day LST and Night LST is retrieved for every pixel within 20km of the site center. As this radius is less than the 100km radius used for distance map and terrain processing we are assured of having good values for both the distance map and the slope and aspect maps at the 20km boundary. The cell centers of the LST product are then used to retrieve the corresponding distance map values, landcover values, and terrain values.

Results

Zhang and Imhoff(2010) approached the problem estimating  SUHI by classifying the urban pixels using ISA and the rural pixels using Olson Biomes. The approach I take is slightly different. The temperature product MODIS LST which we both use is a product that estimates the LST based on other MODIS products. Most importantly the split window algorithm uses estimates of emmisivity that are derived from Modis landcover product. Such that if MODIS Landcover believes the pixel is urban or croplands it does not matter what the ISA product thinks the pixel is and it doesnt matter how Olsen classified the pixel. Modis Landcover is used in the processing chain to to determine the LST and is the more appropriate classification tool. What we also see here is that Urban/Rural differences are to some extent built into the results. If Modis Landcover classifies a pixel as urban that classification will drive the emissivity estimate which in turn drives the output of the split window alogorithm employed to determine LST.

A box plot of Daytime LST by landcover type shows the range of temperature differences across the various land classes. It should be noted that this sample has not been screened for elevation, slope, or latitude and includes all pixels processed for the 99 sites

The x axis represents the IGBP code for land cover class, but its instructive to ask if the urban class is identifiable on inspection. If you think that class 7 is urban you are wrong. class 7 is open shrublands.  Class 10 is grasslands. Urban is class 13.  Below is the nighttime LST

 

What both figures illustrate is that the differential between Land surface temperature in the urban setting and Land surface temperature in the non urban setting is not single number. The implication for UHI studies is clear. The difference in temperature between a urban area and a rural area depends as much upon the rural setting as it does upon the urban setting. It is possible to select areas, such as grasslands and open shrubland that are both rural and warmer than the urban setting. It is also possible to amplify the difference by selecting other classes such as wetlands ( class 11 ) which are at the cooler end of rural landscapes. And its possible to get confusing answers by selecting croplands or mixed croplands ( class 12 and 14 ) which have high variability where some exhibit warmer LST and some exhibit cooler LST than urban areas.

At this stage with only 8 days of temperature there is not enough data to draw any conclusions. So the next step will be to download an entire year of LST daily data to construct seasonal averages for these sites. One thing we can do as a sort of sanity check is some simple regression on the entire dataset.  Below, we can see that  40% of the variation is explained by looking at the urban area ( which lumps all rural pixels together with and area of zero ), elevation, latitude, and slope.

Call:
lm(formula = Sites$Day ~ Sites$UrbanArea + Sites$Elevation +
Sites$y + Sites$Slope)

Residuals:
Min 1Q Median 3Q Max
-34.419 -3.099 -0.543 2.947 20.985

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.196e+02 8.395e-02 3806.62 <2e-16 ***
Sites$UrbanArea 5.640e-02 2.046e-03 27.57 <2e-16 ***
Sites$Elevation 7.864e-03 1.941e-05 405.24 <2e-16 ***
Sites$y -4.809e-01 2.140e-03 -224.73 <2e-16 ***
Sites$Slope -7.718e-01 5.962e-03 -129.44 <2e-16 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.202 on 272367 degrees of freedom
(12251 observations deleted due to missingness)
Multiple R-squared: 0.4448, Adjusted R-squared: 0.4448
F-statistic: 5.456e+04 on 4 and 272367 DF, p-value: < 2.2e-16

> model <- lm(Sites$Night~Sites$UrbanArea+Sites$Elevation+Sites$y+Sites$Slope)
> summary(model)

Call:
lm(formula = Sites$Night ~ Sites$UrbanArea + Sites$Elevation +
Sites$y + Sites$Slope)

Residuals:
Min 1Q Median 3Q Max
-23.2246 -1.3937 0.0242 1.5774 7.7279

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.080e+02 3.670e-02 8391.708 < 2e-16 ***
Sites$UrbanArea 2.634e-02 8.862e-04 29.724 < 2e-16 ***
Sites$Elevation -1.542e-03 8.662e-06 -178.006 < 2e-16 ***
Sites$y -4.316e-01 9.388e-04 -459.731 < 2e-16 ***
Sites$Slope -1.695e-02 2.667e-03 -6.355 2.08e-10 ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.324 on 281416 degrees of freedom
(3202 observations deleted due to missingness)
Multiple R-squared: 0.483, Adjusted R-squared: 0.483
F-statistic: 6.573e+04 on 4 and 281416 DF, p-value: < 2.2e-16

 

What we see here is that Urban Area has a positive coefficient of roughly .056 for daytime LST and a nighttime coefficient of roughly .026. This approach is different than the approach used by Zhang and Imhoff in that rather than screening for the same elevation, I’ve accounted for the elevation and slope via regression. Of course, I’ll check it both ways.  Finally,  The estimate for small town UHI provided by Imhoff was on the order of 1.75C for a town of size 1 sq km. In their study they found that UHI was fit by  3.48*log(area) + 1.75. That particular study had only two datapoints for towns at 1km. Looking at the data for the twons with urban areas less than 2km, the figure of 1.75C for Day SUHI can’t be supported. The difference in Day LST  at that size is effectively zero.

 

Conclusion

Now that I have the processing chains down to the last few bits I need to push forward on several bits.

1. More small area sites.  The restriction I imposed, that 99% of all surounding pixels in a 20 km radius limits the number of small sites that can be found. This restriction was motivated by a concern that distance from an urban pixel might influence the LST.  With a suitable buffer around each site I should be able to process more than the 30 or so small sites I have. Also, elevation screening and slope screening also reduces the number of sites that are usable in the end, so I’ll need to start with more sites.

2. Increase the sampling out to 50 sq km. Preliminary analysis indicated that there may be a break in SHUI around the 10 sq km mark. Although I had limited data ( 8 days in july when the signal should be high ) The UHI signal appeared to show up more clearly at around 10 sq km, So I should probably focus not only on the sub 10 sq km region, but out to 50.  Beyond 50 sq km, my results were pretty much in line with Imhoff, despite the differences in our methods.

3. Eliminate water pixels and wetlands. I believe Imhoff eliminates water pixels. There are a bunch of ways that the data can be cut as one can see from the land class figures.

4. Regress on SUHI directly. This entails processing each site separately. Constructing an average for the small town ( which can be 1-10 pixels) and then construct a rural average across all land classes or some reduced set of land classes. In this case one would want to screen for altitude and slope and perhaps regress with respect to latitude to see of the there is a latitude component. One difficult of course here is that you are averging over a small number of urban pixels and then comapring that to an average over a large number of rural pixels. Kinda messy.

Categories: Uncategorized

Pilot Study: Small Town Land Surface Temperature

October 11, 2012 3 comments

Introduction

Zhang and Imhoff (2010)  pdf here utilized NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface temperature (LST) to estimate the magnitude of UHI in large cities across the US.  Peng  employed a   similar approach in studying 419 large cities ( population greater than 1m ) around world. Peng’s work suggests a limit or threshold to the  Surface Urban Heat Island (SUHI) effect in that he found that city size and population was not an effective predictor of SUHI for very large cities. Imhoff’s study on somewhat smaller cities suggest there is a range at which city size is an effective predictor of SUHI

In addition, Zhang and Imhoff found that SUHI is related to the surrounding environment. A city that is developed in a forest biome, for example, will show a greater SHUI than one embedded in a desert.

Zhang and Imhoff’s curve is, as they note, somewhat similar to Oke’s postulated curve for the relationship between population and the Urban Heat Island  ( UHI  as measured by air temperature as opposed to surface temperature ) and it shares a certain weakness with Oke’s analysis with respect to the sample size at very small populations/small size.  My goal in this study is to enlarge the sample of smaller sized cities and towns and refine the estimate in mch the same way that Peng focused his study on cities over 1M in population. My working hypothesis is that just as there is a  plateau of SUHI above a certain size that there may be a plataeu at smaller sizes, suggesting that a logistic relationship might be a better representation of the underlying process.

 

 

Data

Data from a wide variety of sources was used to identify a set of candidate small cities and towns. The 2010 US census gazetter of places in the US was used to identify town and census designated places (CDP )  with land areas less than 10 sq km. NLCD 2006 ( ISA) 30 meter data and Nightlights derived ISA  (1km data ) was used to narrow the list to small towns that are relatively isolated from larger urban areas. For the final processing Landcover provided by MODIS ( MOD12Q1 2005) (500 meter data ) was used to classify urban versus non urban pixels. As noted by Imhoff a cut off at 25% ISA by NLCD standard  is consistent with Modis classification of urban pixels. A 1km Digital Elevation model derived from 90m SRTM data was utilized to compute elevations, slopes and aspects. Finally, 8 day 1km Day LST  and Night LST from MODIS was used. For this pilot study one 8 day period from July 4th 2006 was utilized ( MOD11A2 v 5 )

Methods

Census gazetter information was used to select a sample of small towns and CDPs that had areas less than 10 sq km and no substantial water area within the areas boundaries. That list was culled further using NLCD 2006 impervious area and Nightlights derived ISA to ensure that the small towns were at least 10km from larger urban areas. As  Zhang/Imhoff Olson biomes were examined within .5 degrees of the site to ensure that the area existed in a singular biome and not an area that contained more than one biome. The final sample contained 169 sites spread out over the US as depicted below

Individual datasets were then built for every site. MOD12Q1, MODIS landcover was collected for a ~60 km radius around each site. The landcover data was used to create several masks. An urban pixel mask, a rural pixel mask, and a water pixel mask. The urban pixel mask and the water pixel mask were used to create distance maps: Distance from urban pixel maps and distance from water maps. The elevation data was processed to create a slope map and aspect map for each 60km grid. Finally MODIS11A2 LST data was downloaded and reprojected to a geographic projection ( WGS84) at 1km resolution in GeoTiff format. Data “bricks” in Rs package “raster” were created for each site. Each brick consisted of a urban mask, rural mask, water mask, distance maps, terrain maps and temperature maps.

Some fun charts

The purpose of the pilot study was to assemble the data and start to build the analysis code base for a final production run. During that process some of my ideas and approaches to the problem have defininately changed. The first change was the decision to use MOD12Q1. The reason is clear. Looking at the processing chain for MOD11A2 ( LST ) it is clear that this data product relies on MOD12Q1 for its estimate of emmissivity. To put it simply if NLCD thinks the pixel is Urban or impervious and MOD12Q1 thinks the pixel is trees, then MOD11A2 will assume an emissivity of trees to calculate the final LST. The other thing that changed was my decision to include more terrain data than Zhang Imhoff. In their approach they masked off pixels that were 50meter higher or lower than the average in the urban area. In my approach I will look at using that terrain information to explain the variance we see. Finally, I will probably change my method of looking at the 50km radius around each site. In part this was done to make the calculations of distance maps less intense. I Will probably just go ahead and calculate distance maps for the whole US, a process that might take a couple days at 500meter resolution. That will allow me to sample a wider geographic variety of sites than I currently have which tends to be concentrated in the middle of the country and tends to over sample certain areas with many small isolated towns.

Now for some charts.  First we see a sample chart from a given “brick” of data:

 

And next we see the Day LST and Night LST

 

 

 

And for grins we look at the following:  The ralationship between Temperature and Elevation for day and night,  and below that we have box plots of the temperature versus the land cover types in the RURAL pixels. That is one other difference between Zhang/Imhoff and my work. Where they looked at the “biome” which is really a historical context I will look at the current land cover. The reason is simple: MODIS LST is dependendent opon MODIS landcover in its determination of emissivity and finally temperature.

 

Conclusion

You should note that I have not looked at the Urban rural difference yet. At this stage in the pilot study I don’t want to be looking at any of those results. I just want to get the processing chain bug free and the main routines ironed out. There will be a little more code to pull out the “area” of the small towns but that is pretty trivial. Next step now I think will be finding ways to speed up the calculation of the distance maps.

 

 

 

Categories: Uncategorized

Sample Input Data.

October 8, 2012 1 comment

Just a couple quick examples.

Starting with   30 meter impervious surface

Followed by MODIS Land cover  ( “red” is urban )

And finally Day  LST

 

Google earth

Categories: Uncategorized

More Fun With Modis

October 8, 2012 Leave a comment

I’ve started a tutorial on using the MODIS package in R the first few steps are here.  While I wait on a release of the package I thought I would play around a bit with the MRT tool and see how it worked.  Let’s recall where we are going.  I have an inventory of around 150 small cities and we are going to take a look at SUHI in those locations.  I have the DEM data, The Modis Land Class data, Olson Biomes, NLCD 30 meter land class and impervious surface. The last few bits I will need are MODIS LST and perhaps MODIS albedo ( If I get motivated) and EVI. We also have population data from a couple sources and housing data.

The first step to getting MODIS LST data is deciding what Tiles I need and then selecting dates. I had hoped that I might get away with using just a few tiles for the US, but that doesnt look like its going to happen. To figure out which Tiles I need there is a function in the MODIS package called getTile. But before I figured that out I tried to do it on my own. With some help from the geo sig list  I have the following to figure out the tiles.

First I get the  bounding coordinates for every tile

library(geosphere)

library(sp)

gring <-“http://landweb.nascom.nasa.gov/developers/sn_tiles/sn_gring_10deg.txt&#8221;

 
GRING <- read.table(gring, skip = 7, stringsAsFactors =FALSE, nrows=648,
na.strings=c(“-999.0000″,”-99.0000″))

colnames(GRING)<-c(“v”,”h”,”ll_lon”,”ll_lat”,
“ul_lon”,”ul_lat”,”ur_lon”,”ur_lat”,
“lr_lon”,”lr_lat”)

GRING <- GRING[!is.na(GRING[,3]),]

DF <- GRING[GRING$h < 15 & GRING$h > 6 & GRING$v < 7 & GRING$v > 3,]

The  table is downloaded and then I remove those rows that are NA. Next, I select those parts of the table that relate to CONUS  between  h(horizontal) 15 and 6 and vertical 7 and 3.

Next, I use Robert Hijmans code for generating a spatialpolygondataframe.

m <- as.matrix(DF)
z <- apply(m, 1, function(x) Polygon(matrix(x[c(3:10, 3:4)], ncol=2, byrow=T)))
n <- names(z)
y <- sapply(1:length(n), function(i) Polygons(z[i], n[i]))
sp <- SpatialPolygons(y, proj4string=CRS(‘+proj=longlat’))
spdf <- SpatialPolygonsDataFrame(sp, DF)

then we make some corrections for being on a sphere and we read in the lat/lon of all 150 sites
spdfc <- makePoly(spdf)

Inv <- read.table(“SiteInventory.inv”,stringsAsFactors = FALSE)

The next bit of clumsy code creates a spatial points data frame  that has 4 points for every site. I’m going to look +- .5degrees from the site center.

myExtent <- data.frame(lon=rep(NA,4*nrow(Inv)),lat=rep(NA,4*nrow(Inv)))

for( site in 1:nrow(Inv)){
dex <- ((site-1)*4)+1
myExtent$lon[dex]<- Inv$Lon[site] +.5
myExtent$lat[dex]<- Inv$Lat[site] -.5
myExtent$lon[dex+1]<- Inv$Lon[site] +.5
myExtent$lat[dex+1]<- Inv$Lat[site] +.5
myExtent$lon[dex+2]<- Inv$Lon[site] -.5
myExtent$lat[dex+2]<- Inv$Lat[site] -.5
myExtent$lon[dex+3]<- Inv$Lon[site] -.5
myExtent$lat[dex+3]<- Inv$Lat[site] +.5

}

Finally we will intersect the points with the SpatialPolygonsDataframe  and get out the tiles

pts <- SpatialPoints(myExtent)

proj4string(pts)<- proj4string(spdf)

Tiles <- over(pts,spdfc)[1:2]

Tiles <- unique(Tiles)

nrow( Tiles)

print(Tiles)

Depending on the increment I used  (.5 versus .2)  I basically had to collect 10-12 tiles to cover my bases for the next step.

With that list of tiles I downloaded the tiles I needed for  July4th  2006. In the future with the MODIS package this will happen just by writing code, but for now I did it by hand. With those 12 tiles downloaded I then fired up MRT to mosaic , resample, and reproject the tiles into a monolithic GeoTiff  for daytime LST for the whole Conus.

Reading that it we get the following

And then we can zoom in on the midwest and then go down chicago

On the left at  lon -87.9 lat 41.65  is the Paw Paw Woods Nature Preserve.  Pretty Cool

Categories: Uncategorized

City Size and SUHI

October 2, 2012 15 comments

In the course of putting together data for my kriging project with the CRN stations, I got another idea related to a small but potentially important corner of the concerns over UHI in the global temperature index. For clarity I suppose I should make it clear that my position is that the UHI bias is probably low, less than .1C decade. That’s basically what Zeke and I found in our AGU poster work ( with help from Nick Stokes, Matt Menne and Claude Williams ). Nevertheless some people persist in claiming that the bias is bigger and further than the bias can even be found in small cities and towns. There isn’t much peer reviewed science to back up this claim, but there are a few “studies”  here and there looking at the problem. Here is short incomplete bibliography.  Torok 2001  http://www.warwickhughes.com/climate/seozuhi.htm; Spenser  http://www.drroyspencer.com/2012/04/regional-u-s-population-adjustments-to-surface-temperatures-since-1973-still-little-warming/;  Warwick Hughes .http://www.warwickhughes.com/blog/?p=575http://solberg.snr.missouri.edu/gcc/heatisland09.pdfhttps://ams.confex.com/ams/Annual2006/techprogram/paper_100595.htm; Not exactly what I would call compelling arguments, but rather some interesting observations. One reason this matters relates to the definitions of “urban” and “rural” that we use for studying the possibility of bias in the global temperature record. The argument, I have found, usually proceeds like this.  The records of UHI for a large city such as Tokyo, Phoenix, or New York are shown and one might see figures for UHI that are 2,3 or even 9C. The reasonable inference from this is that these types of bias may be in the global temperature record.  For a bit of background have a look at this   http://www.nasa.gov/topics/earth/features/heat-island-sprawl.html Dr. Imhoff and his team do some great work and much of it has inspired me to dig a little deeper into the issue. Results like Imhoff’s on large cities create some concern that this UHI bias may be in the record. There are a few things we need to note. The UHI measured in these studies is typically SUHI, that is, the surface urban heat island, while thermometers measure the air temperature 2 meters above this surface. If  SUHI were simply related to the air temperature above the surface, then this would not be a issue, however, air temperature is not simply related to the surface temperature although it can explain a good portion of the variance. The second issue with satellite measures of SUHI is that they are biased to capture the worst case. SUHI and UHI are modulated by the synoptic weather conditions. So, for example, on cloudy days or rainy days the heat islands that develop tend to be smaller than those on sunny days. Sensing the Land surface temperature requires a cloud free day. Hence the satellite measures will be biased high, or rather they will tend to capture the worst SUHI days. This approach is fine for understanding the human health concerns related to UHI and heat waves–another project I am working on,  but one should not be surprised to find that even large cities don’t see  9C heat islands every day of the year. SUHI and UHI vary by weather and by season and by location. The other thing to note about Imhoff’s study is that it tended to focus on large cities.  A comprehensive look at large cities was conducted by Peng who looked at 419 cities around the world http://cybele.bu.edu/download/manuscripts/peng-uhi-est-2012.pdf.  Some interesting findings that tend to confirm some of the things I’ve thought. First, if you look at annual figures for SUHI ( or UHI ) you will come up with numbers typically less than 3C and second there appears to be a threshold to the bias at the high end. Peng argues, for example, that for these large cities that SUHI is insensitive to population density. Since these are all cities over 1 million it might be more precise to say that there is a threshold effect above certain densities.  That’s consistent with what Oke already understood in 1973 when he fit maximum UHI with a log curve. Imhoff, appears to have found something similar in his studies focusing on impervious area as opposed to population density. In his study of 38 large US cities he found a similar log type relationship with the impervious area of a city. That single variable explains 71% of the variance in SUHI. and There are some interesting constrasts between Peng and Imhoff’s work that will take this post in a different direction that center around finding different driver for daytime SUHI and nighttime SUHI, but I’m going to focus on the rather small number of data points for cities less than 10 sq km in size.  The goal for this project is to   focus  on small city SUHI using Imhoff’s approach and then perhaps Peng’s approach. My goal at the outset was to get something on the order of 100 to 500 small cities less than 10 sqkm in size. Along the way we will also see the limitations of some datasets for these types of studies. In due course we will see the limitations of using Grump population data ( Like Spenser uses ) for small  areas and low population areas and we will see some difficulties in using Modis Urban. The Modis limitation is something I’ve struggled with before specifically in our AGU poster and then later in the Berkeley earth UHI paper. There were two basic issues. First, the Modis calibration was done on cities of 100K people or more. Given the size of the Modis pixel it is likely ( as one can see plowing through thousands of small cities) that Modis will miss small urban/town areas and identify areas as rural when in fact they are “small towns” or villages. The other issue with Modis that no one has pointed out is that I think some northern latitudes have snow cover where there are cities, so that Alaska, for example, is mis represented as more rural than it actually is. Checking the dates for when the Modis images were taken provides some confirmation of this. It appears this wasn’t caught in the calibration testing since all the cities used for calibration are at latitudes below 50 ( As I recall, I need to recheck all of this!) There are also some software goals in this project. There is a new Modis package in R that I am eager to use and some landscape statistic packages that I’ve always wanted to play with, so it’s a small bit of new science  to learn and a lot of software fun. Let’s get started! Here is a brief synopsis of Imhoff’s approach ( Peng is a bit different but let’s save that till later ). A)  identify the urban areas you want to study. B) identify the biome C) identify the rural area for comparison D) eliminate confounding variables E) process the Modis LST data for various time periods and seasons. F) regressions Along the way, we will learn something about the limitations of various datasets. A) Finding small urban areas. To find small urban/ small town areas  I employed US census data. After banging my head against the wall with census block shapefiles, the collections by state are huge, I took a different path. I downloaded the gazetter files: http://www.census.gov/geo/www/gazetteer/gazetteer2010.html.  Here I took the “Places” file  which is basically a flat file of  places in the US: cities, towns and CDPs or census designated Places. For now I’m working with 2010 data which is read in nicely by R. Later I’ll update with 2000 data which is not so easily read in. placesFile <- “G:\\USCensus\\Gaz_places_national.txt”  Places <- read.delim(placesFile, stringsAsFactors = FALSE) The file contains all the information I need to conduct a quick search for areas that are small and have people. I also know that when I return to the census shapefiles for these areas I’ll have an easier job of finding what I want. Next I impose some limits on the areas I will be looking at. colnames(Places)[13]<-“Lat” colnames(Places)[14]<-“Lon” Places <- Places[Places$Lat < 50 & Places$Lat > 22 & Places$Lon > -130,] Places <- Places[Places$ALAND <= 10000000,] Places <- Places[Places$POP10 < 2500,] Places <- Places[Places$AWATER < 25000,] After changing the names of Lat and lon columns I limit my area of interest to CONUS. I do that for several reasons. I plan on working with NLCD landcover 2006 and Alaska and Hawaii are missing from that. Second, I have some concerns about Modis Landcover (urban) in Alaska. I may return to this selection criteria. Next I use the ALAND metric provided by the US census to select only those areas with less than 10Million sq meters of area or 10sq km. This area is the area of the polygon that defines that area for the census bureau. Again, I’m only using this to narrow a search I will be doing in the satellite data. Next, I limit the population size I am looking at. The Census data will contain many small physical areas that are located within huge urban sprawls and what we are looking for is small cities that are surrounded by rural or un populated areas. Lastly, I limit the amount of water than can be found within the bounding polygon. With this criteria  we get a huge number of sites to look at on the order of  12,000. Plotting these and looking at them on google earth is my typical approach just to get an idea of potential problems any automated selection process will face. The site statistics are as follows: The average population is 600, average number of houses is 275 amount of land in the average area is 2.5m sq meters and the 75% of the sites have less than 100 sq meters of water. After reviewing plots of some of these areas it was clear that I was still not finding what I was looking for so to winnow this collection more I turned to another dataset  of Impervious surface, the global 1km dataset. Moving from a 30 meter dataset to a kilometer dataset makes the search faster.For the final look at things we will revert back to the 30 meter data, but first we need to find a good selection of small cities that are surrounded by rural environment. I used two versions of the data the standard dataset which contains Isa values from 0 to 100 ( with -1 for water)  and a reformated version of that file where 0 represents no Isa and 1-100 is recoded as 1. This allows me so simply sum to find the number of 1km cells that have any Isa. Places <- cbind(Places, Isa= extract(Isa,cbind(Places$Lon,Places$Lat))) Places <- Places[Places$Isa >= 0,] Places <- cbind(Places, Isa10sqkm = extract(IsaB, cbind(Places$Lon,Places$Lat), buffer = 3000, fun = sum, na.rm= TRUE)) Places <- Places[Places$Isa10sqkm > 0,] Places <- Places[Places$Isa10sqkm <= 36,] The processing above removes and location where water is at the center of the site location and removes those sites that are embedded in areas of other cities. At this stage we are left with 7445 sites which I then review by looking at plots  while counting the total number of 1km Isa cells in a 1/2 degree zone about the site location. for( i in 1:nrow(Places)){ e <- getExtent(IsaB,.5,c(Places$Lon[i],Places$Lat[i])) patch<- crop(IsaB,e) icount <- count(patch,1) Places$IsaCount[i]<-icount fname <- paste(i,”Patch.png”,sep=”_”) fname <- file.path(“Patches”,fname,fsep=.Platform$file.sep) png(fname, 785,785) plot(patch, main= icount ) dev.off() print(i/nrow(Places)) } That code creates 7445 plots of all the potential locations and counts the number of cells in a 1/2 degree zone. I then review those plots. Here is small sample The review of the 7000 plots suggested that if I wanted to find examples where there was a small amount of Isa  in the center of the plot and “non isa” surrounding it that a limit of 1500 total Isa cells in the  1/2 degree zone would go a considerable way toward that goal.  In addition, at this stage I winnowed the sites by Imhoff’s biome criteria. Since the biome that the urban site is embedded in drives the differential in temperature we want to make sure that a city is not in an area where the biome changes. The biome within  1/2 a degree of the site is check to ensure that a city is not in an area where biomes change.  These two cuts through the data reduce the number of sites to examine to  roughly 2400.  At this stage the underlying statistics of the area have not changed with the average population still be around 600 and the average sq km being 2.5. In addition, we now have tagged every site with its biome. The next step was plot all 2400 sites using 30 meter NLCD impervious surface data. The following gives you an idea of the detail available. With  these 2400 sites I figure I have a good chance of finding  a fairly large sample of small cities that I can perform the “Imhoff”  style analysis on. I will say that looking through thousands of maps for small urban areas gives me a good idea of some of the difficulties ahead. It always helps to get really down and dirty with the data. Reviewing the Imhoff approach one should note that he defines Urban areas by NLCD Isa with various zones such as an urban core at 75-100% Isa and then surounding zones at various levels. His cut off is 25% Isa.  One should also note that he checks this against Modis Urban.  That check is the check I turn to next. Why Modis Urban?  The answer lies in the bowels of MODIS LST.  Land Surface temperature data is estimated using a variety of MODIS  products. On critical unknown is emmissivity. That is estimated from Modis Land cover. Simply put, if Modis Land cover doesnt think a pixel is urban it will not apply the correct emissivity and the temperature will not be estimated correctly.  My next step then was to use Modis Land cover to winnow the 2400 sites down to those where Modis thought the small urban area was  “Modis Urban”.  In this next step I add a few more constraints, basically,  steps to ensure that the small urban center is surrounded only by non urban modis pixels. That step reduces the dataset down to 149 stations, mapped below A few things to note.  There are some sites that are located near water which should make an interesting bit of analysis. As I recall recall Imhoff has special handling for water pixels.  There  are  a few more restrictions that Imhoff employs for “rural” pixels  that will probably reduce this set even more and then there are pixels that need to be removed because of altitude differences, but with 150 sites to work with I think I have a manageable number. At this stage I wanted to also check out the  land cover surrounding a station. Imhoff used Biomes  which really dont give a current day view of the land cover rather they give a historical view of things. I may tweak that part of the analysis.  Here are some sample of land cover around the sites. Sites are coded in red..I’m still playing around with ‘rasterVis’  to get legends about land cover to work out

The site stats  now look like this:  Population ranges from  141 to 2417  with a mean of 999 and a median of 879.   The size of the community according to cadastral boundaries is from rounghly .5 sqkm to 10 sq km  with a mean of around 3km.  and the population density according to census figures runs from 36 people per sq km to 1200.

Categories: Uncategorized
Follow

Get every new post delivered to your Inbox.

Join 38 other followers