Handling .Z files
A while back Steve McIntyre was looking for a way to handle .Z files in R.
Ron Broberg over at the Whiteboard had an approach that Steve adopted both for untarring and for uncompressing .Z files. While the approach is slick, it’s somewhat of a hack. Nothing wrong with that, but I wanted something a bit more elegant.
Long ago a reader, Nicholas, created an R package called “uncompress” to handle the .Z file issue, but Steve was not able to get it to work and neither was I. Luckily Nicholas made his contact info available, and I was able to get him a bug report with a file (ghcnv2.Z) and the code I used to download the file and unzip it. The error was relatively minor and related to end-of-file padding. Nicholas fixed the “bug” and today I had success downloading and unzipping .Z files. So now in Moshtemp, when you download the ghcnv2.Z file, I will automagically unzip it for you.
Next I decided to look at the untar problem. Steve Mc had “untarred” files by copying a version of untar down to his system and then feeding that exe a command from inside R. That’s unnecessary, as R has an “untar” command. So, below, we can see how to download “tar” files from NOAA, untar them, and then uncompress them. Any questions on “uncompress”, just write. It’s on CRAN.
library(uncompress) # provides uncompress(); must be installed first
ftp <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma <- "IMMA"
start <- 1914
end <- 1917 # test with small subset
years <- start:end
Tar_Dir <- "IcoadsTar"
Zfile_Dir <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"
dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)
fnames <- paste(Imma,".",years,".","tar",sep="")
# fnames is ALSO fetchable with RCurl.. when I learn it
# download each tar file into tDir and untar its contents into zDir
getIcoadsTar <- function(site=ftp,files=fnames,tDir=Tar_Dir,zDir=Zfile_Dir){
  for(i in seq_along(files)){
    fullname <- file.path(site,files[i])
    destinationfile <- file.path(tDir,files[i])
    # mode="wb" keeps the binary tar intact on Windows
    download.file(fullname,destfile=destinationfile,mode="wb")
    untar(destinationfile,exdir=file.path(getwd(),zDir))
  }
}
# uncompress every .Z file in zDir into a .dat file in dataDir
unZipIcoads <- function(zDir=Zfile_Dir,dataDir=Icoads_Dir){
  # escape the dot and anchor at the end so only names ending in .Z match
  files <- list.files(path=file.path(getwd(),zDir),full.names=TRUE,pattern="\\.Z$")
  destnames <- sub("\\.Z$",".dat",basename(files))
  for(i in seq_along(files)){
    handle <- file(files[i], "rb")
    # read the whole file, however large it is
    data <- readBin(handle, "raw", n=file.size(files[i]))
    close(handle)
    uncomp_data <- uncompress(data)
    handle <- file(file.path(dataDir,destnames[i]), "wb")
    writeBin(uncomp_data, handle)
    close(handle)
  }
}
The first function, getIcoadsTar(), will download and untar the files. When that completes, call unZipIcoads() to unzip them all.
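To see what untar() does without touching the NOAA server, here is a minimal offline sketch. It builds a tiny tar archive in a temporary directory with tar() and extracts it again. The file and directory names are made up for the demo, and tar = "internal" is passed only so the example does not depend on a system tar being installed.

```r
# Offline demo of untar(): create a small archive, list it, extract it.
work <- tempfile("untar_demo")
dir.create(work)
src <- file.path(work, "src")
dir.create(src)
writeLines("hello", file.path(src, "a.txt"))

tarfile <- file.path(work, "demo.tar")
old <- setwd(src)                      # archive paths relative to src
tar(tarfile, files = "a.txt", tar = "internal")
setwd(old)

# list = TRUE returns the archive contents without extracting anything
contents <- untar(tarfile, list = TRUE, tar = "internal")
print(contents)

# exdir chooses where the files land, as in the functions above
out <- file.path(work, "out")
untar(tarfile, exdir = out, tar = "internal")
extracted <- list.files(out, recursive = TRUE)
print(extracted)
```

The same exdir argument is what sends the ICOADS .Z files into their own directory.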
Have a nice weekend.
UPDATE: a cleaner version that cleans up .Z files as you progress:
library(uncompress) # provides uncompress(); must be installed first
ftp <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma <- "IMMA"
start <- 1914
end <- 1915 # test with small subset
years <- start:end
Tar_Dir <- "IcoadsTar"
Zfile_Dir <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"
dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)
fnames <- paste(Imma,".",years,".","tar",sep="")
# download, untar and uncompress one tar file at a time
download.unTarUnzipIcoads <- function(site=ftp,tars=fnames,tDir=Tar_Dir,zDir=Zfile_Dir,dDir=Icoads_Dir){
  for(i in seq_along(tars)){
    fullname <- file.path(site,tars[i])
    destinationfile <- file.path(tDir,tars[i])
    # mode="wb" keeps the binary tar intact on Windows
    download.file(fullname,destfile=destinationfile,mode="wb")
    untar(destinationfile,exdir=file.path(getwd(),zDir))
    # zDir holds only the current tar's .Z files, since earlier ones were deleted
    files <- list.files(path=file.path(getwd(),zDir),full.names=TRUE,pattern="\\.Z$")
    destnames <- sub("\\.Z$",".dat",basename(files))
    for(j in seq_along(files)){
      handle <- file(files[j], "rb")
      # read the whole file, however large it is
      data <- readBin(handle, "raw", n=file.size(files[j]))
      close(handle)
      uncomp_data <- uncompress(data)
      handle <- file(file.path(dDir,destnames[j]), "wb")
      writeBin(uncomp_data, handle)
      close(handle)
    }
    # clean up the .Z files before fetching the next tar
    unlink(files)
  }
}
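A note on matching file names by extension, since the functions above depend on it. In a regular expression an unescaped dot matches any character, so a loose pattern can pick up files that merely contain a Z. The short sketch below compares a loose and a strict pattern on some made-up file names (the names are illustrative, not real ICOADS files).

```r
# Made-up names: one real .Z file, one decoy with a Z in the middle,
# and one .Z file whose stem also contains a Z.
candidates <- c("IMMA.1914.01.Z", "FUZZY.txt", "AZORES.Z")

# Loose: "any character followed by Z", anywhere in the name
loose_hits  <- grepl(".Z", candidates)

# Strict: a literal dot, then Z, at the very end of the name
strict_hits <- grepl("\\.Z$", candidates)

print(data.frame(candidates, loose_hits, strict_hits))

# The same applies when renaming: gsub() with a loose pattern rewrites
# every match, sub() with the anchored pattern only touches the extension.
loose_name  <- gsub(".Z", ".dat", "AZORES.Z")
strict_name <- sub("\\.Z$", ".dat", "AZORES.Z")
print(c(loose = loose_name, strict = strict_name))
```

With file names like IMMA.1914.01.Z either pattern happens to work, but the anchored form stays safe if other files end up in the same directory.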
And if you just want a standalone version to unzip .Z files:
# requires the uncompress package: library(uncompress)
unZipdotZ <- function(Zfile,destfile,remove=TRUE){
  # this function is called for the side effect of uncompressing a .Z file
  # Zfile is a path to the Zfile
  # destfile is the uncompressed file to be written
  # no protection against overwriting
  # remove=TRUE deletes the Z file afterwards
  # stop() builds the message itself; cat() would print and return NULL
  if(!file.exists(Zfile)) stop(Zfile," does not exist")
  handle <- file(Zfile, "rb")
  # read the whole file, however large it is
  data <- readBin(handle, "raw", n=file.size(Zfile))
  close(handle)
  uncomp_data <- uncompress(data)
  handle <- file(destfile, "wb")
  writeBin(uncomp_data, handle)
  close(handle)
  if(remove) unlink(Zfile)
}
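Since the standalone function has no protection against overwriting, here is one way a guard could be added. This is only a sketch: unZipGuarded and its decompress and overwrite arguments are hypothetical names, not part of the uncompress package. The decompressor is passed in as a function so the demo can run offline with base R's memDecompress() (a gzip-style stream, not the LZW format of real .Z files) as a stand-in. For real .Z files you would pass uncompress instead, assuming the package is installed.

```r
# Hypothetical variant of unZipdotZ with an overwrite guard.
# 'decompress' is any function that maps a raw vector to a raw vector.
unZipGuarded <- function(Zfile, destfile, decompress,
                         overwrite = FALSE, remove = FALSE) {
  if (!file.exists(Zfile)) stop(Zfile, " does not exist")
  if (file.exists(destfile) && !overwrite)
    stop(destfile, " already exists; set overwrite = TRUE to replace it")
  data <- readBin(Zfile, "raw", n = file.size(Zfile))
  writeBin(decompress(data), destfile)
  if (remove) unlink(Zfile)
  invisible(destfile)
}

# Offline demo: compress a line of text in memory, write it to disk,
# then run it through the guarded function with a stand-in decompressor.
src  <- tempfile(fileext = ".bin")
dest <- tempfile(fileext = ".dat")
writeBin(memCompress(charToRaw("station data\n"), type = "gzip"), src)
standIn <- function(x) memDecompress(x, type = "gzip")

unZipGuarded(src, dest, decompress = standIn)
print(readLines(dest))

# A second call refuses to clobber the existing output
second_try <- try(unZipGuarded(src, dest, decompress = standIn), silent = TRUE)
print(inherits(second_try, "try-error"))
```

Defaulting remove to FALSE here is a deliberate difference from unZipdotZ: the source file survives unless you ask for it to be deleted.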
It’s hard to say this without sounding trite, but way to go on getting the gunzip added at the source. That is the open-source-coder way. 😀
Ya, while the method of using a system call was very cool, I didn’t want to add code to check what system somebody had and then download a Windows exe to their system for them, etc. etc. Much cleaner to keep it all in R. Nicholas was great: busy at work, but he stepped up and maintained his code.
At the risk of stating the obvious (but to clarify for fellow newbies working with .Z files on Mac), this system call will utilize the uncompress utility built into the Mac OS without requiring the now defunct ‘uncompress’ package from CRAN:
system2("uncompress", args="my_filename.Z")
ht: Steven Mosher’s comment and commenters on Steve McIntyre’s original post for pointing me in the right direction after a fruitless battle with some C code found in the ‘uncompress’ CRAN archive.
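For anyone taking the system-call route from the comment above, a slightly more defensive sketch follows. It checks that a suitable tool is actually on the PATH before calling it. The function name uncompressZ is made up for illustration, and the fallback to gzip -d rests on the assumption that the local gzip can read .Z files, which is usually but not universally true.

```r
# Hypothetical helper: uncompress a .Z file with a system utility.
# Returns (invisibly) the expected name of the uncompressed file.
uncompressZ <- function(zfile) {
  if (!file.exists(zfile)) stop(zfile, " does not exist")
  tools <- Sys.which(c("uncompress", "gzip"))  # "" when a tool is missing
  if (nzchar(tools[["uncompress"]])) {
    status <- system2(tools[["uncompress"]], args = shQuote(zfile))
  } else if (nzchar(tools[["gzip"]])) {
    # Assumption: this gzip build can decompress .Z (compress) files
    status <- system2(tools[["gzip"]], args = c("-d", shQuote(zfile)))
  } else {
    stop("neither 'uncompress' nor 'gzip' was found on the PATH")
  }
  if (status != 0) stop("uncompressing ", zfile, " failed with status ", status)
  invisible(sub("\\.Z$", "", zfile))
}

# Calling it on a file that is not there stops with an error
# before any system command is run.
missing_result <- try(uncompressZ(tempfile(fileext = ".Z")), silent = TRUE)
print(inherits(missing_result, "try-error"))
```

Both utilities replace the .Z file with the uncompressed file in the same directory, so copy the archive first if you want to keep it.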