Handling .Z files

A while back Steve McIntyre was looking for a way to handle .Z files in R.

Ron Broberg over at the whiteboard had an approach that Steve adopted both for untar and for uncompressing .Z files. While the approach is slick, it’s somewhat of a hack. Nothing wrong with that, but I wanted something a bit more elegant.

Long ago a reader, Nicholas, created an R package called “uncompress” to handle the .Z file issue, but Steve was not able to get it to work and neither was I. Luckily Nicholas made his contact info available, and I was able to send him a bug report with a file (ghcnv2.Z) and the code I used to download and unzip it. The error was relatively minor and related to end-of-file padding. Nicholas fixed the “bug” and today I had success downloading and unzipping .Z files. So now in Moshtemp, when you download the ghcnv2.Z file, I will automagically unzip it for you.

Next I decided to look at the untar problem. Steve Mc had “untarred” files by copying a version of untar down to his system and then feeding that exe a command from inside R. That’s unnecessary, as R has an “untar” command. So, below, we can see how to download “tar” files from NOAA, untar them, and then uncompress them. Any questions on “uncompress”, just write. It’s on CRAN.

library(uncompress)  # Nicholas's package; provides uncompress()

ftp   <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma  <- "IMMA"
start <- 1914
end   <- 1917  # test with small subset
years <- start:end

Tar_Dir    <- "IcoadsTar"
Zfile_Dir  <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"

dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)

fnames <- paste(Imma, ".", years, ".", "tar", sep = "")
# fnames is ALSO fetchable with RCurl.. when I learn it

getIcoadsTar <- function(site = ftp, files = fnames, tDir = Tar_Dir, zDir = Zfile_Dir) {
  for (i in seq_along(files)) {
    fullname <- file.path(site, files[i])
    destinationfile <- file.path(tDir, files[i])
    # mode = "wb" keeps the binary tar file intact on Windows
    download.file(fullname, destfile = destinationfile, mode = "wb")
    # unpack the .Z files held in the tar into zDir
    untar(destinationfile, exdir = file.path(getwd(), zDir))
  }
}

unZipIcoads <- function(zDir = Zfile_Dir, dataDir = Icoads_Dir) {
  # "\\.Z$" matches only names that end in a literal ".Z"
  files <- list.files(path = file.path(getwd(), zDir), full.names = TRUE, pattern = "\\.Z$")
  destnames <- sub("\\.Z$", ".dat", basename(files))
  for (i in seq_along(files)) {
    handle <- file(files[i], "rb")
    data <- readBin(handle, "raw", 99999999)  # upper bound on the bytes to read
    close(handle)
    uncomp_data <- uncompress(data)
    handle <- file(file.path(dataDir, destnames[i]), "wb")
    writeBin(uncomp_data, handle)
    close(handle)
  }
}

The first function will download and untar the files. When that completes, you unzip them all.
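The untar step can be tried without touching the NOAA server. The sketch below is not part of the script above: it builds a throwaway tar archive in a temporary directory (the file and directory names are made up for illustration), then lists and extracts it with R's own untar. It forces the internal implementation, so no external tar program is assumed.

```r
# Build a tiny tar archive locally so the untar step can be tried offline.
# All names here (untar_demo_src, sample.txt, demo.tar) are hypothetical.
src_dir <- file.path(tempdir(), "untar_demo_src")
out_dir <- file.path(tempdir(), "untar_demo_out")
dir.create(src_dir, showWarnings = FALSE)
writeLines("hello", file.path(src_dir, "sample.txt"))

tar_path <- file.path(tempdir(), "demo.tar")
old_wd <- setwd(src_dir)                      # archive paths relative to src_dir
tar(tar_path, files = "sample.txt", tar = "internal")
setwd(old_wd)

# List the archive contents without extracting, then extract into out_dir.
contents <- untar(tar_path, list = TRUE, tar = "internal")
untar(tar_path, exdir = out_dir, tar = "internal")
extracted <- list.files(out_dir)
```

Listing first is a cheap way to see what a tar holds before deciding where to unpack it.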

Have a nice weekend

UPDATE: a cleaner version that deletes the .Z files as it goes:

ftp   <- "ftp://ftp.ncdc.noaa.gov/pub/data/icoads"
Imma  <- "IMMA"
start <- 1914
end   <- 1915  # test with small subset
years <- start:end

Tar_Dir    <- "IcoadsTar"
Zfile_Dir  <- "IcoadsZ"
Icoads_Dir <- "IcoadsData"

dir.create(Tar_Dir)
dir.create(Zfile_Dir)
dir.create(Icoads_Dir)

fnames <- paste(Imma, ".", years, ".", "tar", sep = "")

download.unTarUnzipIcoads <- function(site = ftp, tars = fnames, tDir = Tar_Dir, zDir = Zfile_Dir, dDir = Icoads_Dir) {
  for (i in seq_along(tars)) {
    fullname <- file.path(site, tars[i])
    destinationfile <- file.path(tDir, tars[i])
    # mode = "wb" keeps the binary tar file intact on Windows
    download.file(fullname, destfile = destinationfile, mode = "wb")
    untar(destinationfile, exdir = file.path(getwd(), zDir))
    # "\\.Z$" matches only names that end in a literal ".Z"
    files <- list.files(path = file.path(getwd(), zDir), full.names = TRUE, pattern = "\\.Z$")
    destnames <- sub("\\.Z$", ".dat", basename(files))
    for (j in seq_along(files)) {
      handle <- file(files[j], "rb")
      data <- readBin(handle, "raw", 99999999)  # upper bound on the bytes to read
      close(handle)
      uncomp_data <- uncompress(data)
      handle <- file(file.path(dDir, destnames[j]), "wb")
      writeBin(uncomp_data, handle)
      close(handle)
    }
    unlink(files)  # delete this tar's .Z files before fetching the next tar
  }
}
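A note on how the .Z files get picked out. The pattern argument of list.files and the first argument of gsub are regular expressions, so an unescaped dot stands for any character. The sketch below uses made-up file names (they are not real ICOADS names) to show how a loose pattern and an anchored one behave differently.

```r
# Hypothetical file names, for illustration only.
candidates <- c("IMMA.1914.01.Z", "notes.Zebra.txt", "AZ_index.csv", "IMMA.1914.02.Z")

# "(.Z)" means "any character followed by Z", anywhere in the name.
loose <- grepl("(.Z)", candidates)

# "\\.Z$" means "a literal dot, then Z, at the end of the name".
strict <- grepl("\\.Z$", candidates)

# The same difference shows up when building the output name.
dest_loose  <- gsub(".Z", ".dat", "IMMA.AZORES.01.Z")
dest_strict <- sub("\\.Z$", ".dat", "IMMA.AZORES.01.Z")
```

With a directory that holds nothing but .Z files the two patterns pick the same files, so this only matters once other files share the folder or a name contains a capital Z somewhere in the middle.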

And if you just want a stand-alone version to unzip .Z files:

unZipdotZ <- function(Zfile, destfile, remove = TRUE) {
  # this function is called for the side effect of uncompressing a .Z file
  # Zfile is a path to the .Z file
  # destfile is the uncompressed file to be written
  # no protection against overwriting
  # remove = TRUE deletes the .Z file afterwards
  if (!file.exists(Zfile)) stop(Zfile, " does not exist")
  handle <- file(Zfile, "rb")
  data <- readBin(handle, "raw", 99999999)  # upper bound on the bytes to read
  close(handle)
  uncomp_data <- uncompress(data)
  handle <- file(destfile, "wb")
  writeBin(uncomp_data, handle)
  close(handle)
  if (remove) unlink(Zfile)
}
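The read step above asks readBin for up to 99999999 bytes, which works as long as the file is smaller than that. A possible alternative, sketched here with hypothetical helper names (read_all_bytes and looks_like_dotZ are mine, not part of the “uncompress” package), is to size the read from the file itself and to check the two-byte signature (0x1f 0x9d) that files written by the Unix compress utility start with.

```r
# Read a whole file into a raw vector, sized from the file on disk.
read_all_bytes <- function(path) {
  if (!file.exists(path)) stop(path, " does not exist")
  readBin(path, what = "raw", n = file.size(path))
}

# TRUE when the bytes start with the compress (.Z) signature 0x1f 0x9d.
looks_like_dotZ <- function(bytes) {
  length(bytes) >= 2 && identical(bytes[1:2], as.raw(c(0x1f, 0x9d)))
}

# Demo on a throwaway file holding only a signature and two filler bytes;
# it is not a valid .Z archive and is never passed to uncompress().
tmp <- tempfile(fileext = ".Z")
writeBin(as.raw(c(0x1f, 0x9d, 0x90, 0x00)), tmp)
bytes <- read_all_bytes(tmp)
is_z <- looks_like_dotZ(bytes)
unlink(tmp)
```

A check like this gives a clearer error than handing a mis-named or truncated download straight to the decompressor.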

  1. September 13, 2010 at 8:49 PM

    It’s hard to say this without sounding trite, but way to go on getting the gunzip added at the source. That is the open-source-coder way.😀

  2. Steven Mosher
    September 13, 2010 at 9:50 PM

    Ya, while the method of using a system call was very cool, I didn’t want to add code to
    check what system somebody had and then download a Windows exe to their system for them,
    etc. etc. Much cleaner to keep it all in R. Nicholas was great; busy at work, but he stepped up and maintained his code.

  3. Will Petry
    February 23, 2015 at 6:05 AM

    At the risk of stating the obvious (but to clarify for fellow newbies working with .Z files on Mac), this system call will utilize the uncompress utility built into the Mac OS without requiring the now defunct ‘uncompress’ package from CRAN:

    system2("uncompress", args="my_filename.Z")

    ht: Steven Mosher’s comment and commenters on Steve McIntyre’s original post for pointing me in the right direction after a fruitless battle with some C code found in the ‘uncompress’ CRAN archive.
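
The system call from the comment above can be wrapped so that it fails gracefully on machines that have no uncompress utility. This is only a sketch: the function name uncompress_with_system is made up, and it assumes the utility, where present, takes the .Z path as its only argument, as in the comment.

```r
# Hypothetical wrapper around the system call suggested in the comment.
# Returns TRUE on success, FALSE if no 'uncompress' utility is on the PATH.
uncompress_with_system <- function(zfile) {
  if (!file.exists(zfile)) stop(zfile, " does not exist")
  exe <- unname(Sys.which("uncompress"))   # "" when the utility is not found
  if (!nzchar(exe)) {
    warning("no 'uncompress' utility found on the PATH")
    return(FALSE)
  }
  status <- system2(exe, args = shQuote(zfile))
  status == 0
}
```

A call would look like uncompress_with_system("my_filename.Z"). Where the utility exists it typically replaces the .Z file with the uncompressed one, so keep a copy if the original is needed.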
