Multiple files in a gzip archive


#1

Is it possible to create a single compressed archive that contains a number of text or CSV files? It looks like GZip.jl only compresses a single file at a time.

Thanks!


#2

Unlike .zip, the gzip format has no support for multiple files, so the answer is no.

What are you trying to achieve? If you can save your data in HDF5/JLD, you can create complex hierarchies (resembling a file system).


#3

HDF5/JLD is probably a good suggestion, although that might depend on your use case.

If you really want to work with an archive with multiple files, you can try ZipFile.jl.
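
A minimal sketch of what that could look like, assuming a reasonably recent ZipFile.jl (older releases may need a different read call). The file names and CSV contents are made up for illustration; the resulting .zip can be opened with any standard unzip tool.

```julia
using ZipFile

# Hypothetical log files and contents, for illustration only.
logs = Dict(
    "run1.csv" => "t,value\n0,1.0\n",
    "run2.csv" => "t,value\n0,2.0\n",
)

# Write the archive into a temporary directory.
archive = joinpath(mktempdir(), "logs.zip")

# Add each file as a separate, deflate-compressed entry.
w = ZipFile.Writer(archive)
for (name, contents) in logs
    f = ZipFile.addfile(w, name; method=ZipFile.Deflate)
    write(f, contents)
end
close(w)

# Read the archive back and list its entries.
r = ZipFile.Reader(archive)
for f in r.files
    println(f.name, ": ", sizeof(read(f, String)), " bytes")
end
close(r)
```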

Cheers,
Kevin


#4

I have a number of log files in csv format that I would like to compress into a single archive. I’d like to pick a compressed archive format where the user can easily unzip it and inspect the files. Ideally, the format should be something standard and not depend on Julia.

Looks like ZipFile might be a good way to go?
https://zipfilejl.readthedocs.io/en/latest/

Ritchie


#5

Ah, thanks Kevin!


#6

Typically, multiple files are stored together using the tar format, and then gzipped.
The advice I found (although it was from 2015) was to call the tar command via run(...) to extract all the files into a directory (after decompressing).
https://github.com/fhs/ZipFile.jl does look good, but I don’t know how well maintained it is.
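
For the tar route, here is a rough sketch of shelling out from Julia. It assumes a tar executable with gzip support (the -z flag) is on the PATH; the file names and contents are hypothetical.

```julia
# Create two hypothetical CSV log files in a temporary directory.
dir = mktempdir()
write(joinpath(dir, "run1.csv"), "t,value\n0,1.0\n")
write(joinpath(dir, "run2.csv"), "t,value\n0,2.0\n")

# Bundle and gzip them in one step: -c create, -z gzip, -f archive file.
# -C switches to `dir` first so the archive stores bare file names.
archive = joinpath(dir, "logs.tar.gz")
run(`tar -czf $archive -C $dir run1.csv run2.csv`)

# Extract everything into a fresh directory: -x extract.
outdir = mktempdir()
run(`tar -xzf $archive -C $outdir`)
println(readdir(outdir))
```

Since the result is an ordinary .tar.gz, it does not depend on Julia to unpack.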


#7

Functionally, ZipFile.jl works quite nicely. However, it seems quite slow: about 20x slower than reading uncompressed files directly. I’m looking to build something similar but using Libz.jl as its base (similar to BGZFStreams.jl). In tests I ran using Libz.jl to read gzip files, the slowdown compared to reading uncompressed files was only about 10% or so, which is excellent! I’m not sure how quickly I can get to writing the ZipFile.jl replacement, but I really need it in my work soon.