How to concatenate multiple `.gz` files within Julia?

I want to combine multiple .gz files into one from within Julia. On the command line, this does it:

cat file1.gz file2.gz file3.gz > allfiles.gz

But the following does not work:

files = ["file1.gz", "file2.gz", "file3.gz"]
run(pipeline(`cat $files`, stdout="allfiles.gz"))

The call doesn’t throw an error, but allfiles.gz is corrupt. Is this a bug?

I figured it out while trying to make an MWE. The following does not work:

using CodecZlib
for i in 1:3
    file = "file$i.gz"
    x = rand(2, 2)
    io = GzipCompressorStream(open(file, "w"))
    write(io, "hello, testing from i = $i")
end
files = ["file1.gz", "file2.gz", "file3.gz"]

# runs without error but file corrupted
run(pipeline(`cat $files`, stdout="allfiles.gz")) 

But this works:

using CodecZlib
for i in 1:3
    file = "file$i.gz"
    x = rand(2, 2)
    io = GzipCompressorStream(open(file, "w"))
    write(io, "hello, testing from i = $i")
    close(io)  # closing flushes buffered data and finishes the gzip stream
end
files = ["file1.gz", "file2.gz", "file3.gz"]
run(pipeline(`cat $files`, stdout="allfiles.gz")) # works

The difference is that in the first case I forgot to close the stream, so the individual .gz files were incomplete before `cat` ever ran. Sorry!
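One way to make a forgotten `close` impossible is the do-block form of `open`, which closes the stream when the block exits. This is a minimal sketch, assuming the `open(StreamType, path, mode) do ... end` method that CodecZlib's stream types provide; the temporary directory and the `roundtrip` name are only there for illustration.

```julia
using CodecZlib  # third-party package, as used in the post above

dir = mktempdir()                    # scratch directory (illustrative)
file = joinpath(dir, "file1.gz")

# The stream is closed when the block exits, even if the body throws,
# so the compressed data is fully written to disk.
open(GzipCompressorStream, file, "w") do io
    write(io, "hello, testing from i = 1")
end

# Read the file back to confirm it is a complete gzip stream.
roundtrip = open(GzipDecompressorStream, file) do io
    read(io, String)
end
```

If `roundtrip` matches the string that was written, the file is intact and safe to pass to `cat`.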


Note that your problem is simply concatenating files and has nothing specific to gz. You are joining compressed files byte for byte, without compressing or decompressing anything, which is fine: the gzip format allows several compressed members back to back in one file.
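To see that byte-level concatenation of gzip data is still valid gzip, here is a small in-memory sketch. It assumes CodecZlib's `transcode` and `GzipDecompressorStream`; the `parts`, `combined` and `text` names are made up for the example. TranscodingStreams documents that decompressor streams read concatenated members transparently, so `text` should come out as the three payloads joined together.

```julia
using CodecZlib  # third-party package

# Compress three small payloads independently, one gzip member each.
parts = [transcode(GzipCompressor, Vector{UInt8}("part $i\n")) for i in 1:3]

# Join the compressed bytes as-is; nothing is recompressed here.
combined = reduce(vcat, parts)

# Decompress the joined bytes as a single stream.
text = read(GzipDecompressorStream(IOBuffer(combined)), String)
```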

So you would simply read and write them, e.g.

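open("allfiles.gz", write = true) do io # UNTESTED, just a sketch
    for file in ["file1.gz", "file2.gz", "file3.gz"]
        # read(file) returns the raw bytes (Vector{UInt8}); the data is binary,
        # so there is no need to go through String
        write(io, read(file))
    end
end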

or use some variant of block-based I/O when the files are too large to read into memory at once.
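A block-based variant could look like the sketch below. It uses only Base; the function name `concat_files` and the 64 KiB default buffer size are arbitrary choices for illustration, not anything prescribed above.

```julia
# Concatenate `in_paths` into `out_path`, copying at most `bufsize` bytes at a time.
function concat_files(out_path, in_paths; bufsize = 1 << 16)
    buf = Vector{UInt8}(undef, bufsize)   # reusable copy buffer
    open(out_path, "w") do out
        for path in in_paths
            open(path, "r") do inp
                while !eof(inp)
                    n = readbytes!(inp, buf)       # number of bytes actually read
                    write(out, view(buf, 1:n))     # write only the filled part
                end
            end
        end
    end
    return out_path
end
```

With the files from the question, the call would be `concat_files("allfiles.gz", ["file1.gz", "file2.gz", "file3.gz"])`. Memory use stays bounded by `bufsize` regardless of file size.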
