I want to combine multiple .gz files into one within Julia. The command to do so on the command line is:
```
cat file1.gz file2.gz file3.gz > allfiles.gz
```
But the following does not work:
```julia
files = ["file1.gz", "file2.gz", "file3.gz"]
run(pipeline(`cat $files`, stdout="allfiles.gz"))
```
The command runs without throwing an error, but allfiles.gz is corrupt. Is this a bug?
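For reference, interpolating a vector into a Julia command literal splices each element in as a separate argument (no shell is involved), so the `cat` call above is equivalent to the shell command. A quick check, peeking at the `exec` field of `Cmd` (an internal field, used here only for illustration):

```julia
files = ["file1.gz", "file2.gz", "file3.gz"]
cmd = `cat $files`

# Each vector element became its own argument, just like the shell version
@assert cmd.exec == ["cat", "file1.gz", "file2.gz", "file3.gz"]
```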
I figured it out while trying to make an MWE. The following does not work:
```julia
using CodecZlib

for i in 1:3
    file = "file$i.gz"
    x = rand(2, 2)
    io = GzipCompressorStream(open(file, "w"))
    write(io, "hello, testing from i = $i")
    # BUG: `io` is never closed, so buffered data and the gzip trailer
    # are not flushed to disk before `cat` runs
end

files = ["file1.gz", "file2.gz", "file3.gz"]
# runs without error but the file is corrupted
run(pipeline(`cat $files`, stdout="allfiles.gz"))
```
But this works:
```julia
using CodecZlib

for i in 1:3
    file = "file$i.gz"
    x = rand(2, 2)
    io = GzipCompressorStream(open(file, "w"))
    write(io, "hello, testing from i = $i")
    close(io)  # need to close the stream
end

files = ["file1.gz", "file2.gz", "file3.gz"]
run(pipeline(`cat $files`, stdout="allfiles.gz"))  # works
```
The difference is that in the first case I forgot to close the stream. Sorry!
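Since the root cause was an unclosed stream, one way to make this harder to get wrong is to put `close` in a `finally` block so it runs even if the write throws. This is a minimal sketch, assuming CodecZlib is installed as in the snippets above and that closing a `GzipCompressorStream` also closes the file it wraps (which is what the working version relies on):

```julia
using CodecZlib

for i in 1:3
    io = GzipCompressorStream(open("file$i.gz", "w"))
    try
        write(io, "hello, testing from i = $i")
    finally
        # Finishes the gzip data (writes the trailer) and closes the file,
        # even if `write` throws
        close(io)
    end
end
```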
Note that your problem is simply concatenating files and has nothing specific to do with gzip: you are concatenating compressed files without decompressing or recompressing them, which is fine, because a gzip file may consist of several compressed members stored one after another.
So you can simply read and write them in Julia, e.g.
```julia
open("allfiles.gz", write = true) do io # UNTESTED, just a sketch
    for file in ["file1.gz", "file2.gz", "file3.gz"]
        write(io, read(file))  # copy the raw bytes; no decompression needed
    end
end
```
or use some variant of block-based IO when the files are large.
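One possible block-based variant copies each file through a fixed-size buffer with `readbytes!`, so no file ever has to fit in memory at once. A sketch using only Base; the name `concat_files` and the 64 KiB default buffer size are hypothetical choices, not an existing API:

```julia
# Append the raw bytes of each input file to `outpath`, reading in chunks of
# at most `bufsize` bytes. The bytes are copied unchanged, like `cat`.
function concat_files(outpath::AbstractString, inputs; bufsize::Integer = 64 * 1024)
    buf = Vector{UInt8}(undef, bufsize)
    open(outpath, "w") do dst
        for file in inputs
            open(file, "r") do src
                while !eof(src)
                    n = readbytes!(src, buf)     # read up to `bufsize` bytes
                    write(dst, view(buf, 1:n))   # write only the bytes actually read
                end
            end
        end
    end
    return outpath
end
```

Usage would be `concat_files("allfiles.gz", ["file1.gz", "file2.gz", "file3.gz"])`; the buffer is allocated once and reused for every chunk.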