Gzipped (.csv.gz) writing?

What is or will be the recommended way to write a data frame to a .csv.gz file?

(I tried GZip.open with writetable but this failed.)

regards,

/iaw

Have you tried Libz or CodecZlib? Both are much more efficient than GZip. IIRC, they also use the FileIO and DataStreams interfaces.


No, I have not, and I would be happy to try it. Alas, it is not clear how.

Libz tells me that it is deprecated in favor of CodecZlib. CodecZlib has practically no documentation, examples, etc. I can't even figure out whether CodecZlib has its own compression format (and/or whether there are Unix utilities for it too; gzip has gzcat, gzgrep, and more).


I’ll try to post examples. I’m using both quite successfully (even though Libz says it’s deprecated). Not near my laptop right now.


Since @js135005 did not post any examples, I would like to add some for future visitors of this thread. The catch with CodecZlib is that it is tightly coupled to TranscodingStreams, whose documentation contains most of the examples; see https://bicycle1885.github.io/TranscodingStreams.jl/stable/examples.html.
To write a DataFrame to a gzipped CSV file, I use the following lines:

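```julia
using CSV, DataFrames, CodecZlib

df = DataFrame(...)
# closing the stream at the end of the do-block flushes the gzip trailer
open(GzipCompressorStream, "table.csv.gz", "w") do stream
    CSV.write(stream, df)
end
```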

Reading is very similar.

```julia
df = open(GzipDecompressorStream, "table.csv.gz", "r") do stream
    CSV.read(stream, DataFrame)  # recent CSV.jl versions require the sink type
end
```
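A self-contained round trip, as a minimal sketch assuming current releases of CSV.jl, DataFrames.jl and CodecZlib.jl (the columns and the file name example.csv.gz are made up for illustration). It also looks at the first two bytes of the written file: as far as I know, CodecZlib has no format of its own, and GzipCompressorStream writes ordinary gzip data (which always starts with the magic bytes 0x1f 0x8b), so tools such as gzcat/zcat should be able to read the result.

```julia
using CSV, DataFrames, CodecZlib

# Hypothetical example data and a throwaway path
df = DataFrame(id = 1:3, name = ["a", "b", "c"])
path = joinpath(mktempdir(), "example.csv.gz")

# Write: the do-block closes the stream, which writes the gzip trailer
open(GzipCompressorStream, path, "w") do stream
    CSV.write(stream, df)
end

# Standard gzip files begin with the two magic bytes 0x1f 0x8b
magic = open(io -> read(io, 2), path)
println("gzip header: ", magic == UInt8[0x1f, 0x8b])

# Read it back into a DataFrame and compare with the original
df2 = open(GzipDecompressorStream, path, "r") do stream
    CSV.read(stream, DataFrame)
end
println("round trip equal: ", df2 == df)
```

Newer CSV.jl may parse short strings into InlineStrings types rather than String, but the values still compare equal with ==.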

@janfrancu: Is it possible to set the compression level, and if so, how? I do see some mention of level in CodecZlib (compression.jl), but I am not able to apply it.

It would be nice to be able to pass the keyword argument to GzipCompressorStream directly, like this:

```julia
# wished-for syntax, not supported as written
open(GzipCompressorStream(; level = -1), "table.csv.gz", "w")
```

However, AFAIK that is not possible at the moment. That said, I believe you can still construct and open the stream yourself, without the do-block closure, and pass the keyword arguments to the constructor.
I am still figuring out this pipeline myself, so I cannot really give you a how-to guide. I stumbled across this topic, worked out the issue, and just wanted to share what I know.
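A minimal sketch of building the stream by hand, assuming that GzipCompressorStream(io; level = ...) forwards level to the underlying GzipCompressor (zlib levels 0 to 9, with -1 meaning zlib's default). The helper write_csv_gz and the file names are hypothetical. The stream is closed in a finally block because closing it is what finishes the gzip data and closes the file.

```julia
using CSV, DataFrames, CodecZlib

# Hypothetical, fairly repetitive example data
df = DataFrame(x = repeat(1:10, 1000), y = repeat(["foo", "bar"], 5000))

# Hypothetical helper: write `df` as gzipped CSV with an explicit compression level
function write_csv_gz(path::AbstractString, df; level::Integer = -1)
    stream = GzipCompressorStream(open(path, "w"); level = level)
    try
        CSV.write(stream, df)
    finally
        close(stream)  # flushes remaining data and closes the underlying file
    end
    return path
end

dir = mktempdir()
fast  = write_csv_gz(joinpath(dir, "fast.csv.gz"), df; level = 1)
small = write_csv_gz(joinpath(dir, "small.csv.gz"), df; level = 9)
println(filesize(fast), " bytes at level 1 vs ", filesize(small), " bytes at level 9")

# Reading does not depend on the level that was used for writing
df2 = open(GzipDecompressorStream, small, "r") do stream
    CSV.read(stream, DataFrame)
end
```

Higher levels usually trade CPU time for smaller files; the actual difference depends on the data.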