Append to zipped CSV file

Hello,

I have a program that outputs numerical data. Currently I’m appending into a CSV file so that the progress is saved even if the program is suddenly terminated. Please find the MWE below.

How could I output to a zipped CSV file instead? I have looked at various packages but could not make it work.

using DelimitedFiles

open("output.csv", write=true, truncate=true) do io
end

x = zeros(100)
for sdx = 1:1000
    # Compute x .... takes minutes.
    x .= randn(100)

    open("output.csv", write=true, append=true) do io
        writedlm(io, vec(x)', ',')
    end
end

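I think that gzip supports appending, since a file made of several gzip streams written one after another is still a valid gzip file:

  1. open a stream that appends to the file,
  2. wrap it in a gzip output stream, e.g. with CodecZlib.jl,
  3. write to that stream and close it.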

Make sure to test this for critical data though.
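One way to run such a test is a small round trip: append a few rows as separate gzip streams, read the whole file back, and compare. This is only a sketch under assumptions: the scratch path and the row data are made up for illustration, and it relies on `GzipDecompressorStream` reading concatenated gzip streams as one continuous stream, which I believe is its default behaviour.

```julia
using CodecZlib
using DelimitedFiles

# Hypothetical scratch file, used only for this check.
path = joinpath(mktempdir(), "roundtrip.csv.gz")

# Three made-up rows standing in for the real results.
rows = [randn(5) for _ in 1:3]

for row in rows
    # Append one complete gzip stream containing a single CSV line.
    gz = GzipCompressorStream(open(path, write=true, append=true))
    writedlm(gz, row', ',')
    close(gz)  # finishes the gzip stream and closes the file
end

# Read everything back; the concatenated streams are decoded in sequence.
data = open(GzipDecompressorStream, path) do gz
    readdlm(gz, ',')
end

expected = permutedims(hcat(rows...))  # 3x5 matrix, one row per write
@assert size(data) == size(expected)
@assert data ≈ expected
```

If the assertions pass, the appended file decodes to the same rows that were written. It is also worth checking the result with an external tool such as `zcat` before relying on it.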


Thank you for pointing me in the right direction!

using CodecZlib
using DelimitedFiles

open("output.csv.gz", write=true, truncate=true) do io
end

x = zeros(100)
for sdx = 1:1000
    # Compute x .... takes minutes.
    x .= randn(100)

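    # Each iteration appends one complete gzip stream holding a single CSV row.
    fout = GzipCompressorStream(open("output.csv.gz", write=true, append=true))
    writedlm(fout, vec(x)', ',')
    close(fout)  # flushes the compressed data and closes the underlying file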
end

thanks!

Another option is to write to a binary format such as JDF, Parquet or Feather, via JDF.jl, Parquet.jl and Feather.jl. These give you extra features, such as loading only selected columns and, for Parquet and JDF, smaller compressed files.