How to write in .parquet (or any compressed extension)

I’m running simulations whose output is around 1 GB to 10 GB each. That is not huge, but we will probably run a lot of them. Right now we save the output as .txt, but I’d like to use a better format, and I thought about Parquet (.parq). I’ve looked at Parquet.jl, but I don’t think I understood how to write files with it. Here is how I save my files today:

using DelimitedFiles

# aux1, aux2 and aux3 are simulation parameters defined elsewhere
header = [
    "aux1 = "*string(aux1)*"; This is aux1 in units of a.u.1",
    "aux2 = "*string(aux2)*"; This is aux2 in units of a.u.2",
    "aux3 = "*string(aux3)*"; This is aux3 in units of a.u.3",
]

filename = "testing_file"
format = ".txt"
delimiter = ','

output1_vec = collect(1:1:100)
output2_vec = rand(Complex{Float64}, 100)
output3_vec = rand(100)*1e21
output4_vec = rand(100)*1e2
data = [output1_vec output2_vec output3_vec output4_vec] # in matrix form
data_label = ["vec1 [units of au1]", "vec2 [units of au2]", "vec3 [units of au3]", "vec4 [units of au4]"] # in vector form


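open(filename*format; write=true) do f
    # one header line per auxiliary parameter
    for i in header
        write(f, i*"\n")
    end
    write(f, "----------------------------------------\n")
    # column labels on a single comma-separated line
    for i in data_label
        write(f, i)
        if i != data_label[end]
            write(f, ",")
        end
    end
    write(f, "\n")
    # the data matrix, one row per line
    writedlm(f, data, delimiter)
end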

That is basically a header followed by four columns of complex numbers. Is there any way to store this in a more compact format?

Can your data be loaded as a DataFrame?

The parquet writer in Parquet.jl isn’t very good. I know because I wrote it.

But CSV.jl is OK, and JDF.jl is also OK for your data size, I think.
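For reference, a minimal sketch of what writing and reading back with CSV.jl and JDF.jl could look like. The column names and the temporary directory are made up for illustration, and the sketch only uses real-valued columns because I have not checked how either package handles complex numbers.

```julia
using DataFrames
using CSV
using JDF

# Hypothetical stand-in for the simulation output (real-valued columns only)
df = DataFrame(vec1 = collect(1:100), vec3 = rand(100) * 1e21, vec4 = rand(100) * 1e2)

dir = mktempdir()  # scratch directory, only for this example

# CSV.jl: plain text, one file
csv_path = joinpath(dir, "testing_file.csv")
CSV.write(csv_path, df)
df_csv = CSV.read(csv_path, DataFrame)

# JDF.jl: compressed, stored on disk as a folder
jdf_path = joinpath(dir, "testing_file.jdf")
JDF.save(jdf_path, df)
df_jdf = DataFrame(JDF.load(jdf_path))
```

Comparing the on-disk size of the two outputs for one real simulation run is probably the quickest way to see which one is compact enough.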

For Parquet.jl, just read the section on writing files. I think it’s something like:

using Parquet

write_parquet("path/to/file.parquet", df)
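Applied to the vectors from the question, a rough sketch could look like the following. It assumes the writer only handles plain column types (integers, floats, strings, booleans), so the complex column is split into two Float64 columns; the column names are invented for the example. Reading back assumes a Parquet.jl version that provides read_parquet.

```julia
using DataFrames
using Parquet

# Same shapes as the vectors in the question
output1_vec = collect(1:100)
output2_vec = rand(Complex{Float64}, 100)
output3_vec = rand(100) * 1e21
output4_vec = rand(100) * 1e2

# Assumption: complex numbers are not a supported column type,
# so the real and imaginary parts are stored separately.
df = DataFrame(
    vec1 = output1_vec,
    vec2_re = real.(output2_vec),
    vec2_im = imag.(output2_vec),
    vec3 = output3_vec,
    vec4 = output4_vec,
)

write_parquet("testing_file.parquet", df)

# Read the file back and rebuild the complex column
df2 = DataFrame(read_parquet("testing_file.parquet"))
vec2 = complex.(df2.vec2_re, df2.vec2_im)
```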

After putting my data into a DataFrame, I could save it as .parquet with no problem, thanks a lot! But why isn’t the writer very good? I’m looking for a compact format; writing speed won’t be a problem for now :)
The only question I still have is how to put a header in my file, like the one shown in my example.

Too lazy to try and understand your example, but maybe look up the rename! function in DataFrames.jl. I suspect that is what you need.
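A small sketch of what rename! does, using the label style from the question on a made-up two-column DataFrame:

```julia
using DataFrames

# Hypothetical DataFrame with placeholder column names
df = DataFrame(x1 = 1:3, x2 = [0.1, 0.2, 0.3])

# Labels in the style of data_label from the question
data_label = ["vec1 [units of au1]", "vec2 [units of au2]"]

# Replace all column names at once, in column order
rename!(df, data_label)

names(df)
```

This only covers the column labels. The aux1, aux2, aux3 lines from the original header are not column names, so they would still have to be stored some other way, for example in a small text file next to the data file.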

It’s not very optimized, so it can be slow, and it doesn’t support every type, e.g. datetime. I think someone is writing a Parquet2.jl.

OK. Parquet.jl is good in that it has compression. JDF.jl also does compression.

Check out this article for comparisons of various formats:

If you need long-term stability, Parquet.jl is OK. JDF.jl is not yet stable for the long term, but I use it since it’s quite fast and stable for my use case.