Could someone please tell me how to export a 6 GB DataFrame with 200 million rows and 4 columns? I am using the CSV.write function from the CSV package, but it only exports part of the table. I think this is because the DataFrame is too big. Any ideas on how to deal with it? I guess what I need to do is open a file and write to it, but I do not know how to do that with a DataFrame. I really appreciate any help.
Hi @rafael.guerra,
Thank you for your reply. I do not see a compress keyword argument in the documentation, and compress = true produces an error. Do you have more information?
I use Arrow.jl to store very large DataFrames (500+ million rows). The Arrow record-batch feature is also pretty useful for processing very large data sets.
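Something along these lines (a minimal sketch; the file name, the df variable, and the 10-million-row batch size are just illustrative):

```julia
using Arrow, DataFrames, Tables

# Write the DataFrame to an Arrow file on disk (a single record batch).
Arrow.write("data.arrow", df)

# Alternatively, write several record batches so the file can later be
# processed chunk by chunk; an iterator of sub-tables is wrapped with
# Tables.partitioner:
# Arrow.write("data.arrow", Tables.partitioner(Iterators.partition(df, 10_000_000)))

# Reading back is cheap: Arrow.Table memory-maps the file, and copycols=false
# wraps its columns in a DataFrame without copying them.
df2 = DataFrame(Arrow.Table("data.arrow"); copycols=false)
```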
Tested the code below on Win10 with Julia 1.7 for a DataFrame with 200 million rows and 4 columns of Float64.
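A minimal sketch of the test (the original code is not reproduced here, so variable names and the output path are illustrative):

```julia
using CSV, DataFrames

# Build the test DataFrame: 200 million rows x 4 Float64 columns
# (about 6.4 GB in memory).
n = 200_000_000
df = DataFrame(a=rand(n), b=rand(n), c=rand(n), d=rand(n))

# Standard, uncompressed CSV write.
CSV.write("test.csv", df)
```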
It took 2-3 minutes and 15 GB of disk space in standard CSV writing mode (no compression).
However, compression seems to be very slow for this type of large random input data; the same slowness occurs with 7-Zip, for instance. Compression seems to take more than 20 minutes to complete (I aborted it out of impatience), and it might achieve only ~50% compression. To be confirmed.
NB: for a smaller DataFrame of random data, CSV.jl's gzip compression achieved a bit better than 50% compression.
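For reference, recent CSV.jl versions expose that gzip compression directly through a compress keyword of CSV.write (a sketch; older CSV.jl releases do not have this keyword, which may explain the error you saw):

```julia
using CSV, DataFrames

# Gzip-compressed CSV output; requires a CSV.jl version that supports
# the compress keyword (the file name is illustrative).
CSV.write("test.csv.gz", df; compress=true)
```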
Thank you @rafael.guerra,
I updated Julia to 1.7 and the CSV package. After that, a much larger file was exported, but it still did not have all the rows. Maybe there is a resource limitation, since I am working on a laptop?