Recommended ways to save and load DataFrames: JLD, CSV, etc.

I am using Julia 0.6.2, but plan to move to 1.0 when it is available.

I read in an earlier discussion (/suggested-formats-for-saving-and-serialization/3230) that JLD has not been working. I have tried to save() some dataframes to a jld file and then load() them, but this gave back junk.

I was also not clear about how to read and write dataframes in CSV format. readtable and writetable seem to work, but they are flagged as deprecated, and CSV was not working yet.

Are these cases (a) where the problem is mine, because I fail to understand how this should be done, or (b) where it is better to wait a little for 1.0 before experimenting further?

Advice appreciated.

/iaw

There are many options. One family of packages you can try is the one I created: https://github.com/davidanthoff/CSVFiles.jl, https://github.com/davidanthoff/FeatherFiles.jl, https://github.com/davidanthoff/StatFiles.jl, https://github.com/davidanthoff/ExcelFiles.jl and https://github.com/davidanthoff/ParquetFiles.jl. All of them should work (and are tested nightly by a build bot). Please do let me know if you run into any issues.
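A minimal sketch of a round trip through CSVFiles.jl, assuming reasonably recent versions of CSVFiles.jl and DataFrames.jl are installed. The column names and the file name are made up for illustration; the `load`/`save` calls follow the pattern shown in the package README.

```julia
using CSVFiles, DataFrames

# Hypothetical example data.
df = DataFrame(id = [1, 2, 3], name = ["a", "b", "c"])

# Hypothetical file name; the .csv extension is what lets FileIO pick CSVFiles.
path = joinpath(mktempdir(), "example.csv")

# Write the table, then read it back and materialize it as a DataFrame.
save(path, df)
df2 = DataFrame(load(path))
```

The other packages in this family are meant to follow the same `load`/`save` pattern with their own file extensions, except where saving is not supported.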

I didn’t know about ParquetFiles.jl! I thought that saving Parquet files was not possible.

You still can’t. ParquetFiles.jl is just the FileIO and iterable tables integration; under the hood it uses Parquet.jl, so there is no saving at this point.

Feather is nice, fast, compact, portable, and type-safe for reading and writing.

I found that before DataFrames 0.11, CSV handling is in a bit of a bardo. At first I was hoping to use CSV.jl, but it couldn’t remove padding, so I went back to DataFrames.readtable(). That function is due to be deprecated with 0.11, so everything will change in a few months and should become amazingly useful soon after.

Your strategy may be similar - do something now that works, until DataFrames 0.11 settles down and then “finalize” your method.

Is there a clear path? Pasha, I think you are telling me “not yet, hold on just a little longer.”

David, thanks for the links. I can probably use them for now. I would love to go official as soon as possible, though.

Rant: I know it is early, but I am a bit annoyed that DataFrames, or something like it, is not a first-tier package that ships with Julia and is officially supported as the standard for handling, reading, and writing data sets. If it were just me, this would not be an issue, but I need to recommend and explain a software package to our students. In a sense, data sets are more important to my students than enumerators and a lot of other first-tier aspects of Julia.

If you are only talking about DataFrames then I think writing them to CSV works quite well. I’ve rarely had problems there. I use CSV.write("filename.csv",df) frequently from CSV.jl on Julia v0.6.2. The text parsers do a pretty good job of correctly typing the columns.
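A minimal sketch of that round trip, assuming recent CSV.jl and DataFrames.jl releases (older releases, including those for Julia 0.6.2, may need a different read call). The data and the file name are invented for illustration.

```julia
using CSV, DataFrames

# Hypothetical example data with an integer, a string, and a float column.
df = DataFrame(id = [1, 2, 3], name = ["a", "b", "c"], score = [1.5, 2.0, 3.25])

# Hypothetical file name in a temporary directory.
path = joinpath(mktempdir(), "example.csv")

# Write the DataFrame to disk as CSV.
CSV.write(path, df)

# Read it back; recent CSV.jl versions take the sink type as the second argument.
df2 = CSV.read(path, DataFrame)
```

Column types are inferred from the text on the way back in, so it can be worth checking the element types of the loaded columns when they matter.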

If you are talking about custom types, then it is a bit more complicated. Feather and JLD2 should be working though.

CSV works great, but it has issues with large dataframes (see https://github.com/JuliaData/CSV.jl/issues/735#issue-688559890). JLD2, on the other hand, works great for me with large dataframes.
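For reference, a minimal sketch of saving and loading a DataFrame with JLD2, assuming a recent JLD2.jl. The key name "df", the data, and the file name are all arbitrary choices for illustration.

```julia
using JLD2, DataFrames

# Hypothetical example data.
df = DataFrame(id = [1, 2, 3], label = ["x", "y", "z"])

# Hypothetical file name in a temporary directory.
path = joinpath(mktempdir(), "example.jld2")

# Store the DataFrame under the key "df".
jldsave(path; df = df)

# Open the file read-only and fetch the stored object by its key.
df2 = jldopen(path, "r") do file
    file["df"]
end
```

Since the file stores Julia objects, the same package versions should ideally be available when it is loaded again.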

JDF.jl works great if you don’t need interop with Python or R.
