Recommended Saves and Loads of DataFrame : JLD, CSV, etc

ivo_welch · February 15, 2018, 3:03am

I am using Julia 0.6.2, but am planning to move to 1.0 when available.

I read in an earlier discussion (/suggested-formats-for-saving-and-serialization/3230) that JLD has not been working. I have tried to save() some dataframes to a jld file and then load() them, but this gave back junk.

I am also not clear about how to read and write dataframes in CSV format. readtable and writetable seem to work but warn that they are deprecated, and CSV was not working for me yet.

Are these cases where [a] the problem is mine, because I fail to understand how this should be done, or [b] it is better to wait a little for 1.0 before experimenting further?

Advice appreciated.

/iaw

davidanthoff · February 15, 2018, 3:45am

There are many options. One family of packages you can try is the set I created: https://github.com/davidanthoff/CSVFiles.jl, https://github.com/davidanthoff/FeatherFiles.jl, https://github.com/davidanthoff/StatFiles.jl, https://github.com/davidanthoff/ExcelFiles.jl and https://github.com/davidanthoff/ParquetFiles.jl. All of them should work (and are tested nightly by a build bot). Please do let me know if you run into any issues.

xiaodai · February 15, 2018, 6:30am

Didn’t know about ParquetFiles.jl! I thought it wasn’t possible to save Parquet files from Julia.

davidanthoff · February 15, 2018, 6:58am

You still can’t. ParquetFiles is just the FileIO and iterable tables integration; under the hood it uses Parquet.jl, so there is no saving at this point.

pasha · February 15, 2018, 3:45pm

Feather is nice, fast, compact, portable, and type-safe for reading and writing.

I found that before DataFrames 0.11, CSV is in a bit of a Bardo land. At first I was hoping to use CSV.jl, but it couldn’t remove padding, so I went back to DataFrames.readtable(). That function, readtable(), is due to be deprecated with 0.11, so everything will change in a few months and then become amazingly useful soon after.

Your strategy could be similar: do something that works now, wait until DataFrames 0.11 settles down, and then “finalize” your method.

ivo_welch · February 15, 2018, 4:27pm

Is there a clear path? pasha, I think you are telling me “not yet, hold on just a little longer.”

david, thanks for the links. I can probably use them for now. I would love to go official asap, though.

rant: I know it is early, but I am a bit annoyed that DataFrames, or something like it, is not a first-tier package that comes with Julia and is officially supported as the standard for handling, reading, and writing data sets. If it was just me, this would not be an issue, but I need to recommend and explain a software package to our students. In a sense, data sets are more important to my students than enumerators and a lot of other first-tier aspects of Julia.

tbeason · February 15, 2018, 7:40pm

If you are only talking about DataFrames then I think writing them to CSV works quite well. I’ve rarely had problems there. I use CSV.write("filename.csv",df) frequently from CSV.jl on Julia v0.6.2. The text parsers do a pretty good job of correctly typing the columns.

If you are talking about custom types, then it is a bit more complicated. Feather and JLD2 should be working though.
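
For readers who want to see the CSV round trip mentioned above, here is a minimal sketch. It assumes a Julia 1.x setup with recent CSV.jl and DataFrames.jl releases rather than the 0.6.2 environment discussed in this thread; in particular, the `CSV.read(path, DataFrame)` form belongs to later CSV.jl versions. The table contents and the file name are made up for illustration.

```julia
using CSV, DataFrames

# Small illustrative table (made-up data, not from the thread).
df = DataFrame(id = [1, 2, 3],
               name = ["alice", "bob", "carol"],
               score = [1.5, 2.5, 3.5])

# Write into a temporary directory so the example leaves nothing behind.
path = joinpath(mktempdir(), "example.csv")
CSV.write(path, df)

# Read the file back; column types are inferred from the text.
df2 = CSV.read(path, DataFrame)

println(size(df2))
println(eltype.(eachcol(df2)))
```

Because CSV is plain text, column types are re-inferred on load, so a column may come back with a different (but comparable) element type than it was written with, for example a compact string type instead of `String`.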

hdavid16 · August 29, 2020, 4:28pm

CSV works great, but it has issues with large dataframes (see https://github.com/JuliaData/CSV.jl/issues/735#issue-688559890). JLD2, on the other hand, works great for me with large dataframes.
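
A minimal sketch of what saving and loading a dataframe with JLD2 can look like, assuming a Julia 1.x setup with recent JLD2.jl and DataFrames.jl releases. It uses the `save_object` / `load_object` pair, which stores a single object per file; the data and the file name are hypothetical.

```julia
using JLD2, DataFrames

# Small illustrative table (made-up data, not from the thread).
df = DataFrame(id = [1, 2, 3], label = ["x", "y", "z"])

# Store the whole DataFrame as the single object in a .jld2 file.
path = joinpath(mktempdir(), "example.jld2")
save_object(path, df)

# Load it back; the binary format keeps the column types as written.
df2 = load_object(path)

println(df2 == df)
```

Unlike CSV, this stores the Julia object itself rather than text, so column types do not have to be re-inferred on load, but the file is meant to be read from Julia rather than from other tools.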

xiaodai · August 30, 2020, 3:36am

JDF.jl works great if you don’t need interop with Python or R
