Suggested formats for saving and serialization

ChrisRackauckas · April 15, 2017, 10:10am

I was wondering what the suggested formats for saving and serializing are these days. I am really confused about the landscape of DataTables, DataFrames, databases, etc. I was hoping to write a few functions for the DiffEq solution type to save to some common data formats, but am not sure what I should be targeting. It should be something that would work well with statistics, plotting, and machine learning libraries. Or maybe the approach is just to go generic: I know there are things like DataStreams which are “independent readers”. Is there a reverse of that which I can target, so that the user can choose which data type they get out? Would that even be necessary? I am hoping someone could guide me in the right direction here. Thanks!

For completeness, I opened an issue on DifferentialEquations.jl related to this topic (and it shows how idea-less I am, except I have had requests for something of this nature):

https://github.com/JuliaDiffEq/DifferentialEquations.jl/issues/151

Thanks in advance for any ideas.

Also, is there by any chance a form of serialization for types which hold functions? I know JLD hasn’t worked since v0.5, and am wondering if there’s anything along these lines.

stevengj · April 15, 2017, 11:32am

CSV and HDF5 are probably the most widespread open formats for numerical data.
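As a rough illustration of the CSV route, here is a minimal sketch that uses only the `DelimitedFiles` standard library. The `t` and `u` arrays are hypothetical stand-ins for a solver's time points and states, not the actual DiffEq solution type, and the column names `t,u1,u2` are made up for the example.

```julia
using DelimitedFiles  # standard library, no extra packages needed

# Hypothetical stand-in data: three time points, two state components each.
t = [0.0, 0.5, 1.0]
u = [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]]

# One row per time point: [t u1 u2].
# reduce(hcat, u) is 2x3 (one column per state), so permute it to 3x2.
table = hcat(t, permutedims(reduce(hcat, u)))

path = joinpath(mktempdir(), "solution.csv")
open(path, "w") do io
    println(io, "t,u1,u2")      # header row (assumed column names)
    writedlm(io, table, ',')    # comma-delimited numeric rows
end

# Read the file back; with header=true the result is (data, header).
data, header = readdlm(path, ',', Float64; header=true)
```

A plain-text format like this is easy to load from other tools, at the cost of losing type information and any non-tabular parts of the solution.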

nalimilan · April 15, 2017, 12:29pm

DataStreams are not just for reading, they are also for writing. I would suggest implementing a DataStream Source and letting the user choose what output format they want to use (possibly with a default format if you want).

ChrisRackauckas · April 17, 2017, 1:49pm

Are there any examples of how to set up a DataStream Source anywhere? Are there examples of how to take an arbitrary source and write it to a DataFrame?

nalimilan · April 17, 2017, 4:04pm

You wouldn’t deal with DataFrame at all, you would just implement the Source interface, and the code living in DataFrames would take care of creating the object. For an example, you can have a look at the DataFrames code implementing a Source: https://github.com/JuliaStats/DataFrames.jl/pull/1174 CSV.jl is another possibly useful example.

ChrisRackauckas · April 17, 2017, 4:10pm

I know I don’t need to write that code. But I was wondering what the code looks like for generating a DataFrame from an arbitrary Source.

davidanthoff · April 17, 2017, 7:12pm

I just put together a quick and dirty integration with IterableTables in this PR https://github.com/davidanthoff/IterableTables.jl/pull/22. With that you can easily convert a DESolution into any of the supported table sink types, e.g. things like DataTable(sol) will work to create a DataTable from a DESolution instance. Essentially you get support for all the sinks that are listed in the README, plus of course full Query integration, i.e. one can easily run queries against a DESolution instance. You also get integration with DataStreams from that “for free”: you can use IterableTables.get_datastreams_source(sol) to create a DataStreams.Source from your solution (I’m still trying to figure out an easier way to handle that particular integration from a user point of view).

I’m currently just waiting for the package to be registered: https://github.com/JuliaLang/METADATA.jl/pull/8878. And then we would have to clean up that PR a bit more before I could merge it.
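The general idea of exposing a solution as a table source that any sink can consume can be sketched in plain Julia as an iterator of named rows. This is only an illustration of the concept, not the IterableTables.jl or DataStreams implementation: `ToySolution`, `rows`, and `to_columns` are hypothetical names, and the sketch relies on the built-in `NamedTuple` type (Julia 0.7 and later).

```julia
# Hypothetical stand-in for a solution object; not the DiffEq solution type.
struct ToySolution
    t::Vector{Float64}
    u::Vector{Vector{Float64}}
end

# "Source" side: expose the solution as an iterator of rows,
# where each row is a NamedTuple with one field per column.
rows(sol::ToySolution) =
    ((t = sol.t[i], u1 = sol.u[i][1], u2 = sol.u[i][2]) for i in eachindex(sol.t))

# "Sink" side: consume any iterator of NamedTuple rows and
# collect it into a NamedTuple of column vectors.
function to_columns(itr)
    r = collect(itr)
    names = keys(first(r))
    return NamedTuple{names}(Tuple([row[n] for row in r] for n in names))
end

sol = ToySolution([0.0, 0.5, 1.0], [[1.0, 2.0], [1.5, 2.5], [2.0, 3.0]])
cols = to_columns(rows(sol))
```

Because the sink only depends on the row iterator, the same `rows(sol)` could feed a different consumer (a file writer, a query, a table type) without the solution type knowing about it.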

ChrisRackauckas · April 17, 2017, 7:22pm

That sounds great! The API looks very simple too. That looks like the solution I needed. Thanks! I’ll comment on the PR.

davidanthoff · April 17, 2017, 7:23pm

Don’t say that, it will just discourage me from writing the documentation I really should be writing for this.
