Serializing nested Dicts (or DataFrames) so that they can (easily) be loaded in Python as well?

From my simulations, I typically end up with (somewhat convoluted) Dicts and/or DataFrames made from these dicts that contain my input parameters as well as my results. Depending on the complexity of the results, they might live in their own Dict, so a nested Dict.

I can serialize these objects with JLD2 without an issue and reload them into Julia. However, while I like Julia, quickly fiddling around with and visualizing data often feels a bit less cumbersome in Python. I would love the option to just load these serialized results there as well.

While loading jld2 files via h5py works in principle, the result is pretty messy in terms of how the data is structured. I assume this is because HDF5 can only store uniform arrays, and JLD2 needs to work around this.

For the same reason, directly saving Dicts in HDF5 is not possible. Are there any other cross-language formats that might be viable, or are there tools that might help me convert my Dicts into something that makes more sense in HDF5?

Hi @daharn,

as you note yourself, this does not work out of the box.
JLD2 cannot know whether or not to store your nested Dicts in an HDF5-compatible way (losing type information).

Here’s a workaround:

julia> using JLD2

julia> data = Dict("a" => 1,
       "b/c" => 2,
       "b/d" => 3,
       "b/e/f" => 4)
Dict{String, Int64} with 4 entries:
  "b/d"   => 3
  "b/e/f" => 4
  "b/c"   => 2
  "a"     => 1

julia> save("test.jld2", data)

julia> f = jldopen("test.jld2")
JLDFile /home/isensee/test.jld2 (read-only)
 β”œβ”€πŸ”’ a
 β””β”€πŸ“‚ b
    β”œβ”€πŸ”’ d
    β”œβ”€πŸ”’ c
    β””β”€πŸ“‚ e
       β””β”€πŸ”’ f

You should be able to write a simple function to β€œunroll” your nested data in this way.
At the top level, `save` accepts a Dict with String keys, and keys containing `/` are stored as nested HDF5 groups.
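A minimal sketch of such an unroll function (the name `unroll` and the `/`-separator convention are my own, chosen to match the example above):

```julia
# Flatten a nested Dict into a flat Dict with "/"-joined String keys,
# so that JLD2 stores each leaf as its own dataset inside HDF5 groups.
function unroll(d::AbstractDict, prefix::String="")
    out = Dict{String,Any}()
    for (k, v) in d
        key = isempty(prefix) ? string(k) : string(prefix, '/', k)
        if v isa AbstractDict
            merge!(out, unroll(v, key))  # recurse into nested Dicts
        else
            out[key] = v                 # leaf value: store as-is
        end
    end
    return out
end
```

For example, `unroll(Dict("a" => 1, "b" => Dict("c" => 2, "e" => Dict("f" => 4))))` produces a flat Dict with keys `"a"`, `"b/c"`, and `"b/e/f"`, which `save` then lays out as groups like in the file tree above.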

For your purpose it might be easier to use JSON as an intermediate format and take advantage of the table writers/readers available in both languages:
DataFrame + Dict β†’ JSONTables.jl β†’ JSON β†’ Pandas
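A sketch of that route (assumes JSONTables.jl and JSON3.jl are installed; the file names are arbitrary):

```julia
using DataFrames, JSONTables, JSON3

df = DataFrame(x = 1:3, y = ["a", "b", "c"])
params = Dict("seed" => 42, "opts" => Dict("tol" => 1e-6))

# JSONTables.arraytable serializes a table as a JSON array of row objects,
# which pandas can read directly via pd.read_json("results.json").
open("results.json", "w") do io
    write(io, arraytable(df))
end

# Plain (nested) Dicts go through JSON3 and map to Python dicts via json.load.
open("params.json", "w") do io
    JSON3.write(io, params)
end
```

On the Python side, `pandas.read_json` and the standard-library `json` module then recover the table and the nested dict structure, respectively.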

Indeed, going via JSON works absolutely flawlessly for both DataFrames and Dicts! Every binary format I tried (BSON, JLD2, HDF5) left the reloaded data irredeemably convoluted in Python, but JSON preserved the whole structure.

Is there a reason why it is not included in the FileIO.jl framework by default?