Hello,
I'm having problems using the HDF5 package to read HDF5 files that contain nested dictionaries saved from Python. The data consists of a key (string) whose value is a dict containing two keys for different arrays.
I saved it using deepdish, and I can read it fine in Python. However, when I open it in Julia with `matches = h5open(file)`, the contents look like this:
```
HDF5.File: (read-only) Photo_SG_matches.h5
├─ CLASS
├─ DEEPDISH_IO_DEEPDISH_IO_UNPACK
├─ DEEPDISH_IO_VERSION
├─ PYTABLES_FORMAT_VERSION
├─ TITLE
├─ VERSION
└─ data
   ├─ CLASS
   ├─ PSEUDOATOM
   ├─ TITLE
   └─ VERSION
```
And if I run `read(matches, "data")`, it returns a 1-element vector of byte vectors:
```
1-element Vector{Vector{UInt8}}:
 [0x80, 0x04, 0x95, 0x43, 0x12, 0x01, 0x00, 0x00, 0x00, 0x00 … 0xd8, 0x23, 0x3f, 0x94, 0x74, 0x94, 0x62, 0x75, 0x75, 0x2e]
```
Am I missing something, or does Julia not support nested dicts?
I'm not an HDF5 expert, but it looks like the Python package uses a custom serialization of Python types, probably because HDF5 does not have a dictionary type.
Unless you switch to a file format that you already know can also be read by a Julia package, you will have to check the Python library's docs and figure out how to make sense of their `data` field.
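For what it's worth, the byte vector in the question starts with 0x80, 0x04, 0x95 and ends with 0x2e, which matches the layout of a Python pickle stream: a protocol-4 header, a FRAME opcode, and the STOP opcode `.`. So my guess (I haven't checked deepdish's source) is that `data` holds a pickled Python object. The sketch below uses only Python's standard library to show that `pickle.dumps` produces the same header and trailer for a nested dict shaped like yours; the sample dict, its key names, and the helper `decode_blob` are hypothetical. `decode_blob` takes the single element of the `Vector{Vector{UInt8}}` as a list of byte values.

```python
import pickle

# Hypothetical nested dict shaped like the one in the question: a string key
# whose value is a dict holding two arrays (plain lists stand in for arrays).
sample = {
    "photo_001": {
        "keypoints": list(range(50)),
        "matches": list(range(50, 100)),
    }
}

# Protocol 4 requested explicitly; payloads this size get wrapped in a FRAME.
blob = pickle.dumps(sample, protocol=4)
print(list(blob[:3]))  # [128, 4, 149] -> PROTO, version 4, FRAME
print(blob[-1:])       # b'.' -> STOP


def decode_blob(raw):
    """Rebuild a Python object from a list of byte values, such as the
    Vector{UInt8} that HDF5.jl returned. Only use on trusted files:
    unpickling can execute arbitrary code."""
    return pickle.loads(bytes(raw))


# Round trip: the byte list decodes back to the original dict.
assert decode_blob(list(blob)) == sample
```

If the real blob decodes this way, it will most likely contain NumPy arrays, so NumPy has to be importable in whatever process does the unpickling.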
Julia supports nested `Dict`s:
```julia
julia> Dict(:a => Dict(:b => Dict(:c => "hello")))
Dict{Symbol, Dict{Symbol, Dict{Symbol, String}}} with 1 entry:
  :a => Dict(:b=>Dict(:c=>"hello"))
```
Ah, sorry, I meant to ask whether the Julia HDF5 library supports nested dicts, not the language itself.
According to the HDF5.jl docs, it does not: Home · HDF5.jl
If you need a package to export Julia data types to the HDF5 format, you could try JLD2.jl: GitHub - JuliaIO/JLD2.jl: HDF5-compatible file format in pure Julia
But again, this package uses a custom serialization because (I think) there is no dict type in the HDF5 specification.
The supported Julia types for export are explained here: HDF5 Compatibility · Julia Data Format
Either way, I don't think there is (yet) a packaged solution that lets you write nested dicts in Python and then import them in Julia, or vice versa.
I guess you will have to write your own helper functions to get this done.
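One possible shape for such a helper, sketched in Python under my own assumptions: HDF5 does have groups, which nest like directories, so a nested dict can be flattened into slash-separated paths with each leaf array stored as an ordinary dataset. HDF5.jl can read plain datasets without any Python-specific decoding. The function name `flatten_to_paths` is hypothetical and not part of deepdish or h5py.

```python
from typing import Any, Dict, Mapping


def flatten_to_paths(tree: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested dict into {"outer/inner": leaf} pairs.

    Hypothetical helper: keys must be strings without "/", because the
    slash is used as the HDF5 path separator.
    """
    flat: Dict[str, Any] = {}
    for key, value in tree.items():
        if not isinstance(key, str) or "/" in key:
            raise ValueError(f"unsupported key: {key!r}")
        path = f"{prefix}/{key}" if prefix else key
        if isinstance(value, Mapping):
            # Recurse into sub-dicts; they become HDF5 groups.
            flat.update(flatten_to_paths(value, path))
        else:
            # Leaves (arrays, lists, scalars) become datasets.
            flat[path] = value
    return flat


sample = {"photo_001": {"keypoints": [1, 2, 3], "matches": [0, 2]}}
for path, leaf in flatten_to_paths(sample).items():
    print(path, leaf)
```

With h5py installed, each pair could then be written with something like `f.create_dataset(path, data=leaf)`, which creates the intermediate groups, and on the Julia side `read(h5open(file), "photo_001/keypoints")` should return the array directly. I haven't tried this against your file, though.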
Also: if it is only about nested dicts that contain strings as values, your best bet might be to just use a .json file.
For Python you could use the `json` module, and for Julia you could use JSON.jl.
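A minimal sketch of the Python side of that JSON route, assuming the values are nested dicts, lists, or array-like objects: `json` cannot serialize NumPy arrays directly, so anything with a `tolist()` method is converted to lists first (checked by duck typing, so the sketch also runs without NumPy). The helper name `to_jsonable` and the file name `matches.json` are hypothetical.

```python
import json
import os
import tempfile
from typing import Any


def to_jsonable(value: Any) -> Any:
    """Recursively convert a nested dict into JSON-friendly types.

    Hypothetical helper: dicts get string keys, tuples become lists, and
    objects with a tolist() method (e.g. NumPy arrays) become nested lists.
    """
    if isinstance(value, dict):
        return {str(k): to_jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(v) for v in value]
    if hasattr(value, "tolist"):
        return value.tolist()
    return value


sample = {"photo_001": {"keypoints": [[1.0, 2.0], [3.0, 4.0]], "matches": (0, 2)}}

path = os.path.join(tempfile.gettempdir(), "matches.json")
with open(path, "w") as f:
    json.dump(to_jsonable(sample), f)

# Reading it back in Python gives plain nested dicts and lists.
with open(path) as f:
    restored = json.load(f)
print(restored["photo_001"]["matches"])  # [0, 2]
```

On the Julia side, `JSON.parsefile(path)` from JSON.jl should then give you the same structure as nested string-keyed dictionaries.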
Thank you for the information. For now, I will use PyCall to read the HDF5 file through the same library I used to save it (deepdish). I tried that, and it seems to work.