Human-readable externalization for multi-dimensional arrays

  1. If you replace text editor with “some viewer” you should be fine. There are many other formats that are built on top of HDF5 (e.g. CGNS, NetCDF, Matlab’s standard .mat files,…), each bringing their own specific viewers, but each of these formats can be opened, read and debugged in hdfview, Python, Julia, …
  2. OK, this won’t be possible for HDF5, since the feature set is so extensive. This is because HDF5 is not so much a format as an I/O library, with lots of file-system-specific tuning.

I don’t have a particular gripe with HDF5, I am just concerned that if something breaks (which happens very commonly with software), I am not in a position to fix it.

I think HDF5 will be the lingua franca for scientific data for a long time. Even its predecessor, HDF4, is still actively supported after 20 years on the market. The ecosystem and compatibility that come with HDF5 are so widespread that you should actually have good reasons NOT to use it, see Wikipedia. (Yes, I know I may sound like a salesman.)

BTW: If you’re really forced to use plain text, you could try HDF5-JSON, though without the advantage of ubiquitous reader support.
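To make the plain-text option concrete, here is a minimal sketch of a text externalization for a multi-dimensional array, using only Python's standard library. It is not the HDF5-JSON schema: the keys `shape`, `order` and `data`, and the helper names `dump_array` and `load_array`, are made up for illustration. The point is only that shape plus a flat, row-major list is enough to round-trip an array through something you can open in a text editor.

```python
import json
from math import prod


def _flatten(x):
    """Yield the scalar elements of a nested list in row-major order."""
    if isinstance(x, list):
        for item in x:
            yield from _flatten(item)
    else:
        yield x


def _reshape(flat, shape):
    """Rebuild a nested list of the given shape from a flat row-major list."""
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])  # number of elements in each sub-array
    return [_reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


def dump_array(nested, shape):
    """Serialize a nested list to JSON text (hypothetical layout, not HDF5-JSON)."""
    flat = list(_flatten(nested))
    if len(flat) != prod(shape):
        raise ValueError("shape does not match the number of elements")
    doc = {"shape": list(shape), "order": "row-major", "data": flat}
    return json.dumps(doc, indent=2)


def load_array(text):
    """Parse text produced by dump_array back into a nested list."""
    doc = json.loads(text)
    shape, flat = doc["shape"], doc["data"]
    if len(flat) != prod(shape):
        raise ValueError("shape does not match the number of elements")
    return _reshape(flat, shape)
```

For example, `load_array(dump_array(a, (2, 2, 2)))` returns a nested list equal to `a` for a 2×2×2 nested list `a`. A text layout like this gets large and slow for big arrays, which is the usual argument for a binary container.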

Sure, my problem is that these viewers mostly rely on a single C library, which is a relatively opaque piece of software implementing a huge spec maintained by a third party that technically publishes the source, but operates very differently from most contemporary open source projects.

E.g., suppose that two years after writing some data, I cannot read it back (e.g. this issue). Even investigating whether this is an HDF5.jl or a libhdf5 bug is tricky. I could not even figure out where the bug tracker for HDF5 is; they just ask that you send an e-mail. So how would I find out whether someone else has had this issue, whether there is a workaround, etc.?

Have you looked into ASDF (Advanced Scientific Data Format; see the asdf v2.13.1.dev13+gbf954a3 documentation)?
It’s based on YAML and supports having arrays stored as binary. It’s planned to be used as the format for the James Webb Space Telescope, so I think it is likely to gain users over time.
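The general idea of a human-readable header followed by a binary array payload can be sketched with the standard library alone. To be clear, this is a toy and not the ASDF file layout: ASDF uses a YAML header, whereas this sketch uses a one-line JSON header only because `json` ships with Python, and the magic line `#toy-array v0` and the function names are invented here.

```python
import json
import struct

MAGIC = b"#toy-array v0\n"  # hypothetical marker line, not part of any real format


def write_blob(shape, values):
    """Return bytes: magic line, one-line JSON header, then raw little-endian float64s."""
    header = {"shape": list(shape), "dtype": "float64", "byteorder": "little"}
    header_line = json.dumps(header).encode("utf-8") + b"\n"
    payload = struct.pack(f"<{len(values)}d", *values)
    return MAGIC + header_line + payload


def read_blob(blob):
    """Parse bytes produced by write_blob and return (shape, flat list of floats)."""
    if not blob.startswith(MAGIC):
        raise ValueError("not a toy-array blob")
    rest = blob[len(MAGIC):]
    # The header is a single line; everything after the first newline is binary.
    header_line, _, payload = rest.partition(b"\n")
    header = json.loads(header_line)
    count = len(payload) // 8  # 8 bytes per float64
    values = list(struct.unpack(f"<{count}d", payload))
    return header["shape"], values
```

The header stays readable with `head` or a text editor, so the shape and element type can be recovered even without a dedicated library, while the bulk data stays compact.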

There is https://github.com/eschnett/ASDF.jl that uses PyCall to call the only full implementation in Python, but it looks pretty straightforward to implement in other languages.


Access to the Jira bug tracker is described right here:

I think this is becoming a discussion on personal preferences. Many of the big players put sufficient trust in HDF5 to make it their standard format.
Though not a hipster-ish project, the HDF Group has been publishing robust, well-maintained, open-source software for decades, and they will most certainly continue doing so for a long time.