Need data-storage package for 0.7 (JLD no longer working)


#1

I have been using the JLD package for quite some time (since 0.4) to save and checkpoint my data that is shared between Linux and Windows. Right now, neither JLD nor JLD2 is working for Julia 0.7 (Windows). I suppose I could open an issue, but I’m noticing that neither of these packages has been updated for months, so maybe the plan is to abandon JLD and switch to a new data storage format? Can someone advise me about the recommended platform-independent data format for 0.7?

In case any of the package maintainers is reading this, here are the errors I obtained (0.7.0-alpha, downloaded today; packages tried today).

For Pkg.add("JLD") the following error occurred:

┌ Error: Error building `Homebrew`; see log file for further info

By the way, could some helpful reader tell me which log file this message refers to?

Following the Pkg.add statement, I tried Pkg.test("JLD") and got the following error:

┌ Error: Error building `CodecZlib`; see log file for further info

(Same log file?) Then I received hundreds of warnings and this error:

ERROR: LoadError: LoadError: UndefVarError: SimpleVector not defined

Then I tried Pkg.add("JLD2"), which seemed to work, but Pkg.test("JLD2") gave dozens of warnings plus this error:

ERROR: LoadError: LoadError: syntax: local variable name "x" conflicts with an argument

#2

Julia 0.7-alpha has just been tagged. Devs will now begin to support 0.7, but it will take some time. I would not expect anything before the beta. Finally, I am pretty sure JLD2 will be supported. Not being updated in months can also mean that the package is awesome the way it is :smile:


#3

Yeah, it’s way too early in the v0.7 release to decide that a given package is being abandoned.

In particular, the build issues you’re seeing are with Homebrew.jl and CodecZlib.jl, which are build and test dependencies of JLD, so the first thing to do is to start tracking down the issues in the dependencies. I think the path of the log file is printed in the Pkg3 output if you want to figure out what the full error message is.

The issue you’re seeing with JLD2 looks like a new one. I know for a fact that the authors have been working on v0.7 compatibility, but it’s been a moving target until now. Opening an issue would be the first step towards resolving the problem.


#4

It’s called v0.7-alpha. It’s your job to update JLD/JLD2. If you don’t want to update packages, don’t use the alpha.


#5

Thanks for the suggestions-- I have opened an issue (https://github.com/simonster/JLD2.jl/issues/74) for JLD2. I also tracked down the log file (thanks for the suggestion!) and have opened another issue for Homebrew (https://github.com/JuliaPackaging/Homebrew.jl/issues/233).

Is there an agreed-upon date or sequence of releases when “core” packages are supposed to be ready for 0.7.0? Although JLD is not in the standard library, it is a core piece of functionality for many users. For example, Matlab has provided its load and save commands at least since Matlab 3.5, around 1988, even before Matlab supported sparse matrices.

Also, there is some benefit to the community at large for ordinary users such as myself to be early adopters of new language versions. For example, my code uncovered a performance regression bug in an early version of 0.5 (arrays with >6 subscripts) and two performance regression bugs in early versions of 0.6 (negative of a sparse matrix; concatenating a sparse vector to a sparse matrix).


#6

Maybe BSON.jl works; see the thread “ANN: BSON.jl, for saving your Julia data”.


#7

For your Matlab example, keep in mind that MathWorks changed the .mat implementation a few times (incl. non-backward-compatible versions), and today it’s a variant of HDF5. The tricky thing is that Julia supports a much broader range of data with composite types.


#8

Certainly — if they open issues, and make PRs. Just complaining about things will not help.

I suppose you should.

Even in a fast-moving package ecosystem, not making any changes for months is not a sign of abandonment. Neither is not updating a few days after an alpha of the next version comes out.

Alpha versions of open source software implicitly carry the expectation that the user is willing to get his/her hands dirty.


#9

I wanted to check (especially with JuliaCon underway and release dates for 0.7.0 and 1.0 around the corner) whether there is any news about JLD or JLD2? I am contributing a bit to some other parts of the Julia ecosystem, but I need either JLD or JLD2 for my own application project and neither seems to be working yet, at least not under Windows. I am not able to work on these packages myself-- they are too far from my expertise.


#10

I am in a similar situation, currently using mmap and serialize, depending on the data. Like some others, I made PRs for both BSON.jl and JLD2.jl, but they have not received a reply in the last 2 weeks.


#11

I’ve tested JLD2 on 0.7 at some point and it was working, though that was a while ago now and I haven’t been using it regularly. I’m pretty sure the package devs intend for JLD to be deprecated in favor of JLD2, but I don’t know if and when they intend to make the change. I find the built-in serialize works quite well for most purposes; the only real problem is binary incompatibility between Julia versions.
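For readers who have not used it, a minimal sketch of the serialize stopgap mentioned above, using only the Serialization standard library of Julia 0.7 and later. The `checkpoint` dictionary and the temporary path are hypothetical placeholders, and the resulting file should only be expected to load under the same Julia version that wrote it.

```julia
using Serialization  # standard library in Julia 0.7 and later

# Hypothetical checkpoint data; any serializable Julia value would do.
checkpoint = Dict("iteration" => 42, "weights" => [0.1, 0.2, 0.3])

# A temporary path keeps the example self-contained.
path = tempname()

# Write the value to disk.
open(path, "w") do io
    serialize(io, checkpoint)
end

# Read it back. This is only expected to work under the same Julia version.
restored = open(path, "r") do io
    deserialize(io)
end

# Clean up the temporary file.
rm(path)
```

This is convenient for short-lived checkpoints, but for the reason given above it is not a substitute for an archival format.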

Indeed, it does seem responses to PRs on JLD2 are rather slow.

@simonster, would you consider adding someone as an administrator on that package?


#12

As a stopgap measure, I usually install the PR’s fork directly, in this case from [1]:

dev https://github.com/JeffBezanson/JLD2.jl

Then in a terminal:

cd ~/.julia/dev/JLD2/
git checkout gdkrmr/0.7compat

(note that directly calling “dev https://github.com/JeffBezanson/JLD2.jl#gdkrmr/0.7compat” no longer works)

Copying and pasting the examples from the JLD2 README then works for me. (You might need to install FileIO as well.)

[1] https://github.com/simonster/JLD2.jl/pull/91


#13

Thanks for the suggestion. In fact I tried this already, and

pkg> add https://github.com/JeffBezanson/JLD2.jl#gdkrmr/0.7compat

worked fine for me to install it directly.

That said, my experience with JLD2.jl and BSON.jl made me rethink my approach to long-term data storage using types directly from Julia.

First, I don’t think it is really suitable for long-term archiving of binary data with nontrivial types in the sense that HDF5 is designed for: reading becomes tricky when the representation changes, which is bound to happen. Then the advantage compared to serialize is not that clear. Because of this, now I think I should stick to plain vanilla HDF5 and think a bit more about converting my data to the types it supports.

Second, even if the first problem had a solution, I have concerns about the long-term support of these packages. With the migration to v0.7, it again became evident that packages maintained by a single person (or at most a few people) have a potentially very limited lifecycle: the original author may move on after a few months of interest, neglecting or even abandoning the package. This is in principle not a problem with most packages, as one can fork and continue, but with packages like JLD2 this can be difficult. I am not claiming that JLD2 is abandoned (as far as I know, the maintainer could just be on vacation, etc.), but if that should happen, recovering old data a few years from now would be a labor-intensive exercise.


#14

I’ve recently come to exactly the same conclusion. In addition to the reasons above I found JLD slow and JLD2 buggy.


#15

An HDF package that handles dense arrays would solve my problem-- it’s relatively straightforward for me to flatten my more complex data structures into arrays. What “plain vanilla HDF5” package are you using?
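As an illustration of the flattening idea, here is a minimal sketch that reduces a sparse matrix to plain dense arrays and rebuilds it, using only the SparseArrays standard library of Julia 0.7 and later. The matrix `S` and the variable names are hypothetical; the actual write step with an HDF5 package is left out, since any writer that handles dense numeric arrays should do.

```julia
using SparseArrays  # standard library in Julia 0.7 and later

# Hypothetical example matrix.
S = sparse([1, 2, 3], [1, 3, 2], [4.0, 5.0, 6.0], 3, 3)

# Flatten: row indices, column indices and values of the stored entries.
rows, cols, vals = findnz(S)

# Keep the dimensions too, so trailing empty rows or columns are not lost.
dims = collect(size(S))

# rows, cols, vals and dims are now plain dense arrays that could be
# written to a file one by one.

# Rebuild the matrix from the four arrays.
S2 = sparse(rows, cols, vals, dims[1], dims[2])
```

Storing the dimensions separately is a deliberate choice: the index arrays alone cannot distinguish a 3×3 matrix from a larger one whose extra rows and columns are empty.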


#16

#17

@Tamas_Papp: I have switched to HDF5 only, for exactly the same reason. It was the transition from Julia 0.4 to 0.5 (I think) where my JLD files did not work anymore.

@Stephen_Vavasis: I am storing complex numbers in HDF5 files using HDF5.jl, although this is not officially supported. Reading is pretty simple when using mmap. Writing is a little bit more complicated. Here are the links:
Writing: https://github.com/MagneticParticleImaging/MPIFiles.jl/blob/master/src/MPIFiles.jl#L360
Reading:
https://github.com/MagneticParticleImaging/MPIFiles.jl/blob/master/src/MDF.jl#L34

Unfortunately mmap is broken for this use case on Julia 1.0.

In general the support for compound data in HDF5 is not fully implemented. So there is definitely still some work to do but HDF5.jl is certainly a package which will stay.
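One way to get complex data into a format that only needs real arrays is to pack it into a 2×N real matrix. The sketch below is an assumption about the general idea, not a copy of the linked MPIFiles.jl code: it uses an explicit copy rather than mmap, and the names `z`, `A` and `z2` are hypothetical.

```julia
# Hypothetical complex data.
z = [1.0 + 2.0im, 3.0 - 4.0im, -5.0 + 0.5im]

# Pack into a 2×N real matrix: row 1 holds real parts, row 2 imaginary parts.
A = Matrix{Float64}(undef, 2, length(z))
for j in eachindex(z)
    A[1, j] = real(z[j])
    A[2, j] = imag(z[j])
end

# A is an ordinary dense Float64 array and could be stored as such.

# Unpack: rebuild the complex vector column by column.
z2 = [complex(A[1, j], A[2, j]) for j in 1:size(A, 2)]
```

Because Julia arrays are column-major, each column of `A` holds the real and imaginary parts of one number next to each other in memory.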


#18

Hello. I just installed the official release of Julia v0.7 on my MacBook. I could add both JLD and JLD2 without problems (although it took a while for JLD to be added, since it somehow built the HDF5 library from source). Anyway, at this point I can’t read/load my existing *.jld files that were previously generated with JLD under v0.6.x. The situation now is:
(1) using JLD gives the precompilation error already reported above, i.e., LoadError: UndefVarError: SimpleVector not defined. So I cannot use JLD under v0.7.
(2) using JLD2 went through (although it generated a bunch of warnings), but when I tried to load my existing *.jld file, it failed with ArgumentError: only JLD2 files are presently supported
So, how can we load and read the existing *.jld files? Even if I eventually agree to convert my files to some other file format (e.g., genuine HDF5), we need to be able to read and load those JLD files. This is a serious issue in my opinion. At this point, the only solution seems to be to stick with v0.6.4 until these problems are resolved.


#19

As indicated above, this is an inherent problem of JLD and not of Julia. If you want your types to be exactly as they were under Julia 0.6, but those types changed between 0.6 and 0.7, you will not be able to retrieve them.

The best solution seems to be to convert your JLD files to plain HDF5, which will then be loadable under both 0.6 and 0.7.