Better compression for JLD2 (Discussion)

Hi all,

I’m working on bringing faster / more flexible compression to JLD2.

Currently you can do @save "test.jld2" {compress=true} a b c
and JLD2 will then compress sufficiently large arrays using CodecZlib.
This works reasonably well, but it gets slow for large datasets, and faster
compression libraries are available, such as Blosc.jl and CodecLz4.jl.

As an improvement, I think it would be neat to change the default compression algorithm
and also let the user choose which algorithm is used.

My current best idea is to allow passing Symbols as arguments to the calls, e.g.

@save "test.jld2" {compress=:lz4}
jldopen("test.jld2", "w"; compress=:blosc) do .... end

but this does not grant full access to all features of the compression libraries.

Another question concerns the libraries themselves:
Do I add all options as dependencies?
Currently, I’m trying to load them dynamically, similar to what FileIO does,
but I’m running into world-age problems.

ERROR: MethodError: no method matching CodecLz4.LZ4FrameCompressor()
The applicable method may be too new: running in world age 27829, while current world is 27830.
Closest candidates are:
  CodecLz4.LZ4FrameCompressor(; kwargs...) at /home/jonas/.julia/packages/CodecLz4/2JFgC/src/frame_compression.jl:30 (method too new to be called from this world context.)

The code is at https://github.com/JuliaIO/JLD2.jl/pull/264

What are your thoughts?
Better suggestions for the API?
Any experience with world-age problems?

Opinions on dependencies vs. dynamic loading?

Best,
Jonas

PS:

None of this will break reading old files!

I had a similar problem in TableIO.
In the end, I did the following:

  1. With Requires.jl, the parts of my package that depend on optional dependencies are only loaded once those dependencies have been imported.
  2. If a method needs an optional dependency, it imports the corresponding package dynamically (see https://github.com/lungben/TableIO.jl/blob/175a12394d743d302ac53cb0b38bdafbcbe08d29/src/TableIO.jl#L50).

This is a bit “hacky” but works.
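
For anyone hitting the world-age error from the first post: it usually shows up when a running function loads or defines code and then calls it right away. Below is a minimal sketch that reproduces the effect without any compression package and shows one common workaround, Base.invokelatest. The names late_direct, late_latest, call_directly and call_latest are made up for illustration, and I am assuming (not claiming) that the same workaround would apply to the compressor constructor call in the PR.

```julia
# Declare the functions up front; their methods are only added at runtime,
# standing in for methods that appear when a package is loaded dynamically.
function late_direct end
function late_latest end

function call_directly()
    # The method is added while this function is already running ...
    @eval late_direct() = 42
    # ... so a plain call still dispatches in the older world and fails.
    return late_direct()
end

function call_latest()
    @eval late_latest() = 42
    # invokelatest dispatches in the newest world, so the new method is visible.
    return Base.invokelatest(late_latest)
end

# Capture the error instead of letting it abort the script.
direct_result = try
    call_directly()
catch err
    err
end

latest_result = call_latest()

println(typeof(direct_result))
println(latest_result)
```

The trade-off is that invokelatest is a dynamic call, so it is best kept at the boundary (e.g. once per file or dataset) rather than in an inner loop.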

Long term, this would be great: https://github.com/JuliaLang/Pkg.jl/issues/1285

One option is a struct that holds all the options, e.g. BLOSC(...), used like this:

jldopen("test.jld2", "w"; compress = BLOSC(...)) do ... end

You will also be able to dispatch on this directly. A lot of Julia packages use this kind of API.
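
To make this concrete, here is a rough sketch of what such option structs and the dispatch could look like. Everything in it is hypothetical: the CompressionOptions supertype, the fields of BLOSC and LZ4, and the helpers as_options, describe and open_with are stand-ins invented for illustration, not part of JLD2 or of the compression packages. The describe methods only return a string where a real backend would call into the library.

```julia
# Hypothetical option types; the fields are illustrative assumptions.
abstract type CompressionOptions end

Base.@kwdef struct BLOSC <: CompressionOptions
    level::Int = 5
    shuffle::Bool = true
end

Base.@kwdef struct LZ4 <: CompressionOptions
    level::Int = 0
end

# Symbols can remain as shorthand for "this backend with default settings".
as_options(opts::CompressionOptions) = opts
function as_options(name::Symbol)
    name === :blosc && return BLOSC()
    name === :lz4 && return LZ4()
    throw(ArgumentError("unknown compression backend: $name"))
end

# One method per backend. A real implementation would compress data here;
# these placeholders just report the settings they were given.
describe(o::BLOSC) = "blosc(level=$(o.level), shuffle=$(o.shuffle))"
describe(o::LZ4) = "lz4(level=$(o.level))"

# Stand-in for the keyword handling of a call like jldopen(...; compress=...).
open_with(; compress) = describe(as_options(compress))

println(open_with(compress = :lz4))
println(open_with(compress = BLOSC(level = 9)))
```

With this shape, the Symbol form from the first post and the struct form can coexist, and adding a backend means adding one struct plus its methods.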

I second @lungben about Requires.jl: loading a compression library would then just execute the conditional code that defines the corresponding struct and its implementation.