Hi all!
I’ve put in some work to bring JLD2 back up to speed, and I am happy to announce JLD2 v0.2.0!
This release covers a handful of changes / bugfixes:
- You can now store Union types that have UnionAll fields, e.g. Union{Int, Vector} (#206)
- Previously, immutable structs that contained references to objects of the same type could not be stored. This is now possible (#196)
- CI on AppVeyor and CodeCoverage work again (thanks in part to @lhupe)
- New Magic Bytes that differ more strongly from JLD (#213)
- No longer list FileIO as a dependency but rather use Requires to load the code on import (#217 @lhupe)
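A minimal sketch of what the first two fixes plausibly enable. The struct names `Node` and `Holder` and the file name are hypothetical and chosen only for illustration; the linked issues may cover somewhat different cases.

```julia
using JLD2

# Hypothetical immutable struct that can reference an object of its own type.
struct Node
    value::Int
    next::Union{Node, Nothing}
end

# Hypothetical struct whose field type is a Union with a UnionAll member (Vector).
struct Holder
    x::Union{Int, Vector}
end

chain = Node(1, Node(2, nothing))
holder = Holder([1.0, 2.0])

# Store both objects under their variable names.
@save "structs.jld2" chain holder

# Read them back under new names to compare with the originals.
chain2, holder2 = jldopen("structs.jld2", "r") do f
    f["chain"], f["holder"]
end
```

The structs must be defined in the session that reads the file as well, so that the stored data can be mapped back onto the same types.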
New Features
- A new and improved saving macro syntax (#198)
# Option passing:
hello = "world"
@save "test.jld2" {compress=true} hello
@save "test.jld2" {compress=true, iotype=IOStream} hello
# Assignment syntax
@save "test.jld2" bye=hello
@save "test.jld2" hello randomnumber=rand(10)
- Better error messages and handling (#225 @lhupe):
We no longer try to open any given string as a path but instead first check whether the string represents a valid file path, which gives better error messages. Additionally, if the standard IO type MmapIO fails for some reason, we attempt to open the file with IOStream instead by default.
- Inline Union Arrays (#221)
Arrays with Union eltype where the Union has two isbits member types are now stored inline in an interleaved fashion. This makes storing e.g. Vector{Union{Missing, Float64}} a lot more efficient AND allows for compression!
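The path check and IO fallback from the error-handling item can be pictured roughly as follows. This is only a sketch of the idea, not the code from #225: `open_with_fallback` is a hypothetical helper, and passing `iotype` as a keyword to `jldopen` is an assumption that mirrors the `{iotype=IOStream}` option shown for `@save`.

```julia
using JLD2

# Hypothetical helper, not part of JLD2: check the path first, try the
# default IO type, and retry with a plain IOStream if that fails.
function open_with_fallback(path::AbstractString)
    # Fail early with a clear message instead of a low-level IO error.
    isfile(path) || throw(ArgumentError("\"$path\" is not an existing file"))
    try
        return jldopen(path, "r")                    # default IO type
    catch err
        @warn "Default IO type failed, retrying with IOStream" exception = err
        return jldopen(path, "r"; iotype=IOStream)   # assumed keyword
    end
end

# Write a small file with the assignment syntax, then read it back.
hello = "world"
@save "test.jld2" bye=hello

f = open_with_fallback("test.jld2")
println(f["bye"])
close(f)
```

With the assignment syntax the value is stored under the name on the left-hand side, so it is read back as `"bye"` rather than `"hello"`.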
Here are some attempts at storing very boring, maximally compressible data.
julia> @time using JLD2
[ Info: Precompiling JLD2 [033835bb-8acc-5ee8-8aae-3f567f8a3819]
4.090665 seconds (2.15 M allocations: 114.999 MiB, 0.28% gc time)
julia> u = Union{Float64, Missing}[zeros(10^6);];
julia> @time @save "test.jld2" u
4.864063 seconds (12.90 M allocations: 650.482 MiB, 5.46% gc time)
julia> @time @save "test.jld2" u
0.063690 seconds (5.20 k allocations: 15.539 MiB)
julia> @time @save "testcompressed.jld2" {compress=true} u
0.117254 seconds (5.64 k allocations: 32.728 MiB, 5.53% gc time)
julia> using JLD
julia> u = Union{Float64, Missing}[zeros(10^6);];
julia> @time @save "test.jld" u
61.623914 seconds (18.44 M allocations: 724.206 MiB, 0.41% gc time)
julia> @time @save "test.jld" u
58.175550 seconds (16.00 M allocations: 602.762 MiB, 0.37% gc time)
julia> @time JLD.jldopen(f->f["u"]=u, "testcompressed.jld", "w", compress=true)
65.954995 seconds (23.46 M allocations: 973.688 MiB, 0.74% gc time)
julia> using BSON
julia> u = Union{Float64, Missing}[zeros(10^6);];
julia> @time BSON.@save "test.bson" u
1.511677 seconds (11.02 M allocations: 418.530 MiB, 26.47% gc time)
julia> @time using Serialization
0.000514 seconds (624 allocations: 40.344 KiB)
julia> @time serialize("juliaserializer", u)
0.488430 seconds (1.91 M allocations: 65.454 MiB, 1.83% gc time)
File sizes when storing Union{Float64, Missing}[zeros(10^6);]
- Serialization: 8.6M (~0.4s)
- BSON: 16M (~1.4s)
- JLD: 300M / 300M (uncompressed / compressed) (~58s)
- JLD2: 8.6M / 23K (uncompressed / compressed) (~5s the first time, ~0.1s the second time)
Sure, we didn’t actually store any interesting data here, but still, no one else seems to be able to compress isbits union arrays.
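For readers wondering which element types count as isbits unions, Julia itself can be asked. The sketch below uses `Base.isbitsunion`, an unexported Base function; whether JLD2 performs exactly this check internally is an assumption, and the inline storage described above additionally requires that the Union has two member types.

```julia
# Both member types are plain bits types, so the Union qualifies.
T = Union{Float64, Missing}
println(isbitstype(Float64))
println(isbitstype(Missing))
println(Base.isbitsunion(T))

# String is not a bits type, so this Union does not qualify.
println(Base.isbitsunion(Union{Float64, String}))
```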
Now the same again, but with really no data:
File sizes when storing Union{Float64, Missing}[missing for i=1:10^6]
- Serialization: 3.9M
- BSON: 126M
- JLD: 307M
- JLD2: 8.6M / 14K (uncompressed / compressed)
Surprisingly, the file size is much larger for BSON when the array consists of missing values only. Serialization outputs a smaller file, though.
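The JLD2 numbers can be checked with a few lines like the following. It is a sketch that reuses the `@save` option syntax from this post; the file names for the all-missing array are made up for the example, and the exact sizes will depend on the JLD2 and Julia versions.

```julia
using JLD2

u = Union{Float64, Missing}[zeros(10^6);]
m = Union{Float64, Missing}[missing for i = 1:10^6]

# Write each array once without and once with compression.
@save "test.jld2" u
@save "testcompressed.jld2" {compress=true} u
@save "missing.jld2" m                             # hypothetical file name
@save "missingcompressed.jld2" {compress=true} m   # hypothetical file name

# Report the on-disk size of each file in bytes.
for name in ("test.jld2", "testcompressed.jld2",
             "missing.jld2", "missingcompressed.jld2")
    println(rpad(name, 26), filesize(name), " bytes")
end
```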
(An apology to the other libraries: I’m aware that this comparison is entirely unfair, as it was specifically designed to highlight this particular feature. The applicability and advantage of this will vary and will definitely be smaller in real-world applications.)
Remarks on Compatibility
This release contains some breaking changes in the file format. However, care was taken that files written with older versions of JLD2 can still be read! If you find yourself unable to read older files, please report an issue.
Likewise, more changes to the format are likely in the future, but I am hopeful that I won’t have to break the ability to read old files.
Best,
Jonas