How to save a large Float32 array on disk using data compression (failed attempt with JLD2)?

I would like to save a large array to disk, ideally with some form of data compression to save disk space, and then read the file back in Julia for later use. My attempt below using JLD2 works only for small arrays. Do you have any suggestions on how to proceed?

Here is my attempt:

using FileIO, JLD2

# let's say I have a large array like
X = rand(Float32, 32, 256, 168740)

# In my case, some identical values appear often 
nr, nc, nb = size(X)
X[21:end,:,:] .= rand(rand(5), 12, nc, nb)  # good for data compression?

I would like to save this array and load it again using Julia.

Without data compression this saves a file of about 5.53 GB:

FileIO.save("mydata.jld2", "X", X)  

And I can successfully load my data back when I need it:

X = FileIO.load("mydata.jld2","X")

However, I was hoping to save some disk space using compression:

FileIO.save("mydata_compressed.jld2", "X", X, compress=true)

But this crashes with:


Error encountered while save File{DataFormat{:JLD2}, String}("mydata_compressed.jld2").

    Fatal error:
    ERROR: InexactError: trunc(UInt32, 5529272320)
    Stacktrace:
      [1] throw_inexacterror(f::Symbol, #unused#::Type{UInt32}, val::UInt64)
        @ Core ./boot.jl:612
      [2] checked_trunc_uint
        @ ./boot.jl:642 [inlined]
      [3] toUInt32
        @ ./boot.jl:731 [inlined]
      [4] UInt32
        @ ./boot.jl:766 [inlined]
      [5] convert
        @ ./number.jl:7 [inlined]
      [6] setproperty!
        @ ./Base.jl:43 [inlined]
      [7] process(codec::CodecZlib.ZlibCompressor, input::TranscodingStreams.Memory, output::TranscodingStreams.Memory, error::TranscodingStreams.Error)
        @ CodecZlib ~/.julia/packages/CodecZlib/ruMLE/src/compression.jl:172
      [8] transcode(codec::CodecZlib.ZlibCompressor, data::Vector{UInt8})
        @ TranscodingStreams ~/.julia/packages/TranscodingStreams/IVlnc/src/transcode.jl:90
      [9] deflate_data(f::JLD2.JLDFile{JLD2.MmapIO}, data::Array{Float32, 3}, odr::Type{Float32}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compressor::CodecZlib.ZlibCompressor)
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/compression.jl:146
     [10] write_compressed_data(cio::JLD2.MmapIO, f::JLD2.JLDFile{JLD2.MmapIO}, data::Array{Float32, 3}, odr::Type, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, filter_id::UInt16, compressor::CodecZlib.ZlibCompressor)
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/compression.jl:182
     [11] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.WriteDataspace{3, Tuple{}}, datatype::JLD2.FloatingPointDatatype, odr::Type{Float32}, data::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/datasets.jl:404
     [12] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/inlineunion.jl:44
     [13] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/inlineunion.jl:36
     [14] write(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, name::String, obj::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}; compress::Nothing)
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/compression.jl:87
     [15] #write#87
        @ ~/.julia/packages/JLD2/MXv8x/src/compression.jl:71 [inlined]
     [16] write
        @ ~/.julia/packages/JLD2/MXv8x/src/compression.jl:71 [inlined]
     [17] (::JLD2.var"#68#69"{String, Array{Float32, 3}, Tuple{}})(file::JLD2.JLDFile{JLD2.MmapIO})
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/fileio.jl:26
     [18] jldopen(::Function, ::String, ::Vararg{String}; kws::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
        @ JLD2 ~/.julia/packages/JLD2/MXv8x/src/loadsave.jl:4
     [19] #fileio_save#67
        @ ~/.julia/packages/JLD2/MXv8x/src/fileio.jl:24 [inlined]
     [20] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
        @ Base ./essentials.jl:718
     [21] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::FileIO.Formatted, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
        @ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:219
     [22] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
        @ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:185
     [23] #save#20
        @ ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:129 [inlined]
     [24] top-level scope
        @ REPL[46]:1
    Stacktrace:
     [1] handle_error(e::InexactError, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
       @ FileIO ~/.julia/packages/FileIO/u9YLx/src/error_handling.jl:61
     [2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector}}, action::String)
       @ FileIO ~/.julia/packages/FileIO/u9YLx/src/error_handling.jl:56
     [3] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::FileIO.Formatted, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
       @ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:228
     [4] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
       @ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:185
     [5] #save#20
       @ ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:129 [inlined]
     [6] top-level scope
       @ REPL[46]:1

Note: For smaller arrays, the compressed save and load above work fine. The problem seems to be the size of the array.

Please, how could I solve this?

I am using Julia 1.7.2 on macOS, FileIO v1.13.0, JLD2 v0.4.21.
Thank you!

Use HDF5 directly: see the HDF5.jl documentation (Home · HDF5.jl).
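
As a rough sketch of what that could look like, the snippet below writes a chunked, deflate-compressed dataset with HDF5.jl and reads it back. The file name, the chunk shape (32, 256, 64), and the compression level 3 are illustrative assumptions, not tuned values, and the array is a small stand-in for the one in the question. Depending on the HDF5.jl version, the compression keyword may be spelled differently, so check the documentation for the installed release.

```julia
using HDF5

# Small stand-in for the array in the question (hypothetical size, for illustration)
nr, nc, nb = 32, 256, 200
X = rand(Float32, nr, nc, nb)
X[21:end, :, :] .= rand(rand(Float32, 5), 12, nc, nb)  # few distinct values, compresses well

# Write a chunked dataset with deflate (zlib) compression.
# Each chunk is compressed separately, so no single buffer has to hold the whole array.
h5open("mydata.h5", "w") do f
    f["X", chunk=(nr, nc, 64), deflate=3] = X
end

# Read the whole dataset back
Y = h5read("mydata.h5", "X")
```

Because compression is applied per chunk, a chunk of 32 x 256 x 64 Float32 values is only 2 MiB, far below any 32-bit size limit.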

You’re encountering a bug in CodecZlib.jl. There is a fix that hasn’t been merged yet: "New feature: compressor can take streams larger than typemax(UInt32) bytes" by felixhorger (Pull Request #62, JuliaIO/CodecZlib.jl on GitHub).
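
To see why the array size matters, it helps to compare the byte count of the array with the largest value a UInt32 can hold. This small check uses only Base Julia and the dimensions from the question; it assumes a 64-bit Julia, where integer literals are Int64.

```julia
# Size in bytes of a Float32 array with the dimensions from the question
nbytes = 32 * 256 * 168740 * sizeof(Float32)

println(nbytes)                    # 5529272320, the value shown in the InexactError
println(typemax(UInt32))           # 4294967295
println(nbytes > typemax(UInt32))  # true

# Converting the byte count to UInt32 fails in the same way as in the stack trace
try
    UInt32(nbytes)
catch err
    println(err isa InexactError)  # true
end
```

The array is roughly 5.5 GB, which is more than the 4 GiB that fits in a 32-bit length field, so the failure only appears once the data passed to the compressor in one call exceeds that limit.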

You could also try a different compression library, as described in the JLD2 docs.
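
For example, with the JLD2 0.4 series a compressor object can be passed instead of `compress=true`. The sketch below uses LZ4 from CodecLz4.jl; the file name is made up, the array is a small stand-in, and I have not verified that this codec handles a single buffer larger than 4 GiB, so treat it as something to try rather than a confirmed fix. Newer JLD2 releases may expose compression through a different interface.

```julia
using JLD2, CodecLz4

# Small stand-in for the array in the question (hypothetical size, for illustration)
nr, nc, nb = 32, 256, 200
X = rand(Float32, nr, nc, nb)
X[21:end, :, :] .= rand(rand(Float32, 5), 12, nc, nb)

# Pass a compressor instance rather than `true` to choose the codec
jldopen("mydata_lz4.jld2", "w"; compress = LZ4FrameCompressor()) do f
    f["X"] = X
end

# Read it back; CodecLz4 must be loaded in the reading session as well
Y = jldopen(f -> f["X"], "mydata_lz4.jld2", "r")
```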