I would like to save a large array to disk, ideally with some form of data compression to save disk space, and later read the file back in Julia. My attempt below using JLD2 works only for small arrays. Do you have any suggestions on how to proceed?
Here is my attempt:
using FileIO, JLD2
# let's say I have a large array like
X = rand(Float32, 32, 256, 168740)
# In my case, some identical values appear often
nr, nc, nb = size(X)
X[21:end,:,:] .= rand(rand(5), 12, nc, nb) # good for data compression?
I would like to save this array and load it again using Julia.
Without data compression, this saves a file of about 5.53 GB:
FileIO.save("mydata.jld2", "X", X)
And I can successfully load my data back when I need it:
X = FileIO.load("mydata.jld2","X")
However, I was hoping to save some disk space using compression:
FileIO.save("mydata_compressed.jld2", "X", X, compress=true)
But this crashes with:
Error encountered while save File{DataFormat{:JLD2}, String}("mydata_compressed.jld2").
Fatal error:
ERROR: InexactError: trunc(UInt32, 5529272320)
Stacktrace:
[1] throw_inexacterror(f::Symbol, #unused#::Type{UInt32}, val::UInt64)
@ Core ./boot.jl:612
[2] checked_trunc_uint
@ ./boot.jl:642 [inlined]
[3] toUInt32
@ ./boot.jl:731 [inlined]
[4] UInt32
@ ./boot.jl:766 [inlined]
[5] convert
@ ./number.jl:7 [inlined]
[6] setproperty!
@ ./Base.jl:43 [inlined]
[7] process(codec::CodecZlib.ZlibCompressor, input::TranscodingStreams.Memory, output::TranscodingStreams.Memory, error::TranscodingStreams.Error)
@ CodecZlib ~/.julia/packages/CodecZlib/ruMLE/src/compression.jl:172
[8] transcode(codec::CodecZlib.ZlibCompressor, data::Vector{UInt8})
@ TranscodingStreams ~/.julia/packages/TranscodingStreams/IVlnc/src/transcode.jl:90
[9] deflate_data(f::JLD2.JLDFile{JLD2.MmapIO}, data::Array{Float32, 3}, odr::Type{Float32}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compressor::CodecZlib.ZlibCompressor)
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/compression.jl:146
[10] write_compressed_data(cio::JLD2.MmapIO, f::JLD2.JLDFile{JLD2.MmapIO}, data::Array{Float32, 3}, odr::Type, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, filter_id::UInt16, compressor::CodecZlib.ZlibCompressor)
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/compression.jl:182
[11] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, dataspace::JLD2.WriteDataspace{3, Tuple{}}, datatype::JLD2.FloatingPointDatatype, odr::Type{Float32}, data::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/datasets.jl:404
[12] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}, compress::Bool)
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/inlineunion.jl:44
[13] write_dataset(f::JLD2.JLDFile{JLD2.MmapIO}, x::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}})
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/inlineunion.jl:36
[14] write(g::JLD2.Group{JLD2.JLDFile{JLD2.MmapIO}}, name::String, obj::Array{Float32, 3}, wsession::JLD2.JLDWriteSession{Dict{UInt64, JLD2.RelOffset}}; compress::Nothing)
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/compression.jl:87
[15] #write#87
@ ~/.julia/packages/JLD2/MXv8x/src/compression.jl:71 [inlined]
[16] write
@ ~/.julia/packages/JLD2/MXv8x/src/compression.jl:71 [inlined]
[17] (::JLD2.var"#68#69"{String, Array{Float32, 3}, Tuple{}})(file::JLD2.JLDFile{JLD2.MmapIO})
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/fileio.jl:26
[18] jldopen(::Function, ::String, ::Vararg{String}; kws::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
@ JLD2 ~/.julia/packages/JLD2/MXv8x/src/loadsave.jl:4
[19] #fileio_save#67
@ ~/.julia/packages/JLD2/MXv8x/src/fileio.jl:24 [inlined]
[20] invokelatest(::Any, ::Any, ::Vararg{Any}; kwargs::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
@ Base ./essentials.jl:718
[21] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::FileIO.Formatted, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
@ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:219
[22] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
@ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:185
[23] #save#20
@ ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:129 [inlined]
[24] top-level scope
@ REPL[46]:1
Stacktrace:
[1] handle_error(e::InexactError, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
@ FileIO ~/.julia/packages/FileIO/u9YLx/src/error_handling.jl:61
[2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector}}, action::String)
@ FileIO ~/.julia/packages/FileIO/u9YLx/src/error_handling.jl:56
[3] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::FileIO.Formatted, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
@ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:228
[4] action(::Symbol, ::Vector{Union{Base.PkgId, Module}}, ::Symbol, ::String, ::String, ::Vararg{Any}; options::Base.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:compress,), Tuple{Bool}}})
@ FileIO ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:185
[5] #save#20
@ ~/.julia/packages/FileIO/u9YLx/src/loadsave.jl:129 [inlined]
[6] top-level scope
@ REPL[46]:1
Note: For smaller arrays, the compression and loading above work fine. The problem seems to be the size of the array.
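A quick size check seems consistent with this. The sketch below assumes that the value in the InexactError is simply the byte count of the array, which the stack trace suggests is being converted to UInt32 inside CodecZlib; the names `dims` and `nbytes` are only for illustration.

```julia
# Dimensions of the array from the example; nothing is allocated here.
dims = (32, 256, 168740)

# Total size in bytes: number of elements times 4 bytes per Float32.
nbytes = prod(dims) * sizeof(Float32)

println(nbytes)                     # 5529272320, the value reported in the error
println(nbytes > typemax(UInt32))   # true: the byte count does not fit in a UInt32
```

So the uncompressed data is larger than the largest value a UInt32 can hold (4294967295), which would explain why smaller arrays compress without any problem.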
Please, how could I solve this?
I am using Julia 1.7.2 on macOS, with FileIO v1.13.0 and JLD2 v0.4.21.
Thank you!