Hi,
I routinely use JLD2 to save simulation results to disk on my university's HPC cluster. However, roughly 1 in 100 runs fails with the following error:
```
ERROR: LoadError: IOError: stat("/storage/home/mvg6042/scratch/qps_2_5/data_40_particles_2_5_filling_factor_2_0_qps_68_chain_number.jld2"): Unknown system error -116 (Unknown system error -116)
Stacktrace:
 [1] uv_error
   @ ./libuv.jl:100 [inlined]
 [2] stat(path::String)
   @ Base.Filesystem ./stat.jl:152
 [3] isdir
   @ ./stat.jl:461 [inlined]
 [4] checkpath_save(file::String)
   @ FileIO ~/work/.julia/packages/FileIO/PtqMQ/src/loadsave.jl:173
 [5] save(file::String, args::Dict{String, Any}; options::@Kwargs{})
   @ FileIO ~/work/.julia/packages/FileIO/PtqMQ/src/loadsave.jl:126
 [6] save
   @ ~/work/.julia/packages/FileIO/PtqMQ/src/loadsave.jl:125 [inlined]
 [7] gibbs_sampler(filename::var"#filename#20"{String, Int64, Int64, Int64, Int64, Int64}, chain_number::Int64, Qstar::Rational{Int64}, l_m_list::Vector{Tuple{Rational{Int64}, Rational{Int64}}}, p::Int64, num_thermalization::Int64, num_steps::Int64)
   @ Main /storage/work/mvg6042/qps_2_5/sampler_single_state.jl:227
 [8] sample_qps(folder_name::String, chain_number::Int64, N::Int64, n::Int64, p::Int64, num_qps_1::Int64, num_qps_2::Int64, num_thermalization::Int64, num_steps::Int64)
   @ Main /storage/work/mvg6042/qps_2_5/sampler_single_state.jl:263
 [9] top-level scope
   @ /storage/work/mvg6042/qps_2_5/sampler_single_state.jl:308
in expression starting at /storage/work/mvg6042/qps_2_5/sampler_single_state.jl:299
```
I am not sure what causes this, and because it happens only sporadically it is difficult to diagnose. Here is my Julia version information:
```
Julia Version 1.10.7
Commit 4976d05258e (2024-11-26 15:57 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, broadwell)
Threads: 1 default, 0 interactive, 1 GC (on 24 virtual cores)
```
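And this is the relevant code block. Note that `save` here is FileIO's `save`, which dispatches to JLD2 for `.jld2` files; according to the stacktrace, the error is raised by the `stat` call in `FileIO.checkpath_save`, before JLD2 itself is called: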
```julia
# Collect summary statistics and averaged observables
data["number of steps"] = monte_carlo_iter
data["acceptance rate"] = num_samples_accepted/monte_carlo_iter
data["monte carlo duration"] = time() - t0
data["pair densities"] = accumulated_pair_density ./ monte_carlo_iter
data["r grid"] = 0.50 .* (rgrid[1:end-1] .+ rgrid[2:end])      # bin midpoints
data["density"] = accumulated_density ./ monte_carlo_iter ./ Agrid
data["theta grid"] = 0.50 .* (θmesh[1:end-1] .+ θmesh[2:end])  # bin midpoints

# FileIO.save picks the JLD2 backend from the .jld2 extension
save(filename(chain_number), data)
```
Here, `data` is a `Dict{String, Any}`.
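A possible stopgap, assuming the failure is transient: the code -116 appears to be the negated Linux errno 116 (`ESTALE`, "Stale file handle"), which is typically reported by network file systems such as NFS, so the save could be retried a few times. The helper `retry_on_estale` below is hypothetical (it is not part of FileIO or JLD2), and the attempt count and delay are arbitrary choices.

```julia
# Hypothetical helper (not part of FileIO or JLD2).
# Linux errno 116 is ESTALE ("Stale file handle"); libuv reports it negated,
# and its message table does not name it, hence "Unknown system error -116".
const ESTALE_CODE = -116

# Call `f()`, retrying up to `attempts` times if it throws an IOError with
# code -116. Any other exception, or ESTALE on the final attempt, is rethrown.
function retry_on_estale(f; attempts::Integer = 5, delay::Real = 2.0)
    for attempt in 1:attempts
        try
            return f()
        catch err
            if !(err isa Base.IOError && err.code == ESTALE_CODE) || attempt == attempts
                rethrow()
            end
            @warn "Stale file handle, retrying" attempt
            sleep(delay)  # give the file system time to recover
        end
    end
end
```

With this in place, the last line of the block above would become `retry_on_estale(() -> save(filename(chain_number), data))`. This only masks the symptom; if the stale handles persist, the file system itself is the more likely place to look.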
Any insights into potential causes or debugging strategies would be greatly appreciated.