Hi. As the title suggests, my application occasionally crashes when calling delete_object.
The error message is different on every run. Sometimes it is:
free(): invalid pointer
malloc(): unaligned tcache chunk detected
Aborted (core dumped)
and other times:
double free or corruption (!prev)
[3916181] signal 6 (-6): Aborted
I suspect the GC might be doing something wrong.
MWE:
h5open(file, "cw") do h
    if haskey(h, key)
        delete_object(h, key)
    end
    write(h, key, data)
end
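Because the crash is intermittent, it may help to run the MWE many times in one session. The sketch below is hypothetical: stress_mwe, the file path, the key and the data are placeholders rather than the values from the real application, and the periodic GC.gc() call is only there to give the collector more chances to run while HDF5 objects are being created and freed. Running the same script under Julia 1.10 and 1.11, or with and without the --gcthreads=1 command-line option, would then give a like-for-like comparison.

```julia
using HDF5

# Hypothetical stress loop around the MWE. The key and data are placeholders,
# not the values used by the real application.
function stress_mwe(path::AbstractString; n::Integer = 200)
    key = "stats/0"
    for i in 1:n
        data = "iteration $i"
        h5open(path, "cw") do h        # the do-block closes h even if an error is thrown
            if haskey(h, key)
                delete_object(h, key)  # same delete-then-rewrite pattern as the MWE
            end
            write(h, key, data)
        end
        i % 50 == 0 && GC.gc()         # force a full collection every 50 iterations
    end
    # Read back the last value written, to confirm the file is still consistent
    return h5open(path, "r") do h
        read(h, key)
    end
end

path = tempname() * ".h5"
println(stress_mwe(path))
```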
versioninfo():
Julia Version 1.11.2
Commit 5e9a32e7af2 (2024-12-01 20:02 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 384 × AMD EPYC 9654 96-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver4)
Threads: 16 default, 0 interactive, 8 GC (on 384 virtual cores)
LocalPreferences.toml:
[CUDA_Runtime_jll]
local = "true"
version = "12.3"
[HDF5]
libhdf5 = ".../free/hdf5-parallel/1.14.3/nvhpc24.1/openmpi5.0.2/lib/libhdf5.so"
libhdf5_hl = ".../free/hdf5-parallel/1.14.3/nvhpc24.1/openmpi5.0.2/lib/libhdf5_hl.so"
[HDF5_jll]
libhdf5_hl_path = ".../free/hdf5-parallel/1.14.3/nvhpc24.1/openmpi5.0.2/lib/libhdf5_hl.so"
libhdf5_path = ".../free/hdf5-parallel/1.14.3/nvhpc24.1/openmpi5.0.2/lib/libhdf5.so"
[MPIPreferences]
__clear__ = ["preloads_env_switch"]
_format = "1.0"
abi = "OpenMPI"
binary = "system"
cclibs = []
libmpi = "libmpi"
mpiexec = "mpiexec"
preloads = []
Can you check whether the same thing happens with Julia 1.10? For example:
juliaup add 1.10
juliaup default 1.10
Perhaps related: "Garbage collection thread safety issues on 1.11", JuliaLang/julia issue #56871 on GitHub.
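To see whether the parallel GC discussed in that issue could be involved, it may be worth first confirming how many GC threads the session actually uses. A minimal check using only Base, assuming Julia 1.10 or later, where Threads.ngcthreads() is available:

```julia
# Print the thread configuration of the current Julia session.
# Assumes Julia >= 1.10, where Threads.ngcthreads() exists.
println("Julia version:   ", VERSION)
println("compute threads: ", Threads.nthreads())
println("GC threads:      ", Threads.ngcthreads())
```

If the crash disappears when Julia is started with a single GC thread but the same number of compute threads (for example julia --threads=16 --gcthreads=1 script.jl), that would point toward the GC issue rather than HDF5.jl itself, although it would not prove it.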
Thanks. But when I tried 1.10, a different problem occurred:
┌ Warning: Circular dependency detected. Precompilation will be skipped for:
│ NNlibFFTWExt [96386cae-6b62-59ad-b532-a94eae05753e]
│ SparseArraysExt [85068d23-b5fb-53f1-8204-05c2aba6942f]
│ AtomixCUDAExt [13011619-4c7c-5ef0-948f-5fc81565cd05]
│ CUDAExt [11b7e2e0-d079-575b-885e-0ab22ef3252c]
│ cuDNN [02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd]
│ LinearAlgebraExt [66d79d19-2cc4-5b0b-ac7a-b340256d1ecd]
│ NNlibCUDAExt [8a688d86-d2bc-5ad3-8ed1-384f9f2c8cc5]
│ NNlibCUDACUDNNExt [ab3ce674-22af-5de9-b6c7-795b17302dcb]
│ KernelAbstractions [63c18a36-062a-441e-b654-da1e3ab1ce7c]
│ ChainRulesCoreExt [eae2faf6-b232-58cb-a410-7764fda2830c]
│ NNlib [872c559c-99b0-510c-b3b7-b6c96a88d5cd]
│ CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
I thought I could run the program without precompilation, but a segmentation fault occurred:
┌ Warning: Module CUDA with build ID ffffffff-ffff-ffff-0007-d92daae8b4b5 is missing from the cache.
│ This may mean CUDA [052768ef-5323-5732-b1bb-66c8b64840ba] does not support precompilation but is imported by a module that does.
└ @ Base loading.jl:2011
┌ Error: Error during loading of extension AtomixCUDAExt of Atomix, use `Base.retry_load_extensions()` to retry.
│ exception =
│ 1-element ExceptionStack:
│ Declaring __precompile__(false) is not allowed in files that are being precompiled.
│ Stacktrace:
│ [1] _require(pkg::Base.PkgId, env::Nothing)
│ @ Base ./loading.jl:2062
│ [2] __require_prelocked(uuidkey::Base.PkgId, env::Nothing)
│ @ Base ./loading.jl:1875
│ [3] #invoke_in_world#3
│ @ ./essentials.jl:926 [inlined]
│ [4] invoke_in_world
│ @ ./essentials.jl:923 [inlined]
│ [5] _require_prelocked
│ @ ./loading.jl:1866 [inlined]
│ [6] _require_prelocked
│ @ ./loading.jl:1865 [inlined]
│ [7] run_extension_callbacks(extid::Base.ExtensionId)
│ @ Base ./loading.jl:1358
│ [8] run_extension_callbacks(pkgid::Base.PkgId)
│ @ Base ./loading.jl:1393
│ [9] run_package_callbacks(modkey::Base.PkgId)
│ @ Base ./loading.jl:1218
│ [10] _tryrequire_from_serialized(modkey::Base.PkgId, path::String, ocachepath::String, sourcepath::String, depmods::Vector{Any})
│ @ Base ./loading.jl:1550
│ [11] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
│ @ Base ./loading.jl:1637
│ [12] _require(pkg::Base.PkgId, env::String)
│ @ Base ./loading.jl:2001
│ [13] __require_prelocked(uuidkey::Base.PkgId, env::String)
│ @ Base ./loading.jl:1875
│ [14] #invoke_in_world#3
│ @ ./essentials.jl:926 [inlined]
│ [15] invoke_in_world
│ @ ./essentials.jl:923 [inlined]
│ [16] _require_prelocked(uuidkey::Base.PkgId, env::String)
│ @ Base ./loading.jl:1866
│ [17] macro expansion
│ @ ./loading.jl:1853 [inlined]
│ [18] macro expansion
│ @ ./lock.jl:267 [inlined]
│ [19] __require(into::Module, mod::Symbol)
│ @ Base ./loading.jl:1816
│ [20] #invoke_in_world#3
│ @ ./essentials.jl:926 [inlined]
│ [21] invoke_in_world
│ @ ./essentials.jl:923 [inlined]
│ [22] require(into::Module, mod::Symbol)
│ @ Base ./loading.jl:1809
│ [23] include(mod::Module, _path::String)
│ @ Base ./Base.jl:495
│ [24] include(x::String)
│ @ CUDA ~/.julia/packages/CUDA/2kjXI/src/CUDA.jl:1
│ [25] top-level scope
│ @ ~/.julia/packages/CUDA/2kjXI/src/CUDA.jl:123
│ [26] include
│ @ ./Base.jl:495 [inlined]
│ [27] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
│ @ Base ./loading.jl:2285
│ [28] top-level scope
│ @ stdin:3
│ [29] eval
│ @ ./boot.jl:385 [inlined]
│ [30] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│ @ Base ./loading.jl:2139
│ [31] include_string
│ @ ./loading.jl:2149 [inlined]
│ [32] exec_options(opts::Base.JLOptions)
│ @ Base ./client.jl:321
│ [33] _start()
│ @ Base ./client.jl:557
└ @ Base loading.jl:1364
[4050854] signal (11.1): Segmentation fault
I tried removing Manifest.toml and Project.toml and regenerating them, but it made no difference.
What is the output of:
using Pkg
Pkg.status()
Can you share the file that causes the issue when running the MWE?
Thanks.
Current directory:
Status `.../Project.toml`
[052768ef] CUDA v5.5.2
[7a1cc6ca] FFTW v1.8.0
[f67ccb44] HDF5 v0.17.2
[a98d9a8b] Interpolations v0.15.1
[033835bb] JLD2 v0.5.10
[63c18a36] KernelAbstractions v0.9.31
My private project
[da04e1cc] MPI v0.20.22
[3da0fdf6] MPIPreferences v0.1.11
[bac558e1] OrderedCollections v1.7.0
[aea7be01] PrecompileTools v1.2.1
[64499a7a] WriteVTK v1.21.1
Project (developing)
[052768ef] CUDA v5.5.2
[7a1cc6ca] FFTW v1.8.0
[f67ccb44] HDF5 v0.17.2
[a98d9a8b] Interpolations v0.15.1
[63c18a36] KernelAbstractions v0.9.31
[da04e1cc] MPI v0.20.22
[3da0fdf6] MPIPreferences v0.1.11
[1914dd2f] MacroTools v0.5.13
[872c559c] NNlib v0.9.26
[aea7be01] PrecompileTools v1.2.1
[ddb6d928] YAML v0.4.12
[02a925ec] cuDNN v1.4.0
[37e2e46d] LinearAlgebra
[de0858da] Printf
[9a3f8284] Random
Pkg.status suggested running Pkg.update(). But after updating, Pkg.precompile() still printed:
┌ Warning: Circular dependency detected. Precompilation will be skipped for:
│ NNlibFFTWExt [96386cae-6b62-59ad-b532-a94eae05753e]
│ SparseArraysExt [85068d23-b5fb-53f1-8204-05c2aba6942f]
│ My private project
│ AtomixCUDAExt [13011619-4c7c-5ef0-948f-5fc81565cd05]
│ CUDAExt [11b7e2e0-d079-575b-885e-0ab22ef3252c]
│ cuDNN [02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd]
│ LinearAlgebraExt [66d79d19-2cc4-5b0b-ac7a-b340256d1ecd]
│ NNlibCUDAExt [8a688d86-d2bc-5ad3-8ed1-384f9f2c8cc5]
│ NNlibCUDACUDNNExt [ab3ce674-22af-5de9-b6c7-795b17302dcb]
│ KernelAbstractions [63c18a36-062a-441e-b654-da1e3ab1ce7c]
│ ChainRulesCoreExt [eae2faf6-b232-58cb-a410-7764fda2830c]
│ NNlib [872c559c-99b0-510c-b3b7-b6c96a88d5cd]
│ CUDA [052768ef-5323-5732-b1bb-66c8b64840ba]
└ @ Pkg.API ~/.julia/juliaup/julia-1.10.7+0.x64.linux.gnu/share/julia/stdlib/v1.10/Pkg/src/API.jl:1279
And here is the function that contains the MWE:
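function save_stat(stat_, config, topo)
    if isroot(topo)
        stat_file = get_stat_file(config)
        # First call: create/truncate the file and store the config
        h = if stat_.i === 0
            hh = h5open(stat_file, "w")
            write(hh, "config", serialize(config))
            hh
        else
            # Later calls: open read/write, creating the file if it is missing
            h5open(stat_file, "cw")
        end
        key = "stats/$(stat_.i)"
        if haskey(h, key)
            return nothing  # note: h is not closed on this path
        end
        write(h, key, serialize(stat_))
        close(h)
    end
end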
serialize converts NamedTuples to strings, get_stat_file returns the stats file name, and stat_ and config are NamedTuples.
For now I use only one process, so isroot(topo) always returns true.
Also, I was wrong about what triggers the memory errors.
I tried the following variations on 1.11.2:
- Removing delete_object: the error still occurs.
- Removing the call to save_stat shown above: no errors occur.
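In save_stat as posted, the early return nothing in the haskey branch leaves h open, so that handle is only released whenever the GC finalizes it. Whether this is related to the crash is unknown, but it is cheap to rule out. The sketch below is a hypothetical rewrite, not the original code: save_stat_safe and its arguments are made up for illustration, the payload strings stand in for the output of the serialize helper described above, and the isroot(topo) check is omitted. The do-block form of h5open closes the file on every exit path.

```julia
using HDF5

# Hypothetical variant of save_stat. config_str and stat_str stand in for the
# strings produced by the application's own serialize helper; topo/isroot is omitted.
function save_stat_safe(stat_file::AbstractString, i::Integer,
                        config_str::AbstractString, stat_str::AbstractString)
    mode = i == 0 ? "w" : "cw"           # truncate on the first call, append afterwards
    h5open(stat_file, mode) do h          # h is closed when the block exits, on any path
        i == 0 && write(h, "config", config_str)
        key = "stats/$i"
        haskey(h, key) && return nothing  # exits the do-block only; the file still gets closed
        write(h, key, stat_str)
        return nothing
    end
    return nothing
end

f = tempname() * ".h5"
save_stat_safe(f, 0, "cfg", "s0")
save_stat_safe(f, 1, "cfg", "s1")
save_stat_safe(f, 1, "cfg", "ignored")    # key already exists, so nothing is written
```

Note that return inside the do-block only exits the anonymous function; h5open still closes the file afterwards, so no handle is left for the GC to finalize.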