Hi, I am trying to implement a program with hybrid MPI + Threads parallelization.
In the following example, if I comment out all MPI-related lines, the code runs without any problem. However, with the MPI lines included, Julia crashes intermittently (in roughly 50% of runs).
# mwe.jl
using MPI
using Base.Threads
MPI.Init_thread(MPI.THREAD_FUNNELED)
world_comm = MPI.COMM_WORLD
struct MyStruct
    v::Vector{Vector{Float64}}
end
mystruct = MyStruct([[1.0] for i=1:nthreads()])
println("With @threads")
Threads.@threads :static for i in 1:nthreads()
    println(mystruct.v[threadid()])
end
MPI.Finalize()
Running julia -t 2 mwe.jl gives the following output:
With @threads
signal (11): Segmentation fault
in expression starting at /home/jmlim/julia_epw/EPW.jl/running/mwe.jl:16
jl_mutex_wait at /buildworker/worker/package_linux64/build/src/locks.h:37 [inlined]
jl_mutex_lock at /buildworker/worker/package_linux64/build/src/locks.h:94
jl_generate_fptr at /buildworker/worker/package_linux64/build/src/jitlayers.cpp:272
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1964
jl_compile_method_internal at /buildworker/worker/package_linux64/build/src/gf.c:1919 [inlined]
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2224 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
println at ./coreio.jl:4
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
macro expansion at /home/jmlim/julia_epw/EPW.jl/running/mwe.jl:17 [inlined]
#3#threadsfor_fun at ./threadingconstructs.jl:81
#3#threadsfor_fun at ./threadingconstructs.jl:48
unknown function (ip: 0x2ab50928062c)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2231 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2398
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1690 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:705
unknown function (ip: (nil))
Allocations: 780384 (Pool: 780097; Big: 287); GC: 1
Segmentation fault (core dumped)
With julia -t 1 mwe.jl, the output is fine:
With @threads
[1.0]
Another observation: if I use v = [[1.0] for i=1:nthreads()] as an independent variable, rather than as the struct field mystruct.v, Julia does not crash.
What is the reason for this crash?
I am using Intel MPI Version 2019 on Linux (CentOS 7). I downloaded the Julia 1.5.3 binary. I tested two MPI.jl versions, v0.14.3 and v0.16.1 (the most recent), and both crash.
julia> MPI.identify_implementation()
(MPI.IntelMPI, v"2019.0.0")
(In the real program, the v::Vector{Vector{Float64}} field is used as a pre-allocated buffer, one per thread.)
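To illustrate the intended use, here is a minimal sketch of the per-thread buffer pattern, without any MPI calls so that it runs standalone. The names Buffers, make_buffers and fill_buffers! are made up for this post and are not from the real program. The sketch indexes the buffers by the loop index rather than by threadid(), on the assumption that :static scheduling with exactly nthreads() iterations gives each thread one iteration.

```julia
using Base.Threads

# Hypothetical container, mirroring MyStruct from the example above.
struct Buffers
    v::Vector{Vector{Float64}}
end

# Allocate one buffer of length n per thread, once, up front.
make_buffers(n) = Buffers([zeros(n) for _ in 1:nthreads()])

# Fill every buffer from inside a threaded loop.
# Assumption: with :static scheduling and exactly nthreads() iterations,
# each iteration runs on a different thread, so buffer i is only ever
# touched by one thread.
function fill_buffers!(b::Buffers)
    Threads.@threads :static for i in 1:nthreads()
        fill!(b.v[i], Float64(i))
    end
    return b
end

b = fill_buffers!(make_buffers(3))
```

This is only meant to show how the field is used, not to reproduce or work around the crash.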