Problem
A basic test of CUDA-aware MPI with MPI.jl fails on both our Cray supercomputer and another cluster:
1) On the Cray system
1.1) Test and error:
omlins@dom101:~> export MPICH_RDMA_ENABLED_CUDA=1
omlins@nid00002:~> julia
julia> using MPI
julia> using CuArrays
julia> MPI.Init()
julia> comm = MPI.COMM_WORLD
MPI.Comm(MPI.MPI_Comm(0x44000000))
julia> rank = MPI.Comm_rank(comm)
0
julia> size = MPI.Comm_size(comm)
1
julia> dst = mod(rank+1, size)
0
julia> src = mod(rank-1, size)
0
julia> N = 4
4
julia> send_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1}:
0.0
0.0
0.0
0.0
julia> recv_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1}:
0.0
0.0
0.0
0.0
julia> fill!(send_mesg, Float64(rank))
4-element CuArray{Float64,1}:
0.0
0.0
0.0
0.0
julia> rreq = MPI.Irecv!(recv_mesg, src, src+32, comm)
signal (11): Segmentation fault
in expression starting at REPL[13]:1
unknown function (ip: 0xffffffffffffffff)
MPIR_gpu_pointer_type at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
MPID_Irecv at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
MPI_Irecv at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
Irecv! at /users/omlins/.julia/1.2.0/dom-gpu/packages/MPI/9zBr2/src/pointtopoint.jl:299
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
Irecv! at /users/omlins/.julia/1.2.0/dom-gpu/packages/MPI/9zBr2/src/pointtopoint.jl:330
unknown function (ip: 0x2aaad193e4b6)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:323
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:411
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:362 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:772
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:884
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x2aaabbef1b0f)
unknown function (ip: 0x1)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:893
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:815
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:764
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:844
eval at ./boot.jl:330
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:268
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1614 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:596
unknown function (ip: 0xffffffffffffffff)
Allocations: 41692306 (Pool: 41686043; Big: 6263); GC: 88
Segmentation fault (core dumped)
NOTE: when running with 2 processes, the error is the same.
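For reference, here is the same test collected into a standalone script that can be launched with srun/mpirun on 2 ranks. The file name and the final Isend/Waitall!/Finalize lines are our addition to complete the point-to-point exchange; as shown above, the segfault already occurs at the Irecv! call.

# cuda_mpi_test.jl -- same commands as in the REPL session above, as a script.
# The Isend/Waitall!/Finalize lines at the end are the intended completion of the
# exchange; on the Cray system the segfault already occurs at MPI.Irecv!.
using MPI
using CuArrays

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst  = mod(rank+1, size)
src  = mod(rank-1, size)
N = 4

send_mesg = CuArray{Float64}(undef, N)
recv_mesg = CuArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))

rreq = MPI.Irecv!(recv_mesg, src, src+32, comm)   # segfaults here on the Cray system
sreq = MPI.Isend(send_mesg, dst, rank+32, comm)
MPI.Waitall!([rreq, sreq])

println("rank $rank received $recv_mesg")
MPI.Finalize()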
1.2) Installation
MPI: cray-mpich/7.7.10
CUDA: CUDA 10.1
OS: SUSE Linux Enterprise Server 15
Packages (stacked environment):
(1.2.0-dom-gpu) pkg> status
Status ~/.julia/1.2.0/dom-gpu/environments/1.2.0-dom-gpu/Project.toml
[da04e1cc] MPI v0.10.1
julia> LOAD_PATH
4-element Array{String,1}:
"@"
"@#.#.#-dom-gpu"
"/apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu"
"@stdlib"
(1.2.0-dom-gpu) pkg> activate /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu
Activating environment at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu/Project.toml
(1.2.0-dom-gpu) pkg> status
Status /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu/Project.toml
[c5f51814] CUDAdrv v3.1.0
[be33ccc6] CUDAnative v2.4.0
[3a865a2d] CuArrays v1.3.0
[da04e1cc] MPI v0.9.0
2) On the other cluster
2.1) Test and error:
[somlin@node32 ~]$ julia
julia> using MPI
julia> using CUDAdrv
julia> using CUDAnative
julia> using CuArrays
julia> MPI.Init()
julia> comm = MPI.COMM_WORLD
MPI.Comm(MPI.MPI_Comm(0x00007f2e25b927e0))
julia> rank = MPI.Comm_rank(comm)
0
julia> size = MPI.Comm_size(comm)
1
julia> dst = mod(rank+1, size)
0
julia> src = mod(rank-1, size)
0
julia> N = 4
4
julia> send_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1,Nothing}:
672.5990755462913
672.5990755462913
672.5990755462913
672.5990755462913
julia> recv_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1,Nothing}:
672.5990755462913
672.5990755462913
672.5990755462913
672.5990755462913
julia> fill!(send_mesg, Float64(rank))
[ Info: Building the CUDAnative run-time library for your sm_52 device, this might take a while...
4-element CuArray{Float64,1,Nothing}:
0.0
0.0
0.0
0.0
julia> rreq = MPI.Irecv!(recv_mesg, src, src+32, comm)
ERROR: MethodError: no method matching unsafe_convert(::Type{MPI.MPIPtr}, ::CuArray{Float64,1,Nothing})
Closest candidates are:
unsafe_convert(::Type{MPI.MPIPtr}, !Matched::MPI.SentinelPtr) at /home/somlin/.julia/packages/MPI/9zBr2/src/MPI.jl:31
unsafe_convert(::Type{MPI.MPIPtr}, !Matched::Union{Ptr{T}, Ref{T}, SubArray{T,N,P,I,L} where L where I where P where N, Array{T,N} where N}) where T at /home/somlin/.julia/packages/MPI/9zBr2/src/datatypes.jl:24
unsafe_convert(::Type{MPI.MPIPtr}, !Matched::CUDAdrv.Mem.DeviceBuffer) at /home/somlin/.julia/packages/MPI/9zBr2/src/cuda.jl:10
...
Stacktrace:
[1] Irecv!(::CuArray{Float64,1,Nothing}, ::Int64, ::MPI.MPI_Datatype, ::Int64, ::Int64, ::MPI.Comm) at /home/somlin/.julia/packages/MPI/9zBr2/src/pointtopoint.jl:299
[2] Irecv!(::CuArray{Float64,1,Nothing}, ::Int64, ::Int64, ::MPI.Comm) at /home/somlin/.julia/packages/MPI/9zBr2/src/pointtopoint.jl:330
[3] top-level scope at REPL[15]:1
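In case it is useful for debugging: the closest-candidates list above shows that the loaded MPI.jl has MPIPtr conversions for host arrays and for CUDAdrv.Mem.DeviceBuffer, but none for CuArray. Whether such a conversion method exists can be checked directly in the same session (plain Julia; only the MPI.MPIPtr type from the error message is assumed):

julia> hasmethod(Base.unsafe_convert, Tuple{Type{MPI.MPIPtr}, typeof(recv_mesg)})
false

(false here is consistent with the MethodError above.)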
2.2) Installation
MPI: Open MPI 2.1.5 (MPI API 3.1.0)
CUDA: CUDA 10.0
OS: CentOS release 6.9
Packages:
(v1.2) pkg> status
Status ~/.julia/environments/v1.2/Project.toml
[c5f51814] CUDAdrv v4.0.4
[be33ccc6] CUDAnative v2.5.5
[3a865a2d] CuArrays v1.4.7
[da04e1cc] MPI v0.10.1
Question
How can we get CUDA-aware MPI to work on these systems?
Note that CUDA-aware MPI works fine on both systems with CUDA C applications.
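In case it helps to narrow things down, here is a sketch of how one might check from within Julia whether the MPI library that MPI.jl actually loaded was built with CUDA support. This relies on MPIX_Query_cuda_support, an Open MPI extension (most likely not available in Cray MPICH, where MPICH_RDMA_ENABLED_CUDA is used instead), and it assumes that MPI.libmpi holds the library MPI.jl ccalls into:

# Sketch (untested): ask the loaded MPI library whether it was compiled with CUDA
# support. MPIX_Query_cuda_support is an Open MPI extension returning 1 if CUDA
# support is built in; the symbol lookup is done defensively via Libdl.
using MPI
using Libdl

handle = Libdl.dlopen(MPI.libmpi)                          # library MPI.jl loaded (assumed)
sym = Libdl.dlsym_e(handle, :MPIX_Query_cuda_support)      # C_NULL if the symbol is absent
if sym != C_NULL
    println("libmpi reports CUDA support: ", ccall(sym, Cint, ()) == 1)
else
    println("MPIX_Query_cuda_support not found in libmpi")
end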
Thanks!!