Error/segfault in basic test of CUDA-aware MPI

Problem

A basic test of CUDA-aware MPI with MPI.jl fails both on our Cray supercomputer and on another cluster.
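
For reference, the REPL transcripts in 1.1 and 2.1 below step through the following test; consolidated as a standalone script it is roughly this sketch (the Isend/Waitall! part after Irecv! is the assumed natural continuation, since neither session gets past Irecv!):

using MPI, CuArrays

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst  = mod(rank+1, size)
src  = mod(rank-1, size)
N    = 4

send_mesg = CuArray{Float64}(undef, N)
recv_mesg = CuArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))

rreq = MPI.Irecv!(recv_mesg, src, src+32, comm)   # both sessions below fail here
sreq = MPI.Isend(send_mesg, dst, rank+32, comm)
MPI.Waitall!([rreq, sreq])

println("rank $rank received $(Array(recv_mesg)) from rank $src")
MPI.Finalize()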

1) On Cray system

1.1) Test and error:

omlins@dom101:~> export MPICH_RDMA_ENABLED_CUDA=1


omlins@nid00002:~> julia
julia> using MPI

julia> using CuArrays


julia> MPI.Init()

julia> comm = MPI.COMM_WORLD
MPI.Comm(MPI.MPI_Comm(0x44000000))

julia> rank = MPI.Comm_rank(comm)
0

julia> size = MPI.Comm_size(comm)
1

julia> dst = mod(rank+1, size)
0

julia> src = mod(rank-1, size)
0

julia> N = 4
4

julia> send_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> recv_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> fill!(send_mesg, Float64(rank))
4-element CuArray{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> rreq = MPI.Irecv!(recv_mesg, src,  src+32, comm)

signal (11): Segmentation fault
in expression starting at REPL[13]:1
unknown function (ip: 0xffffffffffffffff)
MPIR_gpu_pointer_type at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
MPID_Irecv at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
MPI_Irecv at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
Irecv! at /users/omlins/.julia/1.2.0/dom-gpu/packages/MPI/9zBr2/src/pointtopoint.jl:299
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
Irecv! at /users/omlins/.julia/1.2.0/dom-gpu/packages/MPI/9zBr2/src/pointtopoint.jl:330
unknown function (ip: 0x2aaad193e4b6)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:323
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:411
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:362 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:772
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:884
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x2aaabbef1b0f)
unknown function (ip: 0x1)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:893
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:815
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:764
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:844
eval at ./boot.jl:330
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:268
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1614 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:596
unknown function (ip: 0xffffffffffffffff)
Allocations: 41692306 (Pool: 41686043; Big: 6263); GC: 88
Segmentation fault (core dumped)

NOTE: when running with 2 processes the error is the same.

1.2) Installation

MPI: cray-mpich/7.7.10
CUDA: 10.1
OS: SUSE Linux Enterprise Server 15
Packages (stacked environment):
(1.2.0-dom-gpu) pkg> status
Status ~/.julia/1.2.0/dom-gpu/environments/1.2.0-dom-gpu/Project.toml
[da04e1cc] MPI v0.10.1

julia> LOAD_PATH
4-element Array{String,1}:
 "@"
 "@#.#.#-dom-gpu"
 "/apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu"
 "@stdlib"

(1.2.0-dom-gpu) pkg> activate /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu
Activating environment at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu/Project.toml

(1.2.0-dom-gpu) pkg> status
Status /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu/Project.toml
[c5f51814] CUDAdrv v3.1.0
[be33ccc6] CUDAnative v2.4.0
[3a865a2d] CuArrays v1.3.0
[da04e1cc] MPI v0.9.0

2) On other cluster

2.1) Test and error:

[somlin@node32 ~]$ julia
julia> using MPI

julia> using CUDAdrv

julia> using CUDAnative

julia> using CuArrays

julia> MPI.Init()

julia> comm = MPI.COMM_WORLD
MPI.Comm(MPI.MPI_Comm(0x00007f2e25b927e0))

julia> rank = MPI.Comm_rank(comm)
0

julia> size = MPI.Comm_size(comm)
1

julia> dst = mod(rank+1, size)
0

julia> src = mod(rank-1, size)
0

julia> N = 4
4

julia> send_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1,Nothing}:
 672.5990755462913
 672.5990755462913
 672.5990755462913
 672.5990755462913

julia> recv_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1,Nothing}:
 672.5990755462913
 672.5990755462913
 672.5990755462913
 672.5990755462913

julia> fill!(send_mesg, Float64(rank))
[ Info: Building the CUDAnative run-time library for your sm_52 device, this might take a while...
4-element CuArray{Float64,1,Nothing}:
 0.0
 0.0
 0.0
 0.0

julia> rreq = MPI.Irecv!(recv_mesg, src,  src+32, comm)
ERROR: MethodError: no method matching unsafe_convert(::Type{MPI.MPIPtr}, ::CuArray{Float64,1,Nothing})
Closest candidates are:
  unsafe_convert(::Type{MPI.MPIPtr}, !Matched::MPI.SentinelPtr) at /home/somlin/.julia/packages/MPI/9zBr2/src/MPI.jl:31
  unsafe_convert(::Type{MPI.MPIPtr}, !Matched::Union{Ptr{T}, Ref{T}, SubArray{T,N,P,I,L} where L where I where P where N, Array{T,N} where N}) where T at /home/somlin/.julia/packages/MPI/9zBr2/src/datatypes.jl:24
  unsafe_convert(::Type{MPI.MPIPtr}, !Matched::CUDAdrv.Mem.DeviceBuffer) at /home/somlin/.julia/packages/MPI/9zBr2/src/cuda.jl:10
  ...
Stacktrace:
 [1] Irecv!(::CuArray{Float64,1,Nothing}, ::Int64, ::MPI.MPI_Datatype, ::Int64, ::Int64, ::MPI.Comm) at /home/somlin/.julia/packages/MPI/9zBr2/src/pointtopoint.jl:299
 [2] Irecv!(::CuArray{Float64,1,Nothing}, ::Int64, ::Int64, ::MPI.Comm) at /home/somlin/.julia/packages/MPI/9zBr2/src/pointtopoint.jl:330
 [3] top-level scope at REPL[15]:1

2.2) Installation

MPI: Open MPI 2.1.5 (MPI API 3.1.0)
CUDA: 10.0
OS: CentOS release 6.9
Packages:
(v1.2) pkg> status
Status ~/.julia/environments/v1.2/Project.toml
[c5f51814] CUDAdrv v4.0.4
[be33ccc6] CUDAnative v2.5.5
[3a865a2d] CuArrays v1.4.7
[da04e1cc] MPI v0.10.1

Question

How can we get CUDA-aware MPI to work on these systems?
Note that CUDA-aware MPI works fine on both systems with CUDA C applications.

Thanks!!

The first error is odd: the library detected that we are passing it a GPU pointer, but then segfaults. For the second case it seems that the CUDA support wasn't loaded.

Can you check what Base.cconvert(MPI.MPIPtr, recv_mesg) yields?
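
(What this probes, as a sketch: a ccall converts its buffer argument in two steps, cconvert followed by unsafe_convert, and with MPI.jl's CUDA support active the chain for a CuArray should end in an MPI.MPIPtr rather than in the MethodError above. The expected results in the comments are assumptions, not output from either system.)

julia> buf = Base.cconvert(MPI.MPIPtr, recv_mesg)   # expected with working CUDA support: a device buffer, e.g. the CUDAdrv.Mem.DeviceBuffer from the candidate list above

julia> Base.unsafe_convert(MPI.MPIPtr, buf)         # expected: an MPI.MPIPtr wrapping the device address; on the broken setup this step is where the MethodError comes from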

In general I think we have mostly been testing on Open MPI.

This was due to a change in CuArrays: you can either downgrade CuArrays.jl to 1.3, or use master MPI.jl (I’ll tag a new version ASAP). Unfortunately optional dependencies don’t affect version resolution.
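
In Pkg REPL terms the two options are roughly the following (a sketch; the pins assume the versions discussed in this thread):

(v1.2) pkg> add CuArrays@1.3

or, until the new MPI.jl version is tagged:

(v1.2) pkg> add MPI#master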

Thanks @simonbyrne and @vchuravy! Downgrading the CUDA packages to

(v1.2) pkg> status
    Status `~/.julia/environments/v1.2/Project.toml`
  [c5f51814] CUDAdrv v3.1.0
  [be33ccc6] CUDAnative v2.4.0
  [3a865a2d] CuArrays v1.3.0
  [da04e1cc] MPI v0.10.1

for the Open MPI case (case 2 above) made the example succeed without errors. :slight_smile:

Do you have any idea how to debug the issue on the Cray system (with cray-mpich/7.7.10; case 1 above)?

Unfortunately no: I have access to another Cray machine, but was never able to get MPI.jl to work at all on it (in that case it segfaulted when dlopen-ing the MPI library).

The only thing I can think of is to check whether Julia and MPICH are using the same CUDA version.
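
One way to compare the two, as a sketch (CUDAdrv.version() is assumed here to be the driver-API version query in the CUDAdrv releases above; the libmpich path is taken from the stack trace, and whether it links CUDA libraries directly is also an assumption):

julia> using CUDAdrv

julia> CUDAdrv.version()   # CUDA version seen by the Julia GPU stack

julia> run(`ldd /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so`)   # look for libcuda/libcudart entries, if any

Outside Julia, checking the loaded cudatoolkit module (e.g. with module list) shows which toolkit version the Cray PE environment expects.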