Error/segfault in basic test of CUDA-aware MPI

Problem

A basic test of CUDA-aware MPI with MPI.jl fails both on our Cray supercomputer and on another cluster:

1) On Cray system

1.1) Test and error:

omlins@dom101:~> export MPICH_RDMA_ENABLED_CUDA=1


omlins@nid00002:~> julia
julia> using MPI

julia> using CuArrays


julia> MPI.Init()

julia> comm = MPI.COMM_WORLD
MPI.Comm(MPI.MPI_Comm(0x44000000))

julia> rank = MPI.Comm_rank(comm)
0

julia> size = MPI.Comm_size(comm)
1

julia> dst = mod(rank+1, size)
0

julia> src = mod(rank-1, size)
0

julia> N = 4
4

julia> send_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> recv_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> fill!(send_mesg, Float64(rank))
4-element CuArray{Float64,1}:
 0.0
 0.0
 0.0
 0.0

julia> rreq = MPI.Irecv!(recv_mesg, src,  src+32, comm)

signal (11): Segmentation fault
in expression starting at REPL[13]:1
unknown function (ip: 0xffffffffffffffff)
MPIR_gpu_pointer_type at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
MPID_Irecv at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
MPI_Irecv at /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so (unknown line)
Irecv! at /users/omlins/.julia/1.2.0/dom-gpu/packages/MPI/9zBr2/src/pointtopoint.jl:299
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
Irecv! at /users/omlins/.julia/1.2.0/dom-gpu/packages/MPI/9zBr2/src/pointtopoint.jl:330
unknown function (ip: 0x2aaad193e4b6)
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2197
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:323
eval_value at /buildworker/worker/package_linux64/build/src/interpreter.c:411
eval_stmt_value at /buildworker/worker/package_linux64/build/src/interpreter.c:362 [inlined]
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:772
jl_interpret_toplevel_thunk_callback at /buildworker/worker/package_linux64/build/src/interpreter.c:884
unknown function (ip: 0xfffffffffffffffe)
unknown function (ip: 0x2aaabbef1b0f)
unknown function (ip: 0x1)
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:893
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:815
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:764
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/toplevel.c:844
eval at ./boot.jl:330
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
eval_user_input at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:86
macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.2/REPL/src/REPL.jl:118 [inlined]
#26 at ./task.jl:268
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2191
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1614 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:596
unknown function (ip: 0xffffffffffffffff)
Allocations: 41692306 (Pool: 41686043; Big: 6263); GC: 88
Segmentation fault (core dumped)

NOTE: when running with 2 processes the error is the same.
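
For reference, the complete test we are trying to run looks as follows (my sketch: the transcript above never gets past the failing Irecv!, so the Isend/Waitall! part is simply what would come next in the usual send/receive pattern):

using MPI, CuArrays

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)
size = MPI.Comm_size(comm)
dst  = mod(rank+1, size)
src  = mod(rank-1, size)
N    = 4

send_mesg = CuArray{Float64}(undef, N)
recv_mesg = CuArray{Float64}(undef, N)
fill!(send_mesg, Float64(rank))

rreq = MPI.Irecv!(recv_mesg, src, src+32, comm)   # segfaults here on the Cray system
sreq = MPI.Isend(send_mesg, dst, rank+32, comm)
MPI.Waitall!([rreq, sreq])
println("rank $rank received $(Array(recv_mesg))")
MPI.Finalize()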

1.2) Installation

MPI: cray-mpich/7.7.10
CUDA: Cuda 10.1
OS: SUSE Linux Enterprise Server 15
Packages (stacked environment):
(1.2.0-dom-gpu) pkg> status
Status ~/.julia/1.2.0/dom-gpu/environments/1.2.0-dom-gpu/Project.toml
[da04e1cc] MPI v0.10.1

julia> LOAD_PATH
4-element Array{String,1}:
 "@"
 "@#.#.#-dom-gpu"
 "/apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu"
 "@stdlib"

(1.2.0-dom-gpu) pkg> activate /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu
Activating environment at /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu/Project.toml

(1.2.0-dom-gpu) pkg> status
Status /apps/dom/UES/jenkins/7.0.UP01/gpu/easybuild/software/Julia/1.2.0-CrayGNU-19.10-cuda-10.1/extensions/environments/1.2.0-dom-gpu/Project.toml
[c5f51814] CUDAdrv v3.1.0
[be33ccc6] CUDAnative v2.4.0
[3a865a2d] CuArrays v1.3.0
[da04e1cc] MPI v0.9.0

2) On other cluster

2.1) Test and error:

[somlin@node32 ~]$ julia
julia> using MPI

julia> using CUDAdrv

julia> using CUDAnative

julia> using CuArrays

julia> MPI.Init()

julia> comm = MPI.COMM_WORLD
MPI.Comm(MPI.MPI_Comm(0x00007f2e25b927e0))

julia> rank = MPI.Comm_rank(comm)
0

julia> size = MPI.Comm_size(comm)
1

julia> dst = mod(rank+1, size)
0

julia> src = mod(rank-1, size)
0

julia> N = 4
4

julia> send_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1,Nothing}:
 672.5990755462913
 672.5990755462913
 672.5990755462913
 672.5990755462913

julia> recv_mesg = CuArray{Float64}(undef, N)
4-element CuArray{Float64,1,Nothing}:
 672.5990755462913
 672.5990755462913
 672.5990755462913
 672.5990755462913

julia> fill!(send_mesg, Float64(rank))
[ Info: Building the CUDAnative run-time library for your sm_52 device, this might take a while...
4-element CuArray{Float64,1,Nothing}:
 0.0
 0.0
 0.0
 0.0

julia> rreq = MPI.Irecv!(recv_mesg, src,  src+32, comm)
ERROR: MethodError: no method matching unsafe_convert(::Type{MPI.MPIPtr}, ::CuArray{Float64,1,Nothing})
Closest candidates are:
  unsafe_convert(::Type{MPI.MPIPtr}, !Matched::MPI.SentinelPtr) at /home/somlin/.julia/packages/MPI/9zBr2/src/MPI.jl:31
  unsafe_convert(::Type{MPI.MPIPtr}, !Matched::Union{Ptr{T}, Ref{T}, SubArray{T,N,P,I,L} where L where I where P where N, Array{T,N} where N}) where T at /home/somlin/.julia/packages/MPI/9zBr2/src/datatypes.jl:24
  unsafe_convert(::Type{MPI.MPIPtr}, !Matched::CUDAdrv.Mem.DeviceBuffer) at /home/somlin/.julia/packages/MPI/9zBr2/src/cuda.jl:10
  ...
Stacktrace:
 [1] Irecv!(::CuArray{Float64,1,Nothing}, ::Int64, ::MPI.MPI_Datatype, ::Int64, ::Int64, ::MPI.Comm) at /home/somlin/.julia/packages/MPI/9zBr2/src/pointtopoint.jl:299
 [2] Irecv!(::CuArray{Float64,1,Nothing}, ::Int64, ::Int64, ::MPI.Comm) at /home/somlin/.julia/packages/MPI/9zBr2/src/pointtopoint.jl:330
 [3] top-level scope at REPL[15]:1

2.2) Installation

MPI: Open MPI: 2.1.5 (MPI API: 3.1.0)
CUDA: Cuda 10.0
OS: CentOS release 6.9
Packages:
(v1.2) pkg> status
Status ~/.julia/environments/v1.2/Project.toml
[c5f51814] CUDAdrv v4.0.4
[be33ccc6] CUDAnative v2.5.5
[3a865a2d] CuArrays v1.4.7
[da04e1cc] MPI v0.10.1

Question

How can we get CUDA-aware MPI to work on these systems?
Note that CUDA-aware MPI works fine on both systems with CUDA C applications.

Thanks!!


The first error is odd since it detected that we are passing it a GPU pointer, but then subsequently segfaults. For the second case it seems that the CUDA support wasn’t loaded:
https://github.com/JuliaParallel/MPI.jl/blob/85accff77c2be82b90eeea645c82d58c2a7186f5/src/MPI.jl#L73

Can you try what Base.cconvert(MPI.MPIPtr, recv_mesg) yields?
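
For context, here is what I would look for there (my reading of the stacktrace above, so treat it as a hedged sketch, not a definitive diagnosis):

# If MPI.jl's CUDA glue (src/cuda.jl, pulled in via Requires when CuArrays is loaded)
# handles this array type, cconvert should hand back something that unsafe_convert can
# turn into an MPIPtr (the candidates above list CUDAdrv.Mem.DeviceBuffer); getting the
# CuArray back unchanged is exactly what leads to the MethodError shown above.
x = Base.cconvert(MPI.MPIPtr, recv_mesg)
typeof(x)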

In general I think we have been mostly testing on OpenMPI.

This was due to a change in CuArrays: you can either downgrade CuArrays.jl to 1.3, or use master MPI.jl (I’ll tag a new version ASAP). Unfortunately optional dependencies don’t affect version resolution.
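
Concretely, either of these should do it (untested from here, so adjust to your environment): pin CuArrays back to the 1.3 series with

(v1.2) pkg> add CuArrays@1.3

or track MPI.jl master until the new release is tagged with

(v1.2) pkg> add MPI#master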


Thanks @simonbyrne and @vchuravy! Downgrading the CUDA packages to

(v1.2) pkg> status
    Status `~/.julia/environments/v1.2/Project.toml`
  [c5f51814] CUDAdrv v3.1.0
  [be33ccc6] CUDAnative v2.4.0
  [3a865a2d] CuArrays v1.3.0
  [da04e1cc] MPI v0.10.1

for the Open MPI case (case 2 above) made the example succeed without errors. :)

Do you have any idea how to debug the issue on the Cray System (with cray-mpich/7.7.10; case 1 above)?

Unfortunately no: I have access to another Cray machine, but was never able to get MPI.jl to work at all on it (in that case it segfaulted when dlopen-ing the MPI library).

The only thing I can think of is to check that Julia and MPICH are using the same CUDA version?
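
On the Julia side, something like the following should show which CUDA (driver API) version the Julia GPU stack sees (a hedged sketch for the CUDAdrv 3.x/4.x generation you have installed; the toolkit version that CUDAnative/CuArrays picked up should appear in their build logs):

using CUDAdrv
CUDAdrv.version()   # CUDA version supported by the driver, as seen from Julia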

Thanks @simonbyrne. I am back to the issue of getting CUDA-aware MPI to work with cray-mpich. Thanks again for the OpenMPI solution at the last minute before the AGU conference!

The only thing I can think of is to check that Julia and MPICH are using the same CUDA version?

How would you suggest checking that?

Can you call ldd on the MPI library?

In the MPI.jl build.log it says:

[ Info: Using MPI library /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so

This is in agreement with the information on the loaded cray-mpich module:

(...)
setenv		 CRAY_MPICH_DIR /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2
(...)

Then doing ldd on it, I get:

ldd /opt/cray/pe/mpt/7.7.10/gni/mpich-gnu/8.2/lib/libmpich.so
	linux-vdso.so.1 (0x00007fff0e153000)
	libxpmem.so.0 => /opt/cray/xpmem/2.2.19-7.0.1.1_3.7__gdcf436c.ari/lib64/libxpmem.so.0 (0x00002b3d7a5c2000)
	librt.so.1 => /lib64/librt.so.1 (0x00002b3d7a7c5000)
	libugni.so.0 => /opt/cray/ugni/6.0.14.0-7.0.1.1_7.10__ge78e5b0.ari/lib64/libugni.so.0 (0x00002b3d7a9cd000)
	libudreg.so.0 => /opt/cray/udreg/2.3.2-7.0.1.1_3.9__g8175d3d.ari/lib64/libudreg.so.0 (0x00002b3d7ac51000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b3d7ae5b000)
	libpmi.so.0 => /opt/cray/pe/pmi/5.0.14/lib64/libpmi.so.0 (0x00002b3d7b079000)
	libgfortran.so.5 => /opt/gcc/8.3.0/snos/lib64/libgfortran.so.5 (0x00002b3d7b2c2000)
	libm.so.6 => /lib64/libm.so.6 (0x00002b3d7b731000)
	libgcc_s.so.1 => /opt/gcc/8.3.0/snos/lib64/libgcc_s.so.1 (0x00002b3d7ba69000)
	libquadmath.so.0 => /opt/gcc/8.3.0/snos/lib64/libquadmath.so.0 (0x00002b3d7bc81000)
	libc.so.6 => /lib64/libc.so.6 (0x00002b3d7bec1000)
	/lib64/ld-linux-x86-64.so.2 (0x00002b3d79dd8000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00002b3d7c27b000)
	libz.so.1 => /lib64/libz.so.1 (0x00002b3d7c47f000)

There is no CUDA library linked. However, it could be that the CUDA libraries are loaded with dlopen at runtime.
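
One way to check that from within Julia might be to list the shared libraries actually mapped into the process after MPI.Init() and a first (attempted) transfer, for example with Libdl.dllist(), which enumerates the loaded libraries (a hedged sketch):

using Libdl
# any libcuda / libcudart etc. that has been linked or dlopen-ed so far should show up here
filter(l -> occursin("cuda", lowercase(l)), Libdl.dllist())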

What do you think?

Unfortunately I have no idea: OpenMPI+UCX is the only CUDA-aware MPI I’ve had much success with (and even then it has been a pain). I know MVAPICH requires some special environment flags: are there any required for Cray MPICH?

I know MVAPICH requires some special environment flags: are there any required for Cray MPICH?

Normally it should be enough to do:

export MPICH_RDMA_ENABLED_CUDA=1

as I did in the example in the topic description. I tried setting

export CRAY_CUDA_MPS=1

in addition, since on some page I found a hint that the CUDA-aware MPI library might spawn a separate process on the GPU (I am not at all sure about that, though). The error was still the same.

I will see if the guys from Cray can give any help on this and report back. Meanwhile, if you have any idea of what else to check, please let me know.

I just came across this old topic again and will quickly update it: meanwhile, CUDA-aware MPI works on the Cray system simply by setting:

export MPICH_RDMA_ENABLED_CUDA=1
export JULIA_CUDA_USE_BINARYBUILDER=false

I guess the problem was simply that CUDA.jl did not use the system CUDA installation in the past: it only does so if JULIA_CUDA_USE_BINARYBUILDER=false is set not only at build time but also at runtime (at least with CUDA.jl v1).
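
As a quick sanity check after setting the two variables above, one can verify from within Julia that the variable is visible at runtime and which toolkit is picked up (hedged: CUDA.version() is what I would check with the CUDA.jl versions from back then; the exact API may differ in newer releases):

@show get(ENV, "JULIA_CUDA_USE_BINARYBUILDER", nothing)   # must be "false" at runtime, not only at build time
using CUDA
@show CUDA.version()   # should report the system toolkit (CUDA 10.1 on this system)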
