Disabling the memory pool did the trick, thanks! All the other suggestions I had already tested, thanks to the great help of @samo. I do not know which conclusion to draw from this, though. Does that mean that the memory pool and CUDA-aware MPI are incompatible? Or is this something that needs fixing in our MPI installation, in MPI.jl, or in CUDA.jl?
Glad this did the trick!
Does that mean that the memory pool and CUDA-aware MPI are incompatible?
Yes, it seems that with CUDA-aware MPI, memory management by CUDA.jl "conflicts" with what CUDA-aware MPI expects (see https://juliagpu.gitlab.io/CUDA.jl/usage/memory/#Memory-pool and https://juliagpu.gitlab.io/CUDA.jl/usage/memory/#Environment-variables). Setting the memory pool to none "directly defers to the CUDA allocator".
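For anyone finding this later, a minimal sketch of what "disabling the pool" means in practice, assuming the JULIA_CUDA_MEMORY_POOL environment variable from the linked docs (it has to be set before CUDA.jl is loaded):

```julia
# Sketch: disable CUDA.jl's memory pool so allocations defer to the plain CUDA allocator.
# Assumes the JULIA_CUDA_MEMORY_POOL variable documented above; it must be set
# before `using CUDA` (e.g. in the shell or at the very top of the script).
ENV["JULIA_CUDA_MEMORY_POOL"] = "none"

using CUDA
using MPI

MPI.Init()
# ... CUDA-aware MPI communication on CuArrays now goes through unpooled device memory ...
```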
Nothing CUDA.jl does here is fancy; rather, CUDA's own APIs are incompatible with one another: Legacy cuIpc* APIs incompatible with stream-ordered allocator · Issue #1053 · JuliaGPU/CUDA.jl · GitHub.
CUDA.jl upgrades your driver library by using a forward-compatible libcuda.so.
Why? What's inherently incompatible between our CUDA artifacts and MPI? The issue here seems to be the IPC/memory-pool incompatibility, not the actual CUDA binaries.
Alright, that is very useful information. I can work without the memory pool for now, and hopefully this gets resolved in the future. If it helps to provide failing test programs anywhere, let me know, but I conclude from your answer that the problem is already well known to the developers.
I think there is no reason to believe that they would be inherently incompatible. It is certainly only a matter of the installation.
However, as far as I am aware, it is currently not possible to have CUDA-aware MPI.jl working without using a system-installed CUDA-aware MPI, and in a CUDA-aware installation, we need to specify which CUDA installation to use. So it seems natural to also use this very same system-installed CUDA for CUDA.jl, in order to be sure that everything works smoothly. Now, I could imagine that it is possible to either 1) first install CUDA.jl with its artifacts and then build a CUDA-aware system MPI against it, or 2) just use the CUDA libraries from the CUDA.jl artifacts at runtime. I think 1) could be of interest in order to avoid having to install CUDA manually; I am not sure how many benefits or problems 2) could bring. Do you have any comments on 1) and 2)?
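For reference, a rough sketch of how the "system CUDA-aware MPI plus system CUDA" setup is typically wired up (hedged: the exact environment variables depend on the MPI.jl and CUDA.jl versions in use, so treat the names below as assumptions):

```julia
# Sketch of the system-MPI setup, assuming the older JULIA_MPI_BINARY and
# JULIA_CUDA_USE_BINARYBUILDER environment variables are honoured by the installed versions.
ENV["JULIA_MPI_BINARY"] = "system"              # make MPI.jl wrap the system (CUDA-aware) MPI
ENV["JULIA_CUDA_USE_BINARYBUILDER"] = "false"   # make CUDA.jl use the local CUDA toolkit
                                                # instead of downloading artifacts

using Pkg
Pkg.build("MPI"; verbose=true)                  # rebuild MPI.jl against the system library
```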
In any case, I think it would be most interesting if, at some point, MPI.jl could support CUDA-aware MPI without having to rely on a system-installed MPI. This would be nice for small clusters or multi-GPU workstations - at least for a quick start or a fallback (on supercomputers, a system-optimized MPI - as Cray-MPICH in our case - will certainly always be preferred). @simonbyrne, could you maybe comment on the feasibility of this and whether it is on the horizon anywhere?
Ah ok, so the MPI back-end JLLs selected by MPI.jl (assuming JLLs are used, and it's not just the system version again) need to build against the same CUDA version used by CUDA.jl. That's a work in progress, and not possible yet (it needs a CUDA_jll.jl that can be used by both CUDA.jl and those MPI back-ends).
The other advantage is that in general CUDA.jl does a better job selecting a CUDA toolkit that's supported by your system, both in terms of compatibility and in selecting the most up-to-date version, which matters since there are known compilation bugs with all but the most recent CUDA compiler.
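A quick way to check which toolkit CUDA.jl ended up selecting (the output format varies between CUDA.jl versions):

```julia
using CUDA

# Prints the driver/runtime versions and the toolkit CUDA.jl selected for this system.
CUDA.versioninfo()
```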
Possibly? @vchuravy did get UCX_jll to build on a small number of platforms with CUDA support, but it's not clear if this is feasible more generally.
Thanks @maleadt and @simonbyrne. It is good to have a review of the situation from time to time.
Does this work have any connection to https://github.com/JuliaPackaging/Yggdrasil/issues/2063? Asking because AFAIK MPI.jl provides the only GPU-compatible broadcast + allreduce interface in Julia land, but deep learning framework users are unlikely to have a compatible system MPI installation.
No, it hasn't. I made a Julia port https://github.com/Chiil/MicroHH.jl of our C++/CUDA atmospheric simulator https://github.com/microhh/microhh and was trying to see how hard it is to get CUDA-aware MPI running.
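For context, the kind of CUDA-aware call being exercised looks roughly like this (a minimal sketch; it assumes MPI.jl was built against a CUDA-aware MPI so that device buffers can be passed directly):

```julia
# Minimal CUDA-aware example: an in-place Allreduce on a GPU buffer.
# Requires a CUDA-aware MPI build; with CUDA.jl's memory pool enabled, this is
# the kind of call where the IPC/allocator conflict discussed above can appear.
using MPI, CUDA

MPI.Init()
comm = MPI.COMM_WORLD

a = CUDA.ones(Float64, 1024)   # device buffer
MPI.Allreduce!(a, +, comm)     # summed across ranks directly on the GPU

MPI.Finalize()
```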
Yes, we should be able to handle that generally, once some of the artifact work has propagated a bit.