A minor addition to @samo 's hints: you could try setting the CUDA memory pool to none:

```
export JULIA_CUDA_MEMORY_POOL=none
```
This may help with the CUDA-aware MPI error.
For the artifact download issue, I'd start from scratch once more and make sure to:
- Have the MPI and CUDA that were used to build the CUDA-aware MPI on the path (or their modules loaded)
- Set the following environment variables:
```
export JULIA_CUDA_MEMORY_POOL=none
export JULIA_MPI_BINARY=system
export JULIA_CUDA_USE_BINARYBUILDER=false
```
- Add the CUDA and MPI packages in Julia. Build MPI.jl in verbose mode to check whether the correct versions are built/used:
```
julia -e 'using Pkg; pkg"add CUDA"; pkg"add MPI"; Pkg.build("MPI"; verbose=true)'
```
- Then in Julia, after loading the MPI and CUDA modules, you can check:
  - the CUDA version:
    ```
    CUDA.versioninfo()
    ```
  - whether MPI was built with CUDA support:
    ```
    MPI.has_cuda()
    ```
  - whether you are using the correct MPI implementation:
    ```
    MPI.identify_implementation()
    ```
After that, running the simple test script @samo suggested here, launched from a shell script as in here, should work.
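As a rough sketch of such a launch script (the module names, the rank count, and the script name `test_cuda_mpi.jl` are placeholders you would adapt to your system, not part of the original posts):

```shell
#!/bin/bash
# Launcher sketch for a CUDA-aware MPI Julia test.
# Adapt module names, rank count, and script name to your setup.

# Ensure the same MPI and CUDA used to build MPI.jl are on the path,
# e.g. on a cluster with environment modules:
# module load cuda openmpi

# Environment variables from the steps above.
export JULIA_CUDA_MEMORY_POOL=none
export JULIA_MPI_BINARY=system
export JULIA_CUDA_USE_BINARYBUILDER=false

# Launch one MPI rank per GPU (here: 2 ranks as an example).
mpirun -np 2 julia --project test_cuda_mpi.jl
```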