CUDA-aware MPI works on system but not for Julia

As a minor addition to @samo's hints, you could try setting the CUDA memory pool to none:

export JULIA_CUDA_MEMORY_POOL=none

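This may help with the CUDA-aware MPI error, since the default memory pool in recent CUDA.jl versions is known to interfere with some CUDA-aware MPI implementations.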

For the artifact download issue, I’d make sure, starting from scratch once more, to:

  • Have the MPI and CUDA installations that were used to build the CUDA-aware MPI on your path (or loaded as modules)
  • Make sure to set:
    export JULIA_CUDA_MEMORY_POOL=none
    export JULIA_MPI_BINARY=system
    export JULIA_CUDA_USE_BINARYBUILDER=false
    
  • Add the CUDA and MPI packages in Julia. Build MPI.jl in verbose mode to check whether the correct versions are built/used:
    julia -e 'using Pkg; pkg"add CUDA"; pkg"add MPI"; Pkg.build("MPI"; verbose=true)'
    
  • Then, in Julia, after loading the MPI and CUDA modules, you can check:
    • The CUDA version: CUDA.versioninfo()
    • Whether MPI has CUDA support: MPI.has_cuda()
    • Whether you are using the correct MPI implementation: MPI.identify_implementation()
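
The steps above could be collected into a small pre-flight script, roughly as sketched below. This is only a sketch under a few assumptions: the tool names `mpiexec` and `nvcc` stand in for whatever your MPI and CUDA installations actually provide, and the Julia one-liner assumes the CUDA and MPI packages have already been added and built as described above.

```shell
#!/usr/bin/env bash
# Environment settings from the list above.
export JULIA_CUDA_MEMORY_POOL=none
export JULIA_MPI_BINARY=system
export JULIA_CUDA_USE_BINARYBUILDER=false

# Report which MPI launcher and CUDA compiler come first on PATH.
# The tool names are assumptions; adjust them to your installation.
for tool in mpiexec nvcc; do
    if path=$(command -v "$tool"); then
        echo "$tool -> $path"
    else
        echo "warning: $tool not found on PATH" >&2
    fi
done

# Run the three checks from the list above, if Julia is available.
if command -v julia >/dev/null 2>&1; then
    julia -e 'using MPI, CUDA
              CUDA.versioninfo()
              println("MPI has CUDA: ", MPI.has_cuda())
              println("MPI implementation: ", MPI.identify_implementation())' \
        || echo "warning: Julia checks failed (are CUDA and MPI installed?)" >&2
fi
```

If the reported paths or the MPI implementation do not match the ones used to build the CUDA-aware MPI, that mismatch is the first thing to fix before running any test script.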

After that, running the simple test script @samo suggested here, launched from a shell script as in here, should work.
