I just came across this old topic and want to give a quick update: by now, CUDA-aware MPI works on the Cray system simply by setting:
export MPICH_RDMA_ENABLED_CUDA=1
export JULIA_CUDA_USE_BINARYBUILDER=false
I guess the problem was simply that CUDA.jl did not use the system CUDA installation in the past: it only does so if JULIA_CUDA_USE_BINARYBUILDER=false is set not only at build time but also at runtime (at least with CUDA.jl v1).