Thanks @simonbyrne and @vchuravy! Downgrading the CUDA packages to the following versions
(v1.2) pkg> status
Status `~/.julia/environments/v1.2/Project.toml`
[c5f51814] CUDAdrv v3.1.0
[be33ccc6] CUDAnative v2.4.0
[3a865a2d] CuArrays v1.3.0
[da04e1cc] MPI v0.10.1
for the Open-MPI case (case 2 above) made the example succeed without errors.
Do you have any idea how to debug the issue on the Cray system (with cray-mpich/7.7.10; case 1 above)?