I know MVAPICH requires some special environment flags: are there any required for Cray MPICH?
Normally it should be enough to do:
export MPICH_RDMA_ENABLED_CUDA=1
as I did in the example in the topic description. I also tried setting
export CRAY_CUDA_MPS=1
as well, because I found a hint on some page that the CUDA-aware MPI library might spawn a separate process on the GPU (though I am not at all sure about that). The error was still the same.
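For reference, this is a minimal job-script fragment with both settings combined. It is only a sketch. It assumes a Cray system where Cray MPICH reads MPICH_RDMA_ENABLED_CUDA and where CRAY_CUDA_MPS is honored. The env | grep line is an illustrative sanity check that is not part of any required setup. It makes the job log show that both variables were exported before the parallel launcher (aprun or srun, not shown) starts the ranks.

```shell
#!/bin/sh
# Enable CUDA-aware MPI in Cray MPICH (GPU device pointers passed directly to MPI calls)
export MPICH_RDMA_ENABLED_CUDA=1
# Optional, tried in addition: on Cray systems this lets several processes on a node
# share the GPU (CUDA proxy / MPS); it is not expected to be required for CUDA-aware MPI
export CRAY_CUDA_MPS=1
# Sanity check: print both variables so the job output confirms they are in the
# environment that the launcher (not shown) will pass on to the MPI ranks
env | grep -E '^(MPICH_RDMA_ENABLED_CUDA|CRAY_CUDA_MPS)='
```

If the grep prints nothing, the variables did not reach the job environment, and the MPI library would not see them either.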
I will see if the guys from Cray can help with this and report back. Meanwhile, if you have any ideas about what else to check, please let me know.