Thanks to @lyonsquark, v0.17 of MPI.jl (which I just tagged) should now support MPI profilers that use LD_PRELOAD hooks. I believe he has tested it with Darshan.
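For anyone who hasn't used an LD_PRELOAD-based profiler before, here is a rough sketch of what a launch might look like. The library path, script name, and rank count are all placeholders I made up, and I'm assuming a single-node run where the ranks inherit the launcher's environment; on some setups you may have to forward the variable through your MPI launcher instead.

```shell
# All three values below are hypothetical placeholders; adjust for your system.
DARSHAN_LIB="/path/to/libdarshan.so"   # wherever your Darshan shared library lives
SCRIPT="my_mpi_script.jl"              # your MPI.jl program
NPROCS=4                               # number of MPI ranks

# Preload the profiler library so its hooks intercept the MPI calls made by Julia.
CMD="env LD_PRELOAD=${DARSHAN_LIB} mpiexec -n ${NPROCS} julia ${SCRIPT}"

# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```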
I haven’t tried it, but I believe you should also be able to use NVIDIA Nsight Systems to profile MPI even if you’re not using CUDA: just specify the --trace=mpi option (you will also need to specify the MPI implementation via the --mpi-impl option).
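As an untested sketch, the Nsight Systems invocation might look something like the following. The script name and rank count are placeholders, I'm assuming Open MPI here (use mpich for MPICH-derived implementations), and I'm assuming that wrapping the whole mpiexec launch is what you want rather than profiling each rank separately.

```shell
# Hypothetical placeholders; adjust for your system.
SCRIPT="my_mpi_script.jl"   # your MPI.jl program
NPROCS=4                    # number of MPI ranks
MPI_IMPL="openmpi"          # assumption: Open MPI; use "mpich" for MPICH-based MPIs

# nsys wraps the whole mpiexec launch and traces the MPI calls.
CMD="nsys profile --trace=mpi --mpi-impl=${MPI_IMPL} mpiexec -n ${NPROCS} julia ${SCRIPT}"

# Print the command for inspection; run it with: eval "$CMD"
echo "$CMD"
```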
I’d be keen to hear how people get on with various MPI profilers: if you do have problems (or find solutions to problems), please chime in here or open an issue.