I saw this mentioned on the OpenMPI mailing list. It sounds pretty interesting.
Is there any relevance to Julia?

Hello John, I am unfamiliar with Julia, but I can elaborate on what I am doing with the Toro unikernel and OpenMPI. I am building an implementation of MPI that relies on Toro, a unikernel optimized for running parallel threads. Its architecture is designed so that the kernel instances running on each core share nothing with one another. This is a proof of concept for the moment: I am trying different MPI APIs, such as reduce, to see whether performance actually improves. The MPI application is compiled into the kernel and deployed either in a VM or on bare metal.
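
For readers who have not used MPI, here is a rough sketch of what a reduce computes. It is plain Python that only simulates the semantics and has nothing to do with how Toro or OpenMPI implement it. The function name `simulated_reduce` and the list-of-ranks layout are made up for illustration: each list entry stands for the private value held by one rank, and only the root ends up with the combined result.

```python
from functools import reduce as fold
import operator


def simulated_reduce(local_values, op=operator.add, root=0):
    """Simulate the result of an MPI-style reduce (hypothetical helper).

    local_values[i] is the value privately held by rank i.
    Returns a list with the combined value at index `root` and None
    everywhere else, mirroring the fact that only the root rank
    receives the result of a reduce.
    """
    combined = fold(op, local_values)
    return [combined if rank == root else None
            for rank in range(len(local_values))]


# Four hypothetical ranks, each holding one private value.
results = simulated_reduce([1, 2, 3, 4])
print(results)
```

In a real MPI program each rank would be a separate process or thread and the combination would happen through message passing, which is the part whose cost a share-nothing kernel could plausibly affect.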