I am not sure what to make of that remark.
FFTs are used primarily for the Poisson solver (or for GW, if you want to do GW), that is certainly true. But this is usually done with MPI, because at some point you overflow a single node's memory regardless of cores per node or speed, by explicitly distributing one FFT dimension across nodes, so it is not hidden inside a library like an MPI-parallel FFTW.
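To be concrete about what "distributing one FFT dimension explicitly" means, a rough, untested sketch of a slab-decomposed forward 3D FFT with MPI.jl and FFTW.jl might look like the following; the UBuffer/Alltoall! calls are my guess at the current MPI.jl API and would need checking against the installed version:

```julia
using MPI, FFTW

# Sketch of a slab-decomposed forward 3D FFT on an n×n×n grid.
# Rank r owns z-planes r*nloc+1 : (r+1)*nloc, i.e. a local n×n×nloc slab.
MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nprocs = MPI.Comm_size(comm)

n    = 64
nloc = div(n, nprocs)                       # assumes nprocs divides n
slab = randn(ComplexF64, n, n, nloc)        # local z-slab of the grid

# 1) local 2D FFTs in the x and y directions (fully local to this rank)
slab_hat = fft(slab, (1, 2))

# 2) global transpose: split the local slab into nprocs chunks along x and
#    exchange them, so that afterwards each rank holds an x-slab with all z.
chunk   = nloc * n * nloc                   # elements sent to each rank
sendbuf = Array{ComplexF64}(undef, nloc, n, nloc, nprocs)
for p in 1:nprocs
    sendbuf[:, :, :, p] = slab_hat[(p-1)*nloc+1:p*nloc, :, :]
end
recvbuf = similar(sendbuf)
# exact Alltoall!/UBuffer signature depends on the MPI.jl version
MPI.Alltoall!(MPI.UBuffer(sendbuf, chunk), MPI.UBuffer(recvbuf, chunk), comm)

xslab = Array{ComplexF64}(undef, nloc, n, n)   # now: a slice of x, all z
for p in 1:nprocs
    xslab[:, :, (p-1)*nloc+1:p*nloc] = recvbuf[:, :, :, p]
end

# 3) local 1D FFTs along z complete the 3D transform (still distributed in x)
xslab_hat = fft(xslab, 3)

MPI.Finalize()
```

That global transpose (the Alltoall plus the buffer packing around it) is where all the communication time goes, and it is exactly the part you would want running on the high-performance network.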
On top of that, a Hamiltonian matrix must of course be diagonalized, so you usually start thinking about parallel block Davidson, LOBPCG and similar iterative methods (on top of fast BLAS/LAPACK), possibly based on BLACS/ScaLAPACK (which is not the right toolbox for this) or homegrown. And that is certainly not an FFT.
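To be clear about the kind of solver I mean, here is a toy serial block Davidson skeleton (diagonal preconditioner, no locking, no growing search space, no restarts); in a real distributed code the H * V products, the V' * HV inner products and the orthogonalizations are exactly the places where the MPI reductions have to go:

```julia
using LinearAlgebra

# Toy block Davidson sketch for the lowest `nev` eigenpairs of a Hermitian H.
# `H` only needs to support H * X; `dH` is (an approximation to) diag(H),
# used for the classic Davidson diagonal preconditioner.
function davidson(H, dH, nev; tol = 1e-6, maxiter = 100)
    n = length(dH)
    V = Matrix(qr(randn(n, nev)).Q)              # orthonormal starting block
    for iter in 1:maxiter
        HV = H * V                               # the expensive, parallel part
        λs, U = eigen(Hermitian(V' * HV))        # Rayleigh-Ritz in the subspace
        λ  = λs[1:nev]
        X  = V  * U[:, 1:nev]                    # Ritz vectors
        R  = HV * U[:, 1:nev] - X * Diagonal(λ)  # residuals
        done = maximum(norm.(eachcol(R))) < tol
        (done || iter == maxiter) && return λ, X
        T = R ./ (dH .- λ')                      # Davidson correction
                                                 # (guard small denominators in real code)
        V = Matrix(qr([X T]).Q)                  # expand and re-orthonormalize
    end
end

# Example: a diagonally dominant test matrix
n = 400
A = 0.01 * randn(n, n)
H = Symmetric(Diagonal(1.0:n) + (A + A') / 2)
λ, X = davidson(H, diag(H), 4)
```

The point is that the heavy lifting is dense subspace algebra plus matrix-vector products on distributed grids, not an FFT, so a viable strategy has to cover both.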
So again my question: what would be a technically viable parallelization strategy in Julia for typical electronic structure theory, given the language's current approach to parallelization and the current ecosystem? Does anyone know?
Because I do not see it. Parallelization generally appears to be more of an afterthought at the moment (OK, that is true for most other languages, too), but I have not heard of any (large-scale) codes using Julia and serious MPI for either linear algebra or FFTs, where you spend much of your time on a high-performance network (e.g. Aries, InfiniBand, Omni-Path) and the software stack underneath it.
What I see in the documentation is basically a thread-parallel approach or the spawning of independent workers. Does anyone have experience with MPI.jl? Is it ready/stable/performant?
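For context, what I would need from MPI.jl at a minimum is the usual init / collectives / finalize pattern, launched under mpiexec or srun like any MPI program and linked against the system MPI so that it actually uses the fast interconnect. From the documentation it looks roughly like this (untested on a real cluster, which is exactly my question):

```julia
using MPI

MPI.Init()
comm  = MPI.COMM_WORLD
rank  = MPI.Comm_rank(comm)
nproc = MPI.Comm_size(comm)

# each rank contributes a local partial sum, e.g. a piece of an inner product
local_dot  = sum(abs2, randn(1000))
global_dot = MPI.Allreduce(local_dot, +, comm)

rank == 0 && println("ranks = $nproc, reduced value = $global_dot")

MPI.Finalize()
```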