If your project is on GitHub and you want a less abstract comment from me, feel free to ping me on the actual code. I can’t promise I’ll be able to give good feedback in time, but I’m curious to look at challenging examples of parallel and concurrent programming in Julia.
I hope you don’t mind me taking you up on this offer. It’s not clear to me how we’re supposed to work around the case of pre-allocated per-thread buffers that are reused across iterations (the `buffers[tid]` pattern described above). For example, this is the innermost loop of our code: it is called repeatedly and is non-allocating, and I don’t see how to express it with `@threads :dynamic` or with FLoops: https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/terms/Hamiltonian.jl#L124. Here is another, simpler example (this one is only called once, so presumably FLoops would be easier to use there): https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/densities.jl#L24.
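For readers who missed the earlier context, the `buffers[tid]` pattern looks roughly like the following. This is a hypothetical minimal sketch, not DFTK’s actual code: the function name `column_norms!` and the column-norm workload are made up for illustration, and `Threads.maxthreadid()` assumes Julia 1.9 or later.

```julia
using Base.Threads: @threads, threadid, maxthreadid

# Hypothetical example: compute the 2-norm of each column of A, using a
# pre-allocated scratch buffer per thread so the hot loop does not allocate.
function column_norms!(out::Vector{Float64}, A::Matrix{Float64},
                       buffers::Vector{Vector{Float64}})
    # :static pins each iteration to one thread, so threadid() is stable
    # within an iteration. With :dynamic, a task may migrate between threads,
    # so two iterations could end up writing to the same buffer concurrently.
    @threads :static for j in axes(A, 2)
        buf = buffers[threadid()]
        buf .= view(A, :, j) .^ 2   # fused broadcast, writes in place
        out[j] = sqrt(sum(buf))
    end
    return out
end

A = rand(100, 8)
# One buffer per possible thread id (maxthreadid() also covers interactive threads).
buffers = [zeros(size(A, 1)) for _ in 1:maxthreadid()]
out = zeros(size(A, 2))
column_norms!(out, A, buffers)
```

The whole question is how to keep this loop allocation-free once the `:static` guarantee (and hence indexing by `threadid()`) is no longer available.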