Not an answer to your question, but there’s some discussion of the (lack of) speedup here: Again on reaching optimal parallel scaling - #5 by carstenbauer