Julia code becomes slower when running on supercomputers and does not scale well when parallelizing with Base.Threads

Not to put too fine a point on it, but this is exactly what you were told in the first thread you opened on this:

The performance of parallel code can change drastically depending on all sorts of optimizations you can make to its simple serial execution, so you really need to make sure your code is as efficient (and allocation-free) as possible before you run elaborate experiments on supercomputers.
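To make that concrete, here is a minimal sketch of how you might check a serial kernel for allocations before touching threads at all. The kernel (a sum of squares) and the names `sumsq_alloc`, `sumsq_noalloc`, and `allocated_bytes` are made up for illustration and are not taken from your code; only `@allocated`, `@inbounds`, and `@simd` are actual Base macros.

```julia
# Hypothetical kernel for illustration only: sum of squares of a vector.

# Allocating version: `x .^ 2` builds a temporary array on every call.
sumsq_alloc(x) = sum(x .^ 2)

# Loop version: accumulates in a scalar, no temporary array.
function sumsq_noalloc(x)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        s += x[i] * x[i]
    end
    return s
end

# Measure inside a function so that overhead from non-constant
# globals is not counted as allocation of the kernel itself.
allocated_bytes(f, x) = @allocated f(x)

x = rand(10_000)

# Warm-up calls so that compilation is not part of the measurement.
allocated_bytes(sumsq_alloc, x)
allocated_bytes(sumsq_noalloc, x)

bytes_alloc   = allocated_bytes(sumsq_alloc, x)
bytes_noalloc = allocated_bytes(sumsq_noalloc, x)

println("broadcast version: ", bytes_alloc, " bytes allocated")
println("loop version:      ", bytes_noalloc, " bytes allocated")
```

If the serial kernel still allocates on every call, each thread will allocate too, and the garbage collector can easily become the bottleneck once you add threads. For actual timings, something like `@btime` from BenchmarkTools.jl is probably a better tool than a single `@allocated` call.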
