Julia code becomes slower when run on supercomputers and does not scale well when parallelized with Base.Threads

This post may be useful to you:
https://viralinstruction.com/posts/hardware/

Essentially, the problem here is not memory access but memory management. There is also the matter of caching: your processors keep a copy of a small amount of memory nearby for really fast access. If you are constantly changing which memory you are using, you cannot take advantage of caching effectively.
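
To make that concrete, here is a minimal sketch. I have not seen your code, so this is an assumption about the kind of pattern involved, not a diagnosis. The function names `sum_squares_alloc` and `sum_squares_inplace` are hypothetical and only for illustration. The first version allocates a fresh temporary array in every threaded iteration, so the threads put pressure on the allocator and the garbage collector. The second computes the same result with a scalar accumulator and no temporary arrays.

```julia
using Base.Threads

# Allocating version (hypothetical example): each chunk creates a new
# temporary array, so threads keep requesting and discarding memory.
function sum_squares_alloc(data::Vector{Float64}, nchunks::Int)
    partial = zeros(Float64, nchunks)
    # Split the index range into at most `nchunks` contiguous pieces.
    # Assumes `data` is non-empty.
    chunks = collect(Iterators.partition(eachindex(data), cld(length(data), nchunks)))
    @threads for c in eachindex(chunks)
        tmp = data[chunks[c]] .^ 2   # copies the slice and allocates the result
        partial[c] = sum(tmp)
    end
    return sum(partial)
end

# Non-allocating version (hypothetical example): same arithmetic, but each
# chunk accumulates into a local scalar and walks memory contiguously.
function sum_squares_inplace(data::Vector{Float64}, nchunks::Int)
    partial = zeros(Float64, nchunks)
    chunks = collect(Iterators.partition(eachindex(data), cld(length(data), nchunks)))
    @threads for c in eachindex(chunks)
        acc = 0.0
        for i in chunks[c]
            acc += data[i]^2
        end
        partial[c] = acc             # one write per chunk
    end
    return sum(partial)
end

data = rand(1_000_000)
# Both versions should agree up to floating-point rounding.
@assert sum_squares_alloc(data, nthreads()) ≈ sum_squares_inplace(data, nthreads())
```

If you run each function once to compile it and then compare them with `@time` or `@allocated`, the first version should report far more allocated memory. Whether that is what limits scaling in your case depends on your actual code and the machine, so treat this as a starting point for profiling rather than a conclusion.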