Julia code becomes slower when run on supercomputers and does not scale well when parallelized with Base.Threads

I shall try my best to do this and provide the result (as stated above, I do not have direct access to the cluster in question). However, the slowdown occurs even when I compare the serial codes. Besides, when running with 20000 points, the Julia code comes out about 5 minutes slower than the Fortran code (the Fortran code isn’t scaling perfectly either), so I suspect that this is the result of the serial slowdown accumulating over multiple iterations.