Julia code becomes slower when running on supercomputers and does not scale well when parallelized with Base.Threads

@lmiq As you rightly pointed out in your first reply, the primary focus should be to optimize the Julia code as much as possible. We have tried to do that, as described in this thread here, and we have significantly improved the performance of the serial Julia code. Having observed that the Julia code now runs faster than the serial Fortran code on the i7 x86 architecture, we concluded that we had reached the limit of the “general programming optimization” we could do without rewriting the matrix formalism that we used.