Does Julia do automatic multithreading or parallelization if it is possible? For example, I have code in Fortran and in Julia. For a smaller problem, I get faster performance in Fortran. But as I increase the problem size, Julia gives me much better performance than Fortran. Both codes are same and a lot of “for loops” can be parallelized. As of now, I am just having a serial version in both Julia and Fortran.

No - all multithreading in pure Julia is single threaded by default. Called libraries may be using multithreading themselves though (like LAPACK and BLAS/MKL, as far as I know).

You may be experiencing good performance because of vectorization though. We can’t tell you why exactly without the source code though.

Below is the portion of the code where calculation is done.
‘’’
for iteration_count = 1:maximum_iterations

# calculate ∇^2(residual)
for j = 2:ny for i = 2:nx
del_residual[i,j] = (residual[i+1,j] - 2*residual[i,j] +
residual[i-1,j])/dx^2 +
(residual[i,j+1] - 2*residual[i,j] +
residual[i,j-1])/dy^2
end end
aa = 0.0
bb = 0.0
# calculate aa, bb, cc. cc is the distance parameter(α_n)
for j = 2:ny for i = 2:nx
aa = aa + residual[i,j]*residual[i,j]
bb = bb + del_residual[i,j]*residual[i,j]
end end
cc = aa/(bb + tiny)
# update the numerical solution by adding some component of residual
for j = 1:ny for i = 1:nx
u_numerical[i,j] = u_numerical[i,j] + cc * residual[i,j]
end end
# update the residual by removiing some component of previous residual
for j = 1:ny for i = 1:nx
residual[i,j] = residual[i,j] - cc * del_residual[i,j]
end end
# compute the l2norm of residual
rms = compute_l2norm(nx, ny, residual)
println(iteration_count, " ", rms/initial_rms)
if (rms/initial_rms) <= tolerance
break
end
end