Yup, just putting the Julia code in a function (and fixing the loop order) makes it more than 30x faster:
```julia
function doit!(T, Tₖ₋₁=copy(T); tolerance=1e-6)
    m, n = size(T)
    error = 1.0
    while error > tolerance
        Tₖ₋₁ .= T
        for j = 2:n-1, i = 2:m-1   # column-major: row index i innermost
            T[i, j] = 1/4 * (T[i, j+1] + T[i, j-1] + T[i+1, j] + T[i-1, j])
        end
        error = maximum(abs.(Tₖ₋₁ .- T))
    end
    return T
end
```
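(The loop-order fix matters because Julia arrays are stored column-major: elements within a column are contiguous in memory, so the first index should vary fastest. A minimal self-contained illustration of the pattern, using a hypothetical `sum_colmajor` helper that is not part of the code above:)

```julia
# Julia arrays are column-major: a column's elements are contiguous in
# memory, so the row index i should be the *innermost* loop.
function sum_colmajor(A)
    s = zero(eltype(A))
    for j in axes(A, 2), i in axes(A, 1)  # i varies fastest
        s += A[i, j]
    end
    return s
end

A = rand(1000, 1000)
sum_colmajor(A)  # same result as sum(A), but a cache-friendly traversal
```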
But then you can make it another 2x faster by completely eliminating the Tₖ₋₁ array, along with a couple of other tricks:
```julia
function doit2!(T; tolerance=1e-6)
    m, n = size(T)
    while true
        error = zero(eltype(T))
        @inbounds for j = 2:n-1, i = 2:m-1
            Tᵢⱼ = T[i, j]
            T[i, j] = 1/4 * (T[i, j+1] + T[i, j-1] + T[i+1, j] + T[i-1, j])
            error = max(error, abs(T[i, j] - Tᵢⱼ))
        end
        error ≤ tolerance && break
    end
    return T
end
```
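(If it isn't obvious that tracking the running maximum change during the in-place sweep is equivalent to diffing against a saved copy afterward, here is a standalone sanity check for a single sweep; the `sweep_copy!`/`sweep_inplace!` names are just for this illustration:)

```julia
# One Gauss–Seidel sweep, computing the error via an explicit copy.
function sweep_copy!(T)
    m, n = size(T)
    Told = copy(T)
    for j = 2:n-1, i = 2:m-1
        T[i, j] = 1/4 * (T[i, j+1] + T[i, j-1] + T[i+1, j] + T[i-1, j])
    end
    return maximum(abs.(Told .- T))
end

# The same sweep, accumulating the max change element-by-element instead.
function sweep_inplace!(T)
    m, n = size(T)
    err = zero(eltype(T))
    for j = 2:n-1, i = 2:m-1
        old = T[i, j]
        T[i, j] = 1/4 * (T[i, j+1] + T[i, j-1] + T[i+1, j] + T[i-1, j])
        err = max(err, abs(T[i, j] - old))
    end
    return err
end

A = rand(8, 8)
T1, T2 = copy(A), copy(A)
sweep_copy!(T1) == sweep_inplace!(T2)  # identical error, identical sweeps
```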
And I get another factor of 3 by using the LoopVectorization package:
```julia
using LoopVectorization

function doit3!(T; tolerance=1e-6)
    m, n = size(T)
    while true
        error = zero(eltype(T))
        @turbo for j = 2:n-1, i = 2:m-1
            Tᵢⱼ = T[i, j]
            T[i, j] = 1/4 * (T[i, j+1] + T[i, j-1] + T[i+1, j] + T[i-1, j])
            error = max(error, abs(T[i, j] - Tᵢⱼ))
        end
        error ≤ tolerance && break
    end
    return T
end
```
I would benchmark with something like:
```julia
using BenchmarkTools
@btime doit3!($T; tolerance=1e-6) setup=(T[2:n-1, 2:n-1] .= 0);
```
to get accurate numbers. (If you use the more primitive `@time`, be sure to run the benchmark at least a couple of times, since on the first run you are measuring compilation time.)
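A minimal illustration of that warm-up pattern (the function `f` here is just a stand-in, not part of the code above):

```julia
f(x) = sum(abs2, x)  # stand-in for whatever you want to time
x = rand(1000)

f(x)         # first call triggers compilation of f for Vector{Float64}
@time f(x)   # subsequent calls measure only the run time
```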
- Moral of this story: the first Julia code you write is almost always quite slow, especially if you haven’t yet grokked the performance tips. But you can usually make it much faster using only localized tweaks (as opposed to throwing your code in the trash and rewriting it in Fortran).
(Of course, the code can be made faster yet because Gauss–Seidel is a terrible way to solve Laplace’s equation, but that would be “cheating”.)