The following code comes from a real-world session with a beginner Julia programmer computing the position vector for a scalar field. The second version, with the array literal replaced by a tuple, is about 10x faster. This is because in each loop iteration the array literal `[i,j,k]` causes a heap allocation, which the tuple avoids.
```julia
function run_it_array()
    nx, ny, nz = 100, 200, 300
    x = Array{Float64}(undef, 3)
    Δx = [3., 5., 2.]
    for i = 1:nx, j = 1:ny, k = 1:nz
        @. x = [i, j, k] * Δx
    end
end
```
```julia
function run_it_tuple()
    nx, ny, nz = 100, 200, 300
    x = Array{Float64}(undef, 3)
    Δx = [3., 5., 2.]
    for i = 1:nx, j = 1:ny, k = 1:nz
        @. x = (i, j, k) * Δx
    end
end
```
The second is about 10x faster:

```
julia> @time run_it_array()
  0.133853 seconds (6.00 M allocations: 457.764 MiB, 2.62% gc time)

julia> @time run_it_tuple()
  0.014502 seconds (2 allocations: 160 bytes)
```
Note that the 6.00 M allocations are exactly one per iteration of the 100 × 200 × 300 loop: every `[i,j,k]` literal heap-allocates a fresh array. From a non-computer-science perspective this feels nuts: “This one weird trick will speed up your code 10x…” (Sorry for the sarcasm; I’m just relaying the real-world experience. It gives off vibes of the 90s, when people were sharing their weirdest C pointer magic.)
So my question: should Julia be able to optimize the array version? And going one step further, should Julia even be able to pre-allocate `x` on its own, without requiring the programmer to do it?
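For completeness, a variant that avoids arrays entirely (a sketch; `run_it_ntuple` is my name, not from the code above). Since tuples broadcast to tuples, the whole computation stays on the stack and the loop body allocates nothing:

```julia
function run_it_ntuple()
    nx, ny, nz = 100, 200, 300
    Δx = (3.0, 5.0, 2.0)          # tuple instead of array: no heap allocation
    x = (0.0, 0.0, 0.0)
    for i = 1:nx, j = 1:ny, k = 1:nz
        x = (i, j, k) .* Δx       # tuples are immutable: this rebinds x
    end
    return x
end
```

The trade-off is that `x` is now immutable, so the `@.`-style in-place update becomes a rebinding; for small fixed-size vectors that is usually exactly what you want.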