In the simple example below, I write two functions that compute "the force" between the particles of a set. To compute the force between each pair of particles, one needs to compute the difference in position of the two particles (the dx vector).
I have implemented two versions of this minimal example: one in which the dx vector is computed naively, and another in which it is built as a static array. There is no other difference between the two codes.
As expected, the static-array version does not allocate and is faster. Yet that left me a bit unsatisfied. The vector dx is local to the inner iteration of the loop, so I expected the compiler to figure that out and stack-allocate it in the "naive" case as well. It does not, and the difference in performance is huge:
Forces dx naive: 23.923 ms (499500 allocations: 53.35 MiB)
Forces dx static: 4.397 ms (0 allocations: 0 bytes)
Here is the code:
using BenchmarkTools
function forces_naive!(x, f)
    n = length(x)
    for i in 1:n-1
        for j in i+1:n
            dx = x[i] - x[j]  # naive: allocates a new Vector on every iteration
            f[i] .= f[i] .+ dx
            f[j] .= f[j] .- dx
        end
    end
end
using StaticArrays
function forces_static!(x, f)
    n = length(x)
    for i in 1:n-1
        for j in i+1:n
            dx = SVector{3,Float64}(x[i][1] - x[j][1],
                                    x[i][2] - x[j][2],
                                    x[i][3] - x[j][3])  # stack-allocated
            f[i] .= f[i] .+ dx
            f[j] .= f[j] .- dx
        end
    end
end
N = 1000
f = [zeros(3) for i in 1:N]
x = [rand(3) for i in 1:N]
print("Forces dx naive:  "); @btime forces_naive!($x, $f)
print("Forces dx static: "); @btime forces_static!($x, $f)
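(For completeness, here is a sketch of a third variant, under the assumption that one is free to change the container type: if the positions and forces are stored as SVectors from the start, the naive-looking syntax itself becomes non-allocating, because SVector arithmetic produces stack-allocated values. The name forces_svec! is mine, just for illustration.)

```julia
using StaticArrays

function forces_svec!(x, f)
    n = length(x)
    for i in 1:n-1
        for j in i+1:n
            dx = x[i] - x[j]  # SVector minus SVector: no heap allocation
            f[i] = f[i] + dx  # SVectors are immutable, so rebind instead of .=
            f[j] = f[j] - dx
        end
    end
end

N = 1000
x = [rand(SVector{3,Float64}) for _ in 1:N]
f = [zero(SVector{3,Float64}) for _ in 1:N]
forces_svec!(x, f)
```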
My more general point is: is it unrealistic to expect that the compiler does something better with that vector, even if one does not explicitly say that it is static? Isn’t this a case where one could expect that?
(As a provocative remark: I believe something like this happens in code compiled with the @jit flag in some Python implementation of a similar computation, which is where this question comes from. I don't know the details of the Python version, but while we can get a faster Julia version using static arrays, tentative "naive" implementations like this one turn out to be faster in the JIT-compiled Python version and very slow in the Julia version, and I think this is one of the reasons.)
The more specific question is: is there a prettier syntax to write this?
dx = SVector{3,Float64}(x[i][1] - x[j][1],
                        x[i][2] - x[j][2],
                        x[i][3] - x[j][3])
I mean, it wouldn't be bad if, for example, @SVector x[i] - x[j] got interpreted as that.
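(In the meantime, one shorter spelling that does work with the existing @SVector macro is a comprehension with literal bounds, which StaticArrays accepts. This is just a sketch of that form:)

```julia
using StaticArrays

x = [rand(3) for _ in 1:4]
i, j = 1, 2

# @SVector supports comprehensions whose range is a literal, so this builds
# the same SVector{3,Float64} without writing out each component by hand:
dx = @SVector [x[i][k] - x[j][k] for k in 1:3]
```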