Thanks. The upper loop where that is inserted is optimized in many ways. I am sad that I could not use simd instructions, because in practice the indexes of the particles are discontinuous in memory (because I’m using cell lists). On that perspective the problem is more similar to this one (I simplified a little bit the problem removing some constants):
using StaticArrays, FastPow, BenchmarkTools
function forces!(x,f,d,idxs)
@inbounds for id in axes(idxs,1)
i = idxs[id,1]
j = idxs[id,2]
r = x[j] - x[i]
@fastpow dudr = -12*(1/d^7 - 1/d^4)*r
f[i] = f[i] + dudr
f[j] = f[j] - dudr
end
return f
end
@benchmark forces!(x,f,d,idxs) setup = (
x = [ rand(SVector{3,Float64}) for _ in 1:1000 ];
f = [ zeros(SVector{3,Float64}) for _ in 1:1000 ];
d = rand();
idxs = rand(1:1000,20,2)
)
I don’t get any speedup from simd, turbo, etc, because of the discontinuity of the memory positions accessed. I have tried to copy the data before doing that computation, but was never worth the effort (and the size of the loop shown there, 20, is typical, so also limited in terms of taking advantage of simd).