One more thing, which may be useful (but it wouldn’t give much since it’s O(n) and your bottleneck is O(n^2)), you can remove
@inbounds for i = 1:nCoords
u[i] = u_tmp[i];
pg[i] = pg_tmp[i];
end
and instead do binding outside of PackStep
function, something like this:
for _ in 1:number_of_steps
PackStep(pg, pg_tmp, u, u_tmp, nCoords, nTot)
pg, pg_tmp = pg_tmp, pg
u, u_tmp = u_tmp, u
end
In this case it will be binding, not copying, so you can squeeze few additional nanoseconds.
P.S.: for nearest neighbours may be this thread will be relevant Cell list algorithm is slower than double for loop