It is done that way already: the package can be used just to compute the cell lists, but if you compute something else, like a force, that computation is mapped onto the pairs of particles without “saving” the pairs anywhere.
What is built in advance is the cell list, but that is O(n).
My initial point here is that even if the list is built (thus abstracting all the work involved in running only over the close particles), NAMD is still faster. I think the leads I have now are the padding and the locality.
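To make the design concrete, here is a schematic example of what a user-supplied `f` could look like (illustrative only: `lj_energy`, `σ`, and `ε` are placeholder names of mine, not part of the package; the signature matches the call in the inner loop shown below):

```julia
const σ, ε = 3.4, 0.24  # hypothetical LJ parameters, for illustration only

# Schematic user-supplied `f`: accumulates a Lennard-Jones energy for each
# close pair; receives the pair and the running `output`, returns it updated.
function lj_energy(xᵢ, xⱼ, i, j, d2, output)
    sd6 = (σ^2 / d2)^3               # (σ/d)^6 computed from the squared distance
    return output + 4ε * (sd6^2 - sd6)
end
```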
There is one detail, however: I cannot avoid a branch. This is how the innermost loop looks now:
```julia
for j in 1:n
    # fetch the coordinates of particle j
    @inbounds pⱼ = cellⱼ.particles[pp[j].index_in_cell]
    xpⱼ = pⱼ.coordinates
    d2 = norm_sqr(xpᵢ - xpⱼ)
    # the user function is evaluated only for pairs within the cutoff
    if d2 <= cutoff_sq
        output = f(xpᵢ, xpⱼ, pᵢ.index, pⱼ.index, d2, output)
    end
end
```
Here I’m sitting on particle `i` and running over particles `j`. In the first line inside the loop the coordinates of particle `j` are fetched, and then I compute the squared distance. The innermost function has to be evaluated only if `d2 <= cutoff_sq`. I think (?) this branch prevents good vectorization from taking place.
I could avoid this branch using the `c = d2 <= cutoff_sq` trick for some specific case, but the idea of the package is to allow `f(...)` to be any user-defined function, so I cannot anticipate what is going to happen inside it. For the LJ case, `f(...)` is the function that computes the pairwise forces.
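For illustration, here is a rough sketch of what that masking trick could look like if one specific kernel were hard-coded (the accumulator `force` is an assumed variable, and the kernel shown is just an LJ-like repulsive term; this is exactly what the package cannot do generically, since it requires knowing `f` in advance):

```julia
# Branchless sketch for one hard-coded kernel: the Bool `c` masks the
# contribution, so pairs outside the cutoff add exactly zero and the loop
# body has no branch, which makes it easier for the compiler to vectorize.
for j in 1:n
    @inbounds pⱼ = cellⱼ.particles[pp[j].index_in_cell]
    Δx = xpᵢ - pⱼ.coordinates
    d2 = norm_sqr(Δx)
    c = d2 <= cutoff_sq       # Bool promotes to 0.0/1.0 in arithmetic
    force += (c / d2^7) * Δx  # e.g. the r⁻¹² part of an LJ-like force
end
```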
(By the way, I now think that in terms of locality my code is already pretty good, because of the way I structured it with some other things in mind.)
Add-on: I am now structuring the coordinates, internally, as:
```julia
using StaticArrays  # for SVector

struct ParticleWithIndex{N,T}
    index::Int                 # index of the particle in the original array
    coordinates::SVector{N,T}  # N-dimensional coordinates
    real::Bool
end
```
because I need the coordinates, the index, and the `real` attribute. Maybe this makes it harder to align computations? Would it be better if these properties were in separate arrays?
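For comparison, a struct-of-arrays version of the same data could look like the sketch below (a hypothetical layout, not what the package does now; `CellParticles` is a placeholder name). The distance computation would then read only the contiguous `coordinates` array, which tends to be friendlier to alignment and SIMD:

```julia
using StaticArrays

# Hypothetical struct-of-arrays layout: each field of ParticleWithIndex
# lives in its own contiguous array, indexed in parallel.
struct CellParticles{N,T}
    indices::Vector{Int}
    coordinates::Vector{SVector{N,T}}
    real::Vector{Bool}
end
```

(StructArrays.jl automates this kind of transformation while keeping struct-like access to the elements.)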