Accelerate pairwise Lennard-Jones force computation

Yes, absolutely. One needs to avoid computing squared distances between particles as much as possible. What I have implemented so far allows choosing the cell size (cutoff, cutoff/2, cutoff/3); smaller cells reduce the number of distances computed at the expense of running over more cells. I have also implemented a scheme in which one projects the coordinates of the particles onto the vector connecting the cell centers, sorts the particles along this direction, and computes the distances only for those pairs whose projections are closer than the cutoff. All of this significantly reduces the number of distances that one has to compute.
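To make the projection idea concrete, here is a minimal sketch in plain Python (not the package's actual code). The function name `pairs_within_cutoff` and its arguments are made up for illustration. It relies on the fact that the difference between two projections onto a unit vector can never exceed the true distance, so any pair whose projections differ by more than the cutoff can be skipped without computing its distance.

```python
import bisect
import math

def pairs_within_cutoff(cell_a, cell_b, center_a, center_b, cutoff):
    """Hypothetical helper: return (pairs, n_dist) for two neighboring cells.

    pairs  : list of (i, j) with |cell_a[i] - cell_b[j]| <= cutoff
    n_dist : number of squared distances actually evaluated
    """
    # Unit vector connecting the two cell centers.
    axis = [cb - ca for ca, cb in zip(center_a, center_b)]
    norm = math.sqrt(sum(x * x for x in axis))
    axis = [x / norm for x in axis]

    def proj(p):
        return sum(pi * ai for pi, ai in zip(p, axis))

    # Sort the particles of cell_b by their projection on the axis.
    sorted_b = sorted((proj(q), j) for j, q in enumerate(cell_b))
    keys = [s for s, _ in sorted_b]

    cutoff2 = cutoff * cutoff
    pairs = []
    n_dist = 0
    for i, p in enumerate(cell_a):
        s = proj(p)
        # Only candidates whose projection lies within the cutoff of s.
        lo = bisect.bisect_left(keys, s - cutoff)
        hi = bisect.bisect_right(keys, s + cutoff)
        for _, j in sorted_b[lo:hi]:
            q = cell_b[j]
            d2 = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
            n_dist += 1
            if d2 <= cutoff2:
                pairs.append((i, j))
    return pairs, n_dist
```

The returned `n_dist` can be compared with `len(cell_a) * len(cell_b)` to see how many distance evaluations the sorting saves for a given cell size and cutoff.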

With that implemented, it now seems that my package spends 60% of the time computing the remaining forces. Therefore, I would have to remove everything else to get closer to what NAMD achieves. I can only imagine, for now, that NAMD is able to arrange the force computation such that SIMD instructions are used very effectively. I will now try to build lists of particles, pad the vectors as suggested by Elrod, and “turbo” the force computation to see how much faster that gets, but none of the attempts I have made so far paid off the price of having to build the list.
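For reference, the "build a list, pad, then compute" layout I have in mind looks roughly like the sketch below. It is plain Python, so nothing here is actually vectorized; it only illustrates the data layout. The names `lj_force_scale`, `pad_to_width` and `forces_from_pair_list`, and the default `width=8`, are assumptions for illustration. In Julia the inner loop would be the candidate for something like `@turbo` from LoopVectorization.jl. Padded slots carry a mask of zero, so they contribute no force and the loop body needs no branch.

```python
def lj_force_scale(r2, eps, sigma):
    """Return F/r for Lennard-Jones, computed from r^2 only (no sqrt).

    The force vector on particle i is scale * (dx, dy, dz),
    with (dx, dy, dz) = r_i - r_j.
    """
    s2 = sigma * sigma / r2
    s6 = s2 * s2 * s2
    return 24.0 * eps * (2.0 * s6 * s6 - s6) / r2

def pad_to_width(values, width, fill):
    """Pad a list so that its length is a multiple of `width`."""
    return values + [fill] * ((-len(values)) % width)

def forces_from_pair_list(dx, dy, dz, eps, sigma, width=8):
    """Compute per-pair forces from separate (structure-of-arrays) lists."""
    n = len(dx)
    # Dummy entries use a harmless separation (1, 0, 0) and mask 0.0.
    dxp = pad_to_width(dx, width, 1.0)
    dyp = pad_to_width(dy, width, 0.0)
    dzp = pad_to_width(dz, width, 0.0)
    mask = pad_to_width([1.0] * n, width, 0.0)

    fx, fy, fz = [], [], []
    # Branch-free loop body over a length that is a multiple of `width`.
    for k in range(len(dxp)):
        r2 = dxp[k] * dxp[k] + dyp[k] * dyp[k] + dzp[k] * dzp[k]
        s = mask[k] * lj_force_scale(r2, eps, sigma)
        fx.append(s * dxp[k])
        fy.append(s * dyp[k])
        fz.append(s * dzp[k])
    # Drop the padded slots before returning.
    return fx[:n], fy[:n], fz[:n]
```

Whether this pays off depends entirely on the cost of filling `dx`, `dy`, `dz` in the first place, which is exactly the price mentioned above.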

On a shared-memory computer the scaling on CPUs is not much worse than NAMD’s (I bet that on multiple computers it will be much, much worse, since NAMD is specialized for that).

Of course, for practical speedup, there is the path of putting everything on the GPU. Initially I thought that would be very difficult, but now that the package does not allocate anything and the kernels are all custom kernels anyway, maybe it is easier than I initially anticipated.
