Accelerate pairwise Lennard-Jones force computation

That is out of the question. But what I do is this: Fastest way to partition array, given a condition

The partitioning takes a significant part of the run time, but it gives some of that locality and avoids a lot of force computations, so it turns out to be very effective overall. Still, it is the most time-consuming operation apart from calculating the forces for the pairs.
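To make the idea concrete, here is a minimal sketch of partitioning a pair list before the force loop. The post does not say what the partition condition is, so this assumes it is a cutoff test on the squared distance. The `Pair` struct and the names `partition_by_cutoff` and `lj_force_over_r` are hypothetical, and `std::partition` simply stands in for whatever partitioning routine is actually used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pair record: separation vector and its squared length.
struct Pair {
    double dx, dy, dz;  // r_i - r_j
    double r2;          // dx*dx + dy*dy + dz*dz
};

// Assumed condition: a pair interacts if r2 < rc2 (squared cutoff).
// Moves all interacting pairs to the front and returns their count,
// so the force loop can run over one contiguous range.
std::size_t partition_by_cutoff(std::vector<Pair>& pairs, double rc2) {
    auto mid = std::partition(pairs.begin(), pairs.end(),
                              [rc2](const Pair& p) { return p.r2 < rc2; });
    return static_cast<std::size_t>(mid - pairs.begin());
}

// Lennard-Jones force divided by r, for U(r) = 4*eps*((sigma/r)^12 - (sigma/r)^6).
// The force vector on particle i is this value times (dx, dy, dz).
// Working with r2 avoids a square root per pair.
double lj_force_over_r(double r2, double epsilon, double sigma) {
    assert(r2 > 0.0);
    const double s2 = sigma * sigma / r2;
    const double s6 = s2 * s2 * s2;
    return 24.0 * epsilon * s6 * (2.0 * s6 - 1.0) / r2;
}
```

After the call, only the first `n = partition_by_cutoff(pairs, rc2)` entries need to be passed to `lj_force_over_r`; the rest are skipped without a branch inside the force loop.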