[ANN] CellListMap.jl: short-ranged particle pairwise computations (nearest neighbor etc)

I am announcing this package mostly because it might be useful outside the original purpose for which I implemented it, and also because feedback is most welcome, as it will be part of an important project I am developing. The package is an implementation of cell lists for computing properties that depend on the distances between pairs of particles.

Currently the package works with Euclidean, three-dimensional systems, and accepts orthorhombic periodic boundary conditions. I will surely be extending it to general PBCs and, probably, to arbitrary dimensions as well (I am not sure about other distance metrics, since I don’t see any application for them in my field at the moment).

The package allows the definition of custom functions to be evaluated pairwise, and with those it is possible, for example, to compute neighbor lists, which seems to be a widespread need in other fields.

Within the limitations of the variable types and metric that the package accepts, it can sometimes be faster than ball-tree nearest-neighbor searches (from NearestNeighbors.jl, for example), and this may be useful in contexts other than particle simulations.
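For readers curious about the underlying idea, independently of the package API, a cell-list search can be sketched in a few lines of plain Julia: points are binned into cubic cells of side equal to the cutoff, so each query only needs to scan the 27 surrounding cells instead of all points. This is only an illustrative sketch (serial, no periodic boundaries); the names in it are mine, not part of CellListMap.jl:

```julia
# Illustrative cell-list sketch in plain Julia (serial, no PBCs).
# Names here are mine, not part of the CellListMap.jl API.

# Cell containing point p (a 3-tuple of Float64), for cells of side h
cellof(p, h) = CartesianIndex(ntuple(k -> floor(Int, p[k]/h) + 1, 3))

# Squared Euclidean distance between two 3-tuples
dist2(a, b) = sum(abs2, a .- b)

function inrange_cl(queries, data, cutoff)
    # Bin the data points into a dictionary of cells
    cells = Dict{CartesianIndex{3},Vector{Int}}()
    for (j, p) in enumerate(data)
        push!(get!(cells, cellof(p, cutoff), Int[]), j)
    end
    # For each query, scan only the 3×3×3 block of neighboring cells:
    # any point closer than `cutoff` must lie in one of them.
    found = Int[]
    for q in queries
        c = cellof(q, cutoff)
        for δ in CartesianIndices((-1:1, -1:1, -1:1))
            for j in get(cells, c + δ, Int[])
                dist2(q, data[j]) < cutoff^2 && push!(found, j)
            end
        end
    end
    return found
end

# Quick check against a brute-force search
data = [ntuple(_ -> rand(), 3) for _ in 1:10_000]
queries = [ntuple(_ -> rand(), 3) for _ in 1:10]
r = 0.05
brute = [j for q in queries for (j, p) in enumerate(data) if dist2(q, p) < r^2]
sort(inrange_cl(queries, data, r)) == sort(brute)  # true
```

The real package does the same thing with contiguous storage instead of a `Dict`, which is where most of the performance comes from.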

For example, below I compute the list of neighbours of one set of particles relative to another set, considering a maximum distance cutoff (the inrange function). NN packages provide a dedicated function for that, but here one can implement it on top of the cell-list search.

Searching the neighbours of only a few particles is much faster with the cell lists, because the cost of building the ball tree is too high for a single-particle search:

N1 = 1 N2 = 1000000
nn:   1.265 s (39 allocations: 67.16 MiB)
CL serial:   50.415 ms (65 allocations: 22.91 MiB)
CL parallel:   24.160 ms (262 allocations: 22.93 MiB)
naive (N1*N2):   2.600 ms (10 allocations: 16.39 KiB)

When performing many similar searches that fixed cost will of course be diluted, but both methods work well (parallelizing the NN method here is trivial as well):

N1 = 1000 N2 = 1000000
nn:   1.363 s (10566 allocations: 84.68 MiB)
CL serial:   357.926 ms (74 allocations: 27.89 MiB)
CL parallel:   81.272 ms (379 allocations: 34.92 MiB)
naive (N1*N2):   3.035 s (19 allocations: 5.00 MiB)

These times can change depending on the point distribution, the number of points, and the cutoff, and either method may be faster, sometimes by a large margin.

using NearestNeighbors

function nn(points,data,r)
  balltree = BallTree(data)
  return inrange(balltree, points, r, true)
end

using CellListMap, StaticArrays

function findpairs(x,y,cutoff;parallel=true)

  N1 = length(x)
  N2 = length(y)

  # Boundaries
  xmin = [ +Inf, +Inf, +Inf ]
  xmax = [ -Inf, -Inf, -Inf ]
  for v in x
    @. xmin = min(xmin,v)
    @. xmax = max(xmax,v)
  end
  for v in y
    @. xmin = min(xmin,v)
    @. xmax = max(xmax,v)
  end

  # Define box sides
  sides = (xmax - xmin) .+ cutoff
  box = Box(sides,cutoff)

  # Initialize auxiliary linked lists (largest set!)
  lc = LinkedLists(N2)

  # Initializing linked cells with these positions (largest set!)
  initlists!(y,box,lc)

  # Function adds pair to the list
  function f(j,d2,cutoff,pairs) 
    if sqrt(d2) < cutoff
      push!(pairs,j)
    end
    return pairs
  end

  # We have to define our own reduce function here (for the parallel version)
  function reduce_pairs(pairs,pairs_threaded)
    pairs = pairs_threaded[1]
    for i in 2:Threads.nthreads()
      append!(pairs,pairs_threaded[i])
    end
    return pairs
  end

  # Initialize
  pairs = Int[]

  # Run pairwise computation
  pairs = map_pairwise!(
    (x,y,i,j,d2,pairs) -> f(j,d2,cutoff,pairs),
    pairs, x, box, lc;
    reduce=reduce_pairs, parallel=parallel
  )
  return pairs
end


function naive(x,y,cutoff)
  pair_list = Int[]
  for vx in x
    for (i,vy) in pairs(y) 
      if CellListMap.distance(vx,vy) < cutoff
        push!(pair_list,i)
      end
    end
  end
  return pair_list
end

using BenchmarkTools, Test

N1 = 1000; N2 = 1_000_000
points = rand(3,N1)
data = rand(3, N2)
r = 0.05

x = [ SVector(points[:,i]...) for i in axes(points,2) ]
y = [ SVector(data[:,i]...) for i in axes(data,2) ]

@test sort(vcat(nn(points,data,r)...)) == sort(findpairs(x,y,r,parallel=false))
#@test sort(findpairs(points,y,r)) == sort(naive(points,y,r))

println("N1 = $N1 N2 = $N2")
print("nn: "); @btime nn($points,$data,$r)
print("CL serial: "); @btime findpairs($x,$y,$r,parallel=false) 
print("CL parallel: "); @btime findpairs($x,$y,$r,parallel=true) 
#print("naive (N1*N2): "); @btime naive($x,$y,$r) 



Cool! I’m writing an agent-based model component for DynamicGrids.jl, and this could be useful.

How much do you think will be usable inside a GPU kernel?


I have absolutely no experience with GPU programming, but almost certainly the algorithm can be adapted for that (since the best molecular dynamics simulation codes run on GPUs, and this is a subset of what they do). I am interested in that possibility, but I will have to start learning first.

No worries. I’ve been looking through the code and the Wikipedia example. It seems like what I need, but really just the kernel algorithm… It will have to be hacked a lot to fit DynamicGrids and GPUs. But I’ll use your code as a reference.


Please let me know when/if you do that, so I can see what you did and port it back 🙂

For sure! It will be integrated with this - so probably pretty convoluted… DynamicGrids.jl/agents.jl at agents · cesaraustralia/DynamicGrids.jl · GitHub

Hopefully this idea will work with it… it’s organised around tracking cell counts for doing radix-style agent sorts.
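For reference, the counting-sort style binning mentioned above (the usual GPU-friendly way of building cell lists, since it needs no per-cell dynamic allocation) can be sketched like this. This is my own illustration, not code from CellListMap.jl or DynamicGrids.jl:

```julia
# Counting-sort binning of points into cells: two passes over the data,
# producing a permutation that groups point indices by cell, plus the
# start offset of each cell. Illustrative sketch only.
function bin_by_counts(cell_ids, ncells)
    # Pass 1: count points per cell
    counts = zeros(Int, ncells)
    for c in cell_ids
        counts[c] += 1
    end
    # Prefix sum: starts[c] is the first slot of cell c
    starts = ones(Int, ncells + 1)
    for c in 1:ncells
        starts[c+1] = starts[c] + counts[c]
    end
    # Pass 2: scatter point indices into cell-sorted order
    cursor = copy(starts)
    perm = zeros(Int, length(cell_ids))
    for (i, c) in enumerate(cell_ids)
        perm[cursor[c]] = i
        cursor[c] += 1
    end
    # Points of cell c are perm[starts[c]:starts[c+1]-1]
    return perm, starts
end

cell_ids = [3, 1, 3, 2, 1]       # cell of each of 5 points
perm, starts = bin_by_counts(cell_ids, 3)
# perm == [2, 5, 4, 1, 3], starts == [1, 3, 4, 6]:
# cell 1 holds points 2 and 5, cell 2 holds point 4, cell 3 holds points 1 and 3
```

Both passes are embarrassingly parallel (counting with atomics, scatter with the prefix sum precomputed), which is why this layout is popular in GPU particle codes.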