[ANN] CellListMap.jl: short-ranged particle pairwise computations (lennard-jones, gravitational, etc)

lmiq · June 8, 2021, 3:09pm

This package is an implementation of cell lists for computing properties dependent on the distances between pairs of particles.

Currently the package works with euclidean and three-dimensional systems, and accepts periodic boundary conditions.

The package allows the definition of custom functions to be evaluated pairwise, and with such it is possible, for example, to compute short-ranged particle interactions and neighbor lists, which seems to be something widespread in other fields.

Within the limitations of the type of variable and metric that the package accepts, it seems that be sometimes faster than ball tree NN searches (from NearestNeighbors, for example), and maybe this is useful for someone in another context than that of particle simulations.

For example, below I compute the list of neighbors of a set of particles relative to another set, considering a maximum distance cutoff (inrange function). There is a custom function for that in NN packages, but here one can implement that on top of the cell list search.

Searching the neighbours of a only a few particles is much faster with the cell lists, because the cost of building the ball tree is too high for a single particle search:

----------------------------------
N1 = 1 ; N2 = 1000000 ; PASS TEST = true
nn:   1.231 s (38 allocations: 67.15 MiB)
cl serial:   414.197 ms (109 allocations: 199.69 MiB)
cl parallel:   389.775 ms (176 allocations: 199.70 MiB)
----------------------------------
N1 = 10 ; N2 = 1000000 ; PASS TEST = true
nn:   1.263 s (139 allocations: 67.40 MiB)
cl serial:   369.966 ms (113 allocations: 198.50 MiB)
cl parallel:   366.570 ms (244 allocations: 198.63 MiB)
----------------------------------
N1 = 100 ; N2 = 1000000 ; PASS TEST = true
nn:   1.194 s (1092 allocations: 69.40 MiB)
cl serial:   378.538 ms (116 allocations: 199.38 MiB)
cl parallel:   365.313 ms (268 allocations: 200.26 MiB)
----------------------------------
N1 = 1000 ; N2 = 1000000 ; PASS TEST = true
nn:   1.252 s (10579 allocations: 85.37 MiB)
cl serial:   476.449 ms (119 allocations: 203.38 MiB)
cl parallel:   384.429 ms (295 allocations: 210.39 MiB)
----------------------------------
N1 = 10000 ; N2 = 1000000 ; PASS TEST = true
nn:   1.659 s (105364 allocations: 260.73 MiB)
cl serial:   1.575 s (123 allocations: 263.38 MiB)
cl parallel:   690.224 ms (326 allocations: 326.39 MiB)
----------------------------------
N1 = 100000 ; N2 = 1000000 ; PASS TEST = true
nn:   5.716 s (1053135 allocations: 1.71 GiB)
cl serial:   12.474 s (126 allocations: 614.62 MiB)
cl parallel:   3.635 s (349 allocations: 1.07 GiB)

These times can change depending on the point distribution and density, number of points and cutoff, and one method or the other may be faster. It can be sometimes much faster, sometimes much slower. The cell lists are particularly useful for roughly homogeneous distributions of points. It scales linearly with the number of points if the density is constant.

Code

using BenchmarkTools

using NearestNeighbors

function nn(points,data,r)
  balltree = BallTree(data)
  idxs = Int[]
  for point in points
    append!(idxs,inrange(balltree, point, r, true))
  end
  return idxs
end

using CellListMap, StaticArrays

function findpairs(x,y,cutoff;parallel=parallel)

  # Boundaries
  xmin = [ +Inf, +Inf, +Inf ]
  xmax = [ -Inf, -Inf, -Inf ]
  for v in x
    @. xmin = min(xmin,v)
    @. xmax = max(xmax,v)
  end
  for v in y
    @. xmin = min(xmin,v)
    @. xmax = max(xmax,v)
  end

  # Define box sides
  sides = (xmax - xmin) .+ cutoff
  box = Box(sides,cutoff,lcell=5)

  # Initialize auxiliary linked lists (largest set!)
  cl = CellList(x,y,box)

  # Function adds pair to the list
  function f(j,d2,cutoff,pairs) 
    if sqrt(d2) < cutoff
      push!(pairs,j)
    end
    return pairs
  end

  # We have to define our own reduce function here (for the parallel version)
  function reduce_pairs(pairs,pairs_threaded)
    pairs = pairs_threaded[1]
    for i in 2:Threads.nthreads()
      append!(pairs,pairs_threaded[i])
    end
    return pairs
  end

  # Initialize
  pairs = Int[]

  # Run pairwise computation
  pairs = map_pairwise!(
    (x,y,i,j,d2,pairs) -> f(j,d2,cutoff,pairs),
    pairs,box,cl,
    reduce=reduce_pairs, parallel=parallel
  )
  return pairs

end

function naive(x,y,cutoff)
  pair_list = Int[]
  for vx in x
    for (i,vy) in pairs(y) 
      if CellListMap.distance(vx,vy) < cutoff  
        push!(pair_list,i)
      end
    end
  end
  pair_list
end

using BenchmarkTools, Test

function nn_run()

  for N1 in [1, 10, 100, 1_000, 10_000, 100_000]
    for N2 in [10^6]

       points = [ rand(3) for i in 1:N1 ]
       data = rand(3, N2)
       r = 0.05
       
       y = [ SVector(data[:,i]...) for i in axes(data,2) ]
       x = [ SVector(point...) for point in points  ]
       
       list = sort(findpairs(x,y,r,parallel=true)) 
       
       println("----------------------------------")
       print("N1 = $N1 ; N2 = $N2 ; PASS TEST = ")
       println(sort(nn(points,data,r)) == list)
       print("nn: "); @btime nn($points,$data,$r) samples=1 
       print("cl serial: "); @btime findpairs($x,$y,$r,parallel=false) samples=1
       print("cl parallel: "); @btime findpairs($x,$y,$r,parallel=true) samples=1

    end
  end

  return 
end

nothing

For comparison, another good implementation of the same concept is available in the python halotools package, for computing the mean pairwise velocity between galaxies (something like that, not my field). The current implementation of CellListMap.jl has a good scaling for constant-density systems and parallelizes well. Here are some benchmarks:

Next steps will be implementing more general periodic boundary conditions. Then I will probably move to improving its performance for these small or denser systems.

edit: updated results (v0.5.2)

Raf · June 8, 2021, 5:06pm

Cool! I’m writing an agent model component for DynamicGrids.jl, that could be useful.

How much do you think will be usable inside a GPU kernel?

lmiq · June 8, 2021, 5:15pm

I have absolutely no experience with programming for GPUs, but almost certainly the algorithm can adapted for that (since the best molecular dynamics simulations codes run on GPUs and this is a subset of what they do). I am interested in that possibility, but I will have to start learning first.

Raf · June 8, 2021, 5:25pm

No worries. I’ve been looking though the code and wikipedia example. It seems like what I need, but just the kernel algorithm really… It will have to be hacked a lot to fit DynamicGrids and GPUs. But I’ll use your code as a reference.

lmiq · June 8, 2021, 5:36pm

Please let me know when/if you do that, so I can see what you did and port it back

Raf · June 8, 2021, 6:28pm

For sure! It will be integrated with this - so probably pretty convoluted… DynamicGrids.jl/agents.jl at agents · cesaraustralia/DynamicGrids.jl · GitHub

Hopefully this idea will work with it… it’s organised around tracking cell counts for doing radix style agent sorts.

lmiq · June 18, 2021, 1:52am

Updated some results and example API syntax for a new version (0.4.1).

jgreener64 · June 18, 2021, 10:58am

Looks like a really useful package! I’d like to use something like this in Molly.jl in the future.

lmiq · June 18, 2021, 12:01pm

Indeed, that could improve the performance of this loop. In a simulation probably we need a smart way to update the lists without having to compute them all again, considering the fact that most of the atoms will only move slightly from one step to the next.

(my main goal with this package is to provide a performant pairwise calculation for ComplexMixtures.jl and for a new and improved Julia version of Packmol, which is under way. In these cases the “particles don’t move too much from step to step” do not apply, so I need to always update them.)

ps: combining this package with the force-field and topology tools of Molly.jl allows one to write very nice custom analysis functions for MD simulations. I will certainly explore that.

jgreener64 · June 18, 2021, 1:11pm

Yes, I agree. Cell lists are a key missing feature in Molly.jl. I am moving to a position this autumn which will let me focus on Molly full time, hopefully it will progress quickly after that and become useful for serious work. The topology tools are barebones at the minute but I’m hoping to depend on Chemfiles for that.

I can see a nice molecular simulation ecosystem starting to emerge over the next few years. I am excited for a Julia version of Packmol, that could become a crucial package.

lmiq · June 18, 2021, 1:19pm

the goal here will be not just better performance and scaling, and arbitrary periodic boundary conditions (I could do that in Fortran), but to provide an interface in which the user can provide its own geometric spacial restrictions to the molecules (as functions), such that systems spacial molecular distributions of any kind can be built. I think in the next years we will see more and MD systems approaching the sizes of cellular organizations, and being able to build initial configurations for those will be important.

jgreener64 · June 18, 2021, 1:28pm

Sounds cool, and could open up research directions using autodiff to tune those functions to give certain resultant properties.

SteffenPL · June 18, 2021, 2:24pm

I implemented something similar, maybe I can try to commit to your project
(But first I have to try it out )

Since I am dealing with a very small number of particles, it was very important to be allocation free which I got using StaticArrays for the internals and a weird allocation free data structure for the lists (page 4 in this document: http://www.acclab.helsinki.fi/~aakurone/atomistiset/lecturenotes/lecture03_2up.pdf).

My interface was a bit different, I am using

X = rand(2,100)
vl = VerletList{2}(bottomleft=[0.0, 0.0], topright=[1.,1.]), R=0.2)
  
update_lists!(vl, X)
for (i,j) in VerletPairs(vl)
    # do stuff
end

If you want I can prepare the code and share it, I need to remove a few things to have a minimal example which is usable.

lmiq · June 18, 2021, 2:53pm

Nice. Do you use periodic boundary conditions as well? This is crucial for my use case. I only allocate on the construction of the cell lists (for the first time, they can be updated afterwards without allocations).

ps: I just updated the benchmarks above for the just sent to release version 0.4.3, which is in par with the halotools implementation for denser systems.

SteffenPL · June 18, 2021, 2:59pm

Cool!
No, my code doesn’t have periodic boundary conditions, it’s a quite classical implementation. Probably the only interesting thing for your project could be the iterator. Since that is a bit tricky to implement (at least it was for me)

I will benchmark it on the weekend and compare with yours. I’m exited to try if the threaded version gives a speedup (I don’t have time to do it right now )

lmiq · August 4, 2021, 1:06am

Updated original post according to version 0.5.2.

News:

General periodic boundary conditions are accepted (although I still have to improve on the input format of the unit cell matrix, which is somewhat restricted at this point).
Better scaling for dense systems.

Details at: https://github.com/m3g/CellListMap.jl

jaantollander · August 8, 2021, 5:48am

I have been working on a similar package that solves the Fixed Radius Near Neighbors problem with Cell Lists called CellLists.jl. The package contains serial and multithreaded algorithms. Also, I linked a couple of blog posts I wrote about theoretical aspects and benchmarks for the algorithms. It could be interesting to compare the implementations. Internally, my implementation relies on a dictionary for the cell list. It is rather a simple algorithm, but Julia makes it quite efficient. Any thoughts?

lmiq · August 8, 2021, 10:53am

My intuition would say that using dictionaries will be slow here, but let’s try them. Of course that depends on how fast the package needs to be for the application in question.

Ideally one would like to keep the data as aligned in memory as possible, to use simd and gpus at full potential (there are some limitations on how well that can work in my package since the function to be computed is provided by the user and may not be amenable to vectorization - maybe this is the first place where I encounter really the need for some metaprogramming?). I still fall short on that and I’m already planning a way to reestructure the package with that in mind.

cortner · August 8, 2021, 7:07pm

Just fir the record, Here is my own : https://github.com/JuliaMolSim/NeighbourLists.jl

It is fairly well debugged and works on general cells and with or without PBC. In some revision I must have killed the performance though. My plan is to retire is in favour of one of the new packages once the abstract t interfaces emerge. But can still be useful right now at least for comparing and debugging if nothing else.

lmiq · August 8, 2021, 7:14pm

what exactly do you mean by this?

Topic		Replies	Views
Accelerate pairwise Lennard-Jones force computation Performance	87	3306	February 12, 2022
CellListMap.jl: Is it suitable for detecting collision of particles? General Usage celllistmap	5	611	September 21, 2021
Pairwise computation slower than Python (Cython) code (BallTree very slow!) Performance question , tree	27	2064	June 18, 2021
[ANN] SpatialHashTables.jl Package Announcements package , gpu	8	920	May 20, 2024
Cell list algorithm is slower than double for loop Performance	18	2273	April 2, 2020

[ANN] CellListMap.jl: short-ranged particle pairwise computations (lennard-jones, gravitational, etc)

Related topics