[ANN] CellListMap.jl: short-ranged particle pairwise computations (Lennard-Jones, gravitational, etc.)

This package is an implementation of cell lists for computing properties dependent on the distances between pairs of particles.

Currently the package works with three-dimensional systems with a Euclidean metric, and accepts periodic boundary conditions.

The package allows the definition of custom functions to be evaluated pairwise, with which it is possible, for example, to compute short-ranged particle interactions and neighbor lists, problems which seem to be widespread in other fields as well.

Within the limitations of the variable types and metrics that the package accepts, it seems to be sometimes faster than ball-tree nearest-neighbor searches (from NearestNeighbors.jl, for example), so it may be useful in contexts other than particle simulations.

For example, below I compute the list of neighbors of one set of particles relative to another set, considering a maximum distance cutoff (the `inrange` function). There is a custom function for that in NN packages, but here one can implement it on top of the cell list search.

Searching the neighbors of only a few particles is much faster with the cell lists, because the cost of building the ball tree is too high for a single-particle search:

```
N1 = 1 ; N2 = 1000000 ; PASS TEST = true
nn:   1.231 s (38 allocations: 67.15 MiB)
cl serial:   414.197 ms (109 allocations: 199.69 MiB)
cl parallel:   389.775 ms (176 allocations: 199.70 MiB)
N1 = 10 ; N2 = 1000000 ; PASS TEST = true
nn:   1.263 s (139 allocations: 67.40 MiB)
cl serial:   369.966 ms (113 allocations: 198.50 MiB)
cl parallel:   366.570 ms (244 allocations: 198.63 MiB)
N1 = 100 ; N2 = 1000000 ; PASS TEST = true
nn:   1.194 s (1092 allocations: 69.40 MiB)
cl serial:   378.538 ms (116 allocations: 199.38 MiB)
cl parallel:   365.313 ms (268 allocations: 200.26 MiB)
N1 = 1000 ; N2 = 1000000 ; PASS TEST = true
nn:   1.252 s (10579 allocations: 85.37 MiB)
cl serial:   476.449 ms (119 allocations: 203.38 MiB)
cl parallel:   384.429 ms (295 allocations: 210.39 MiB)
N1 = 10000 ; N2 = 1000000 ; PASS TEST = true
nn:   1.659 s (105364 allocations: 260.73 MiB)
cl serial:   1.575 s (123 allocations: 263.38 MiB)
cl parallel:   690.224 ms (326 allocations: 326.39 MiB)
N1 = 100000 ; N2 = 1000000 ; PASS TEST = true
nn:   5.716 s (1053135 allocations: 1.71 GiB)
cl serial:   12.474 s (126 allocations: 614.62 MiB)
cl parallel:   3.635 s (349 allocations: 1.07 GiB)
```

These times can change depending on the point distribution and density, the number of points, and the cutoff; one method or the other may be faster, sometimes by a large margin in either direction. Cell lists are particularly useful for roughly homogeneous distributions of points, and they scale linearly with the number of points if the density is constant.

```julia
using BenchmarkTools
using NearestNeighbors

# Neighbors of `points` in `data` within cutoff r, using a ball tree
function nn(points, data, r)
  balltree = BallTree(data)
  idxs = Int[]
  for point in points
    append!(idxs, inrange(balltree, point, r, true))
  end
  return idxs
end
```

```julia
using CellListMap, StaticArrays

function findpairs(x, y, cutoff; parallel=true)

  # Boundaries
  xmin = [ +Inf, +Inf, +Inf ]
  xmax = [ -Inf, -Inf, -Inf ]
  for v in x
    @. xmin = min(xmin, v)
    @. xmax = max(xmax, v)
  end
  for v in y
    @. xmin = min(xmin, v)
    @. xmax = max(xmax, v)
  end

  # Define box sides
  sides = (xmax - xmin) .+ cutoff
  box = Box(sides, cutoff, lcell=5)

  # Initialize auxiliary linked lists (y is the largest set!)
  cl = CellList(x, y, box)

  # Function that adds a pair to the list
  function f(j, d2, cutoff, pairs)
    if sqrt(d2) < cutoff
      push!(pairs, j)
    end
    return pairs
  end

  # We have to define our own reduce function here (for the parallel version)
  function reduce_pairs(pairs, pairs_threaded)
    pairs = pairs_threaded[1]
    for i in 2:Threads.nthreads()
      append!(pairs, pairs_threaded[i])
    end
    return pairs
  end

  # Initialize output
  pairs = Int[]

  # Run pairwise computation
  pairs = map_pairwise!(
    (x, y, i, j, d2, pairs) -> f(j, d2, cutoff, pairs),
    pairs, box, cl,
    reduce=reduce_pairs, parallel=parallel
  )
  return pairs
end

# Naive O(N1*N2) reference implementation
function naive(x, y, cutoff)
  pair_list = Int[]
  for vx in x
    for (i, vy) in pairs(y)
      if CellListMap.distance(vx, vy) < cutoff
        push!(pair_list, i)
      end
    end
  end
  return pair_list
end
```

```julia
using BenchmarkTools, Test

function nn_run()
  for N1 in [1, 10, 100, 1_000, 10_000, 100_000]
    for N2 in [10^6]
      points = [ rand(3) for i in 1:N1 ]
      data = rand(3, N2)
      r = 0.05
      y = [ SVector(data[:, i]...) for i in axes(data, 2) ]
      x = [ SVector(point...) for point in points ]
      list = sort(findpairs(x, y, r, parallel=true))
      print("N1 = $N1 ; N2 = $N2 ; PASS TEST = ")
      println(sort(nn(points, data, r)) == list)
      print("nn: "); @btime nn($points, $data, $r) samples=1
      print("cl serial: "); @btime findpairs($x, $y, $r, parallel=false) samples=1
      print("cl parallel: "); @btime findpairs($x, $y, $r, parallel=true) samples=1
    end
  end
end
```



For comparison, another good implementation of the same concept is available in the Python halotools package, for computing the mean pairwise velocity between galaxies (something like that, it is not my field). The current implementation of CellListMap.jl has good scaling for constant-density systems and parallelizes well. Here are some benchmarks:



Next steps will be implementing more general periodic boundary conditions. Then I will probably move on to improving its performance for small or denser systems.

edit: updated results (v0.5.2)


Cool! I’m writing an agent model component for DynamicGrids.jl; this could be useful there.

How much do you think will be usable inside a GPU kernel?


I have absolutely no experience with programming for GPUs, but almost certainly the algorithm can be adapted for that (since the best molecular dynamics simulation codes run on GPUs, and this is a subset of what they do). I am interested in that possibility, but I will have to start learning first.

No worries. I’ve been looking through the code and the Wikipedia example. It seems like what I need, but just the kernel algorithm really… It will have to be hacked a lot to fit DynamicGrids and GPUs. But I’ll use your code as a reference.


Please let me know when/if you do that, so I can see what you did and port it back 🙂

For sure! It will be integrated with this - so probably pretty convoluted… DynamicGrids.jl/agents.jl at agents · cesaraustralia/DynamicGrids.jl · GitHub

Hopefully this idea will work with it… it’s organised around tracking cell counts for doing radix-style agent sorts.


Updated some results and example API syntax for a new version (0.4.1).


Looks like a really useful package! I’d like to use something like this in Molly.jl in the future.


Indeed, that could improve the performance of this loop. In a simulation we will probably need a smart way to update the lists without recomputing them from scratch, given that most atoms move only slightly from one step to the next.
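A common way to realize this is the Verlet "skin" criterion: build the lists with an enlarged cutoff (cutoff + skin) and rebuild only when some particle has moved more than half the skin since the reference positions used at the last build. The helper below is a minimal sketch of that idea (illustrative names, not a CellListMap.jl function):

```julia
# Sketch of the standard Verlet "skin" rebuild criterion (illustrative
# helper, not part of CellListMap.jl): the lists built with cutoff + skin
# stay valid as long as no particle has moved more than skin/2 since the
# reference positions x_ref recorded at the last rebuild.
function needs_rebuild(x, x_ref, skin)
    maxd2 = 0.0
    for (xi, xr) in zip(x, x_ref)
        d2 = sum(abs2, xi .- xr)   # squared displacement since last build
        maxd2 = max(maxd2, d2)
    end
    return maxd2 > (skin / 2)^2
end
```

The trade-off is that the pairwise map runs over a slightly larger cutoff, in exchange for rebuilding the lists much less often.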

(My main goal with this package is to provide a performant pairwise calculation for ComplexMixtures.jl and for a new and improved Julia version of Packmol, which is under way. In these cases the “particles don’t move too much from step to step” assumption does not apply, so I always need to update the lists.)

ps: combining this package with the force-field and topology tools of Molly.jl allows one to write very nice custom analysis functions for MD simulations. I will certainly explore that.


Yes, I agree. Cell lists are a key missing feature in Molly.jl. I am moving to a position this autumn which will let me focus on Molly full time, hopefully it will progress quickly after that and become useful for serious work. The topology tools are barebones at the minute but I’m hoping to depend on Chemfiles for that.

I can see a nice molecular simulation ecosystem starting to emerge over the next few years. I am excited for a Julia version of Packmol, that could become a crucial package.


The goal here will be not just better performance and scaling, and arbitrary periodic boundary conditions (I could do that in Fortran), but to provide an interface in which users can supply their own geometric spatial restrictions on the molecules (as functions), so that molecular spatial distributions of any kind can be built. I think in the next few years we will see more and more MD systems approaching the sizes of cellular organizations, and being able to build initial configurations for those will be important.
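The interface idea above can be sketched as rejection sampling against a user-supplied predicate. This is purely illustrative (hypothetical names, not the actual Packmol interface):

```julia
# Sketch of user-supplied geometric restrictions: the user passes an
# arbitrary constraint as a predicate, and candidate positions are
# rejection-sampled against it. Names are hypothetical, not the real API.
function sample_constrained(constraint; ntries=10_000)
    for _ in 1:ntries
        x = rand(3)                  # candidate position in the unit box
        constraint(x) && return x    # accept if the constraint holds
    end
    error("no point satisfying the constraint was found")
end

# e.g., restrict positions to a sphere of radius 0.25 around the box center:
# sample_constrained(x -> sum(abs2, x .- 0.5) < 0.25^2)
```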


Sounds cool, and could open up research directions using autodiff to tune those functions to give certain resultant properties.


I implemented something similar; maybe I can try to commit to your project 🙂
(But first I have to try it out 😉)

Since I am dealing with a very small number of particles, it was very important to be allocation-free, which I got using StaticArrays for the internals and a weird allocation-free data structure for the lists (page 4 in this document: http://www.acclab.helsinki.fi/~aakurone/atomistiset/lecturenotes/lecture03_2up.pdf).
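For readers following along, the structure on that page is the classic "head and next" linked cell list, which needs only two integer arrays regardless of how particles distribute over cells. A minimal sketch (names illustrative), binning columns of `X` with coordinates in `[0,1)^2` into an `ncell × ncell` grid:

```julia
# Classic "head and next" linked cell list: head[cx,cy] is the first
# particle in cell (cx,cy) (0 = empty), and next[i] chains to the next
# particle in the same cell as i (0 = end of chain). Only two integer
# arrays are needed, so no per-cell vectors are allocated.
function build_cell_list(X::Matrix{Float64}, ncell::Int)
    n = size(X, 2)
    head = zeros(Int, ncell, ncell)
    next = zeros(Int, n)
    for i in 1:n
        cx = min(ncell, 1 + floor(Int, X[1, i] * ncell))
        cy = min(ncell, 1 + floor(Int, X[2, i] * ncell))
        next[i] = head[cx, cy]   # prepend i to its cell's chain
        head[cx, cy] = i
    end
    return head, next
end
```

Both arrays can be filled in place on every update, which is what makes the structure allocation-free after construction.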

My interface was a bit different; I am using:

```julia
X = rand(2, 100)
vl = VerletList{2}(bottomleft=[0.0, 0.0], topright=[1.0, 1.0], R=0.2)
update_lists!(vl, X)
for (i, j) in VerletPairs(vl)
    # do stuff
end
```
If you want I can prepare the code and share it, I need to remove a few things to have a minimal example which is usable.


Nice. Do you use periodic boundary conditions as well? This is crucial for my use case. I only allocate on the construction of the cell lists (the first time; they can be updated afterwards without allocations).

ps: I just updated the benchmarks above for version 0.4.3, just sent for release, which is on par with the halotools implementation for denser systems.


No, my code doesn’t have periodic boundary conditions; it’s a quite classical implementation. Probably the only interesting thing for your project could be the iterator, since that is a bit tricky to implement (at least it was for me).
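The tricky part of such an iterator is producing `(i, j)` pairs out of the head/next chains of a cell list without materializing them. The sketch below collects the pairs of a single chain for clarity; a real lazy iterator (like `VerletPairs` above, presumably) would instead keep `(i, j)` as iteration state:

```julia
# Enumerate the pairs within one cell's chain. `head` is the index of the
# first particle in the cell and next[i] chains to the next particle in
# the same cell (0 = end of chain). A lazy iterator would yield these
# tuples one by one instead of pushing them into a vector.
function cell_pairs(head::Int, next::Vector{Int})
    ps = Tuple{Int,Int}[]
    i = head
    while i != 0              # walk the chain from the cell head
        j = next[i]
        while j != 0          # pair i with every particle after it
            push!(ps, (i, j))
            j = next[j]
        end
        i = next[i]
    end
    return ps
end
```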

I will benchmark it on the weekend and compare with yours. I’m excited to see if the threaded version gives a speedup 🙂 (I don’t have time to do it right now 😕)


Updated original post according to version 0.5.2.


- General periodic boundary conditions are accepted (although I still have to improve the input format of the unit cell matrix, which is somewhat restricted at this point).

- Better scaling for dense systems.
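For readers unfamiliar with general periodic cells: wrapping coordinates into a triclinic cell goes through fractional coordinates, which is a minimal sketch of what a unit cell matrix input enables (illustrative helper, not the CellListMap.jl API):

```julia
# Wrap a point into a general periodic cell defined by the unit cell
# matrix M (lattice vectors as columns): convert to fractional
# coordinates, wrap each component into [0, 1), and map back.
using LinearAlgebra

function wrap_to_cell(x::Vector{Float64}, M::Matrix{Float64})
    f = M \ x          # fractional coordinates
    f .-= floor.(f)    # wrap into [0, 1)
    return M * f       # back to Cartesian coordinates
end
```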

Details at: GitHub - m3g/CellListMap.jl: Flexible implementation of cell lists to map the calculations of short-ranged particle-pair dependent functions, such as forces, energies, neighbor lists, etc.


I have been working on a similar package that solves the Fixed Radius Near Neighbors problem with Cell Lists called CellLists.jl. The package contains serial and multithreaded algorithms. Also, I linked a couple of blog posts I wrote about theoretical aspects and benchmarks for the algorithms. It could be interesting to compare the implementations. Internally, my implementation relies on a dictionary for the cell list. It is rather a simple algorithm, but Julia makes it quite efficient. Any thoughts?
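For concreteness, the dictionary-based idea can be sketched in a few lines (illustrative only; the actual CellLists.jl internals and API differ):

```julia
# Minimal dictionary-based cell list: points are columns of p, and the
# dictionary maps an integer cell index tuple to the indices of the
# particles falling in that cell of side r.
function dict_cell_list(p::Matrix{Float64}, r::Float64)
    cells = Dict{Tuple{Int,Int},Vector{Int}}()
    for j in axes(p, 2)
        c = (floor(Int, p[1, j] / r), floor(Int, p[2, j] / r))
        push!(get!(cells, c, Int[]), j)   # get! inserts an empty list if missing
    end
    return cells
end
```

Near-neighbor candidates for a point are then gathered by looking up its cell and the adjacent cell keys.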


My intuition says that using dictionaries will be slow here, but let’s try them. Of course, that depends on how fast the package needs to be for the application in question.

Ideally one would like to keep the data as aligned in memory as possible, to use SIMD and GPUs at full potential (there are some limitations on how well that can work in my package, since the function to be computed is provided by the user and may not be amenable to vectorization; maybe this is the first place where I really encounter the need for some metaprogramming?). I still fall short on that, and I’m already planning to restructure the package with that in mind.

Just for the record, here is my own: GitHub - JuliaMolSim/NeighbourLists.jl: neighbour list for particle simulations based on matscipy

It is fairly well debugged and works on general cells, with or without PBC. In some revision I must have killed the performance, though. My plan is to retire it in favour of one of the new packages once the abstract interfaces emerge. But it can still be useful right now, at least for comparing and debugging if nothing else.


What exactly do you mean by this?