CellListMap.jl: Is it suitable for detecting collision of particles?

liuyxpp · September 20, 2021, 1:36am

@lmiq I am trying to implement a naive algorithm to generate a Self-Avoiding Walk. At present I use following O(N^2) algorithm to check whether two beads are colloiding.

using Random
using CellListMap

function random_vector()
	u, v, w = randn(3)
	norm = sqrt(u*u + v*v + w*w)
	return u/norm, v/norm, w/norm
end

function random_vectors(N)
	x, y, z = zeros(N), zeros(N), zeros(N)
	for i in 1:N
		x[i], y[i], z[i] = random_vector()
	end
	return x, y, z
end

function random_chain(N)
	x, y, z = zeros(N+1), zeros(N+1), zeros(N+1)
	ux, uy, uz = random_vectors(N)
	x[2:end] = cumsum(ux)
	y[2:end] = cumsum(uy)
	z[2:end] = cumsum(uz)
	return Chain(N, x, y, z)
end

distance2(p) = sum(abs2, p)

is_colliding(p1, p2, r) = distance2(p1 .- p2) < 4*r*r ? true : false

struct Chain{T<:Real}
	N::Int
	x::Vector{T}
	y::Vector{T}
	z::Vector{T}
end

function is_real_chain1(chain, radius)
    for i in 1:chain.N
        for j in (i+1):chain.N
			if is_colliding((chain.x[i], chain.y[i], chain.z[i]), (chain.x[j], chain.y[j], chain.z[j]), radius)
				return false
			end
		end
	end
    return true
end

chain_as_vector(chain) = [[chain.x[i], chain.y[i], chain.z[i]] for i in 1:chain.N+1]

is_real_chain2(chainvec, radius) = isempty(CellListMap.neighbourlist(chainvec, 2*radius))

julia> using BenchmarkTools

julia> chain = random_chain(8192)
julia> @btime is_real_chain1($chain, 0.1)
390.710 μs (0 allocations: 0 bytes)
false

chainvec = chain_as_vector(chain)
@btime is_real_chain2($chainvec, 0.1)
84.379 ms (16804 allocations: 354.53 MiB)
false

I try to use your CellListMap.neighbourlist to do the same job. But I found it is much slower. Do you know there is any faster way to find the collision? Thanks!

Edited: MWE example provided.

lmiq · September 20, 2021, 1:48am

Can you share a MWE of a typical calculation? Unless the number of particles is very small it should be much faster. I can help you with that tomorrow.

liuyxpp · September 20, 2021, 1:55am

Thanks! I expect so. I will update my post.

lmiq · September 20, 2021, 2:26pm

Hello. The answer is a little bit more complicated.

Commentary:

The self-avoiding random walk is generally very “low density” (see the figure), thus the overhead of building the cell lists is not necessarily hugely compensated, since most cells will be empty.
The problem here is not detecting collision of particles in general. Is to detect if there is one collision. For detecting all collisions cell lists are absolutely a good choice. For detecting one collision, things are more complicated.

Are overlaps expected readily most of the time?

Your naive algorithm exits on the first encounter of a overlap. That makes it very fast if the “self avoiding” is mostly not self-avoiding. This may not be representative of a true scenario, though, in a simulation. I guess in one such a simulation we would try not to generate many configurations with overlaps just to discard them. So, the benchmark for any random chain may be not the best choice.

Currently, you cannot just abort the calculation of CellListMap when an overlap is found (although I’ve added an issue here, that may be possible to implement as an option).

If overlaps are not expected most of the time:

I have replaced the naive algorithm by a version that effectively runs over all pairs of particles, to simulate the situation in which the most common scenario is that the configuration of the polymer does not have overlaps. In this case the calculation is effectively O(N^2). I have also implemented a slightly faster version of that same algorithm by adding @inbounds and a simple parallelization, to increase the bar.

In this case, the situation is the following: for 8192 we are close to the limit where the naive algorithm and building the cell lists may be one or other faster. You cannot use directly the neighbourlist function, for more than one reason:

You actually don’t need the neighbour list, it is a waste of time building it and storing it.
The cutoff for the overlaps is way too short relative to the size of the system. The number of cells becomes enormous, and even if we only need to compute one integer array of indexes of that size, it is still problematic. Thus, we have to define a cutoff for building the cell lists which is larger than the cutoff of the interactions.

With all that taken into account, if the overlaps are not very common, CellListMap can be faster. For lager walkers it will be much faster.

Conclusion

The most important conclusion is: do most of the configurations will have many overlaps? If they will, the naive algorithm is hard to beat. If not, you may want to consider using CellListMap. In this case, the implementation is not as straightforward as just using the neighbourlist function, but may be worthwhile.

Testing with 8192 beads

---------------------------------------------------------------
Naive1 (if overlaps are found quickly):
---------------------------------------------------------------
BenchmarkTools.Trial: 1000 samples with 1 evaluation.
 Range (min … max):   60.000 ns … 7.276 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     758.458 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.058 ms ± 1.054 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▆▃▁ ▂  ▁
  ███████▇██▇▇▇▆▅▅▅▆▅▄▆▄▃▄▄▄▃▃▃▃▃▂▃▂▃▂▃▃▃▂▂▂▂▃▂▂▁▁▂▃▂▁▁▂▂▁▂▁▂ ▃
  60 ns           Histogram: frequency by time        5.05 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2):
---------------------------------------------------------------
BenchmarkTools.Trial: 120 samples with 1 evaluation.
 Range (min … max):  38.632 ms … 55.310 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     39.624 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   40.960 ms ±  3.207 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃▆▇█▂▁                                                       
  ██████▆▅▄▃▅▁▃▄▁▄▁▇▁▄▁▁▁▁▁▃▄▃▁▁▁▄▁▁▁▃▁▁▁▁▃▁▁▁▁▁▁▁▁▄▃▁▃▁▁▁▁▁▃ ▃
  38.6 ms         Histogram: frequency by time          53 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2) parallel:
---------------------------------------------------------------
BenchmarkTools.Trial: 320 samples with 1 evaluation.
 Range (min … max):  12.717 ms … 21.009 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     14.609 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.940 ms ±  1.477 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▃          █▅  ▂                                           
  ▄▅██▃▁▃▃▃▃▄▅███▅▆█▇▄▄▆▅▅▂▄▁▇▃▂▃▁▂█▃▂▁▃▅▄▃▁▁▃▁▁▁▁▁▁▁▁▁▂▁▁▁▁▂ ▃
  12.7 ms         Histogram: frequency by time        20.2 ms <

 Memory estimate: 3.78 KiB, allocs estimate: 42.
---------------------------------------------------------------
CellListMap (serial):
---------------------------------------------------------------
BenchmarkTools.Trial: 562 samples with 1 evaluation.
 Range (min … max):  5.970 ms … 17.190 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.173 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.265 ms ±  1.123 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

          ▂   ▂▂▆█▂▅▄▆▄ █▇▆▂▄   ▅▃▂                           
  ▂▁▃▂▄▄▄▆█▇███████████▇█████▆███████▆▄▇▄▂▄▆▃▂▅▄▂▂▃▄▄▁▁▁▂▁▁▂ ▄
  5.97 ms        Histogram: frequency by time        11.6 ms <

 Memory estimate: 464 bytes, allocs estimate: 6.
---------------------------------------------------------------
CellListMap (parallel):
---------------------------------------------------------------
BenchmarkTools.Trial: 1434 samples with 1 evaluation.
 Range (min … max):  1.656 ms …   6.553 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.566 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.618 ms ± 380.093 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                 ▁▁▃▅▆▅██▅▅█▄▅▄▅▅▁▄▂▁                          
  ▂▁▂▂▃▂▃▃▂▃▄▄▅▆█████████████████████▇▆█▅▆▅▄▃▅▄▅▅▄▃▄▃▂▃▁▂▃▃▂▃ ▅
  1.66 ms         Histogram: frequency by time        3.74 ms <

 Memory estimate: 8.25 KiB, allocs estimate: 88.

Testing for 20_000 beads:

---------------------------------------------------------------
Naive (if overlaps are found quickly):
---------------------------------------------------------------
BenchmarkTools.Trial: 1000 samples with 1 evaluation.
 Range (min … max):   76.000 ns … 7.040 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     656.175 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):   987.480 μs ± 1.009 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ██▅▅▅▅▁▂▁     
  ███████████▇▆▇▇▆▇▅▄▄▅▅▄▄▄▃▄▄▄▄▃▃▂▂▂▃▃▃▃▃▃▃▂▂▃▃▂▂▂▁▂▂▂▃▂▂▂▂▂ ▄
  76 ns           Histogram: frequency by time        4.64 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2):
---------------------------------------------------------------
BenchmarkTools.Trial: 21 samples with 1 evaluation.
 Range (min … max):  234.876 ms … 268.368 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     246.660 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   247.130 ms ±   7.663 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

           ▃▃          █   █   ▃▃
  ▇▇▁▁▁▁▁▁▁██▁▁▁▇▁▁▇▁▁▁█▁▁▁█▁▁▁██▇▁▁▁▁▁▁▁▁▁▁▇▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▇ ▁
  235 ms           Histogram: frequency by time          268 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2) parallel:
---------------------------------------------------------------
BenchmarkTools.Trial: 53 samples with 1 evaluation.
 Range (min … max):  80.792 ms … 116.894 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     89.168 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   93.729 ms ±   9.145 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▅█▂▅                                ▅
  █▁█▁▁▁▅▁▅██████▅▅▅▅▅▁▁▁▁▁▁▅▁▁███▅▁▅▁▁▁▅▅▁▁▁▁▅▅▁█▅▅▁▁▁▁▁▁▁▅▁▅ ▁
  80.8 ms         Histogram: frequency by time          113 ms <

 Memory estimate: 3.78 KiB, allocs estimate: 42.
---------------------------------------------------------------
CellListMap (serial):
---------------------------------------------------------------
BenchmarkTools.Trial: 516 samples with 1 evaluation.
 Range (min … max):  6.498 ms … 13.965 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     8.748 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   8.988 ms ±  1.311 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁▄▃▄▂ ▄▄██▃▄▂▂▁▂ ▁ ▁       ▄
  ▃▄▅▄▆▆▆▆███████████████████▅█▅█▇█▆█▆▆▇█▆▅▆▆▅▃▄▃▄▄▃▄▃▄▁▁▃▃▃ ▅
  6.5 ms         Histogram: frequency by time        12.5 ms <

 Memory estimate: 464 bytes, allocs estimate: 6.
---------------------------------------------------------------
CellListMap (parallel):
---------------------------------------------------------------
BenchmarkTools.Trial: 1225 samples with 1 evaluation.
 Range (min … max):  1.735 ms …   9.451 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.997 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.136 ms ± 678.356 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▂▂▂▄▆▆█▆▆▃▂▃▁▁
  ▂▁▂▂▂▃▃▄▄▅▇██████████████▇▇█▆█▇▆▄▄▄▄▄▃▃▂▄▂▃▃▃▃▃▁▂▃▂▁▁▂▁▂▂▁▃ ▄
  1.74 ms         Histogram: frequency by time        5.53 ms <

 Memory estimate: 8.25 KiB, allocs estimate: 88.

The system is not dense at all, most cells will be empty:

The code:

using BenchmarkTools
using Random
using CellListMap
using StaticArrays

function random_vector()
    u, v, w = randn(3)
    norm = sqrt(u*u + v*v + w*w)
    return u/norm, v/norm,       w/norm
end

function random_vectors(N)
    x, y, z = zeros(N), zeros(N),   zeros(N)
    for i in 1:N
        x[i], y[i], z[i] = random_vector()
    end
    return x, y, z
end

function random_chain(N)
    x, y, z = zeros(N+1), zeros(N+  1), zeros(N+1)
    ux, uy, uz = random_vectors(N)
    x[2:end] = cumsum(ux)
    y[2:end] = cumsum(uy)
    z[2:end] = cumsum(uz)
    return Chain(N, x, y, z)
end

distance2(p) = sum(abs2, p)

is_colliding(p1, p2, r) = distance2(p1 .- p2) < 4*r^2 ? true : false

struct Chain{T<:Real}
    N::Int
    x::Vector{T}
    y::Vector{T}
    z::Vector{T}
end

function is_real_chain1(chain, radius)
    for i in 1:chain.N
        for j in (i+1):chain.N
            if is_colliding(
                (chain.x[i], chain.y[i], chain.z[i]), 
                (chain.x[j], chain.y[j], chain.z[j]), 
                radius
            )
                return false
            end
        end
    end
    return true
end

function is_real_chain_ON2(chain, radius)
    real = true
    for i in 1:chain.N
        @inbounds @simd for j in (i+1):chain.N
            if is_colliding(
                (chain.x[i], chain.y[i], chain.z[i]), 
                (chain.x[j], chain.y[j], chain.z[j]), 
                radius
            )
                real = false
            end
        end
    end
    return real
end

function is_real_chain_ON2_parallel(chain, radius)
    real = [ true for _ in Threads.nthreads() ]
    Threads.@threads for i in 1:chain.N
        it = Threads.threadid()
        @inbounds @simd for j in (i+1):chain.N
            if is_colliding(
                (chain.x[i], chain.y[i], chain.z[i]), 
                (chain.x[j], chain.y[j], chain.z[j]), 
                radius
            )
                real[it] = false
            end
        end
    end
    return minimum(real) # returns false if there is any
end

"""

This is how the computation should be implemented, because you don't need
the neighbourlists at all. The box and cell lists will be updated on
every iteration of the "simulation". 

The "cutoff" of the cell list cannot be the same as the cutoff of the
interaction, because this one is too short and this causes the number of 
cells to be enormous. Thus, we have to test within the function 
if the distance is smaller than the radius.  

"""
function is_real_chain_cl_serial(chain,radius,box,cl)
    box = Box(limits(chain),box.cutoff)
    cl = UpdateCellList!(chain,box,cl,parallel=false)
    real = map_pairwise!(
        (x,y,i,j,d2,real) -> ifelse((!real || sqrt(d2) < radius), false, true),
        true, box, cl, parallel=false
    )
    return real
end

function is_real_chain_cl_parallel(chain,radius,box,cl,aux)
    box = Box(limits(chain),box.cutoff)
    cl = UpdateCellList!(chain,box,cl,aux,parallel=true)
    real = map_pairwise!(
        (x,y,i,j,d2,real) -> ifelse((!real || sqrt(d2) < radius), false, true),
        true, box, cl, parallel=true
    )
    return real
end

chain_as_vector(chain) = [SVector{3,Float64}(chain.x[i], chain.y[i], chain.z[i]) for i in 1:chain.N+1]

is_real_chain2(chainvec, radius) = isempty(CellListMap.neighbourlist(chainvec, 2*radius,parallel=false))

display(b) = show(stdout,MIME("text/plain"),b)

function test(N = 8192)

    println("\n---------------------------------------------------------------")
    println("Naive (if overlaps are found quickly):")
    println("---------------------------------------------------------------")
    display(@benchmark is_real_chain1(chain, 0.1) setup=(chain=random_chain(8192)) samples=1000 evals=1)

    println("\n---------------------------------------------------------------")
    println("Naive O(N^2):")
    println("---------------------------------------------------------------")
    display(@benchmark is_real_chain_ON2(chain, 0.1) setup=(chain=random_chain($N)) samples=1000 evals=1)

    println("\n---------------------------------------------------------------")
    println("Naive O(N^2) parallel:")
    println("---------------------------------------------------------------")
    display(@benchmark is_real_chain_ON2_parallel(chain, 0.1) setup=(chain=random_chain($N)) samples=1000 evals=1)

    n_cell = 20 # This parameter can be tuned

    println("\n---------------------------------------------------------------")
    println("CellListMap (serial):")
    println("---------------------------------------------------------------")
    chainvec0 = chain_as_vector(random_chain(8192))
    sizes = limits(chainvec0)
    cell_cutoff = maximum(sizes.limits)/n_cell
    box = Box(sizes,cell_cutoff)
    cl = CellList(chainvec0,box,parallel=false)
    display(@benchmark is_real_chain_cl_serial(chainvec, 0.1, $box, $cl) setup=(chainvec=chain_as_vector(random_chain(8192))))

    println("\n---------------------------------------------------------------")
    println("CellListMap (parallel):")
    println("---------------------------------------------------------------")
    chainvec0 = chain_as_vector(random_chain(8192))
    sizes = limits(chainvec0)
    cell_cutoff = maximum(sizes.limits)/n_cell
    box = Box(sizes,cell_cutoff)
    cl = CellList(chainvec0,box)
    aux = CellListMap.AuxThreaded(cl)
    display(@benchmark is_real_chain_cl_parallel(chainvec, 0.1, $box, $cl, $aux) setup=(chainvec=chain_as_vector(random_chain(8192))))

    nothing
end

liuyxpp · September 21, 2021, 2:33am

Thank you for such a detailed discussion. It seems necessary to use CellListMap when the number of beads is over 10^4, which I shall explore in the next step.

lmiq · September 21, 2021, 11:43am

Yes, in the case where your beads are not overlapping very frequently, it is useful. But using it requires some care. The examples above indicate how you should do it, but please feel free to ask for any help. Each experience in using the package helps me improving its interface and the docs.

Topic		Replies	Views
[ANN] CellListMap.jl: short-ranged particle pairwise computations (lennard-jones, gravitational, etc) Package Announcements	38	2926	June 24, 2022
Using CellListMap.jl within Agents.jl framework Modelling & Simulations question	15	708	July 25, 2022
[ANN] SpatialHashTables.jl Package Announcements package , gpu	8	897	May 20, 2024
Speeding up force calculations and mutable structs Performance	20	1159	August 5, 2020
CellListMap: How do I loop through its neighbour list in a thread-safe way? Modelling & Simulations	6	468	December 29, 2022