Hello. The answer is a little bit more complicated.
Commentary:
-
The self-avoiding random walk is generally very βlow densityβ (see the figure), thus the overhead of building the cell lists is not necessarily hugely compensated, since most cells will be empty.
-
The problem here is not detecting collision of particles in general. Is to detect if there is one collision. For detecting all collisions cell lists are absolutely a good choice. For detecting one collision, things are more complicated.
Are overlaps expected readily most of the time?
Your naive algorithm exits on the first encounter of a overlap. That makes it very fast if the βself avoidingβ is mostly not self-avoiding. This may not be representative of a true scenario, though, in a simulation. I guess in one such a simulation we would try not to generate many configurations with overlaps just to discard them. So, the benchmark for any random chain may be not the best choice.
Currently, you cannot just abort the calculation of CellListMap when an overlap is found (although Iβve added an issue here, that may be possible to implement as an option).
If overlaps are not expected most of the time:
I have replaced the naive algorithm by a version that effectively runs over all pairs of particles, to simulate the situation in which the most common scenario is that the configuration of the polymer does not have overlaps. In this case the calculation is effectively O(N^2). I have also implemented a slightly faster version of that same algorithm by adding @inbounds
and a simple parallelization, to increase the bar.
In this case, the situation is the following: for 8192 we are close to the limit where the naive algorithm and building the cell lists may be one or other faster. You cannot use directly the neighbourlist
function, for more than one reason:
- You actually donβt need the neighbour list, it is a waste of time building it and storing it.
- The cutoff for the overlaps is way too short relative to the size of the system. The number of cells becomes enormous, and even if we only need to compute one integer array of indexes of that size, it is still problematic. Thus, we have to define a cutoff for building the cell lists which is larger than the cutoff of the interactions.
With all that taken into account, if the overlaps are not very common, CellListMap can be faster. For lager walkers it will be much faster.
Conclusion
The most important conclusion is: do most of the configurations will have many overlaps? If they will, the naive algorithm is hard to beat. If not, you may want to consider using CellListMap. In this case, the implementation is not as straightforward as just using the neighbourlist
function, but may be worthwhile.
Testing with 8192 beads
---------------------------------------------------------------
Naive1 (if overlaps are found quickly):
---------------------------------------------------------------
BenchmarkTools.Trial: 1000 samples with 1 evaluation.
Range (min β¦ max): 60.000 ns β¦ 7.276 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 758.458 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 1.058 ms Β± 1.054 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββ β β
βββββββββββββββ
β
β
ββ
ββββββββββββββββββββββββββββββββββββββββ β
60 ns Histogram: frequency by time 5.05 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2):
---------------------------------------------------------------
BenchmarkTools.Trial: 120 samples with 1 evaluation.
Range (min β¦ max): 38.632 ms β¦ 55.310 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 39.624 ms β GC (median): 0.00%
Time (mean Β± Ο): 40.960 ms Β± 3.207 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββββ
ββββββββ
βββ
ββββββββββββββββββββββββββββββββββββββββββββββββ β
38.6 ms Histogram: frequency by time 53 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2) parallel:
---------------------------------------------------------------
BenchmarkTools.Trial: 320 samples with 1 evaluation.
Range (min β¦ max): 12.717 ms β¦ 21.009 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 14.609 ms β GC (median): 0.00%
Time (mean Β± Ο): 14.940 ms Β± 1.477 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
β ββ
β
ββ
ββββββββββ
ββββ
βββββββ
β
βββββββββββββββ
ββββββββββββββββββββ β
12.7 ms Histogram: frequency by time 20.2 ms <
Memory estimate: 3.78 KiB, allocs estimate: 42.
---------------------------------------------------------------
CellListMap (serial):
---------------------------------------------------------------
BenchmarkTools.Trial: 562 samples with 1 evaluation.
Range (min β¦ max): 5.970 ms β¦ 17.190 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 8.173 ms β GC (median): 0.00%
Time (mean Β± Ο): 8.265 ms Β± 1.123 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
β ββββββ
βββ βββββ β
ββ
βββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββ β
5.97 ms Histogram: frequency by time 11.6 ms <
Memory estimate: 464 bytes, allocs estimate: 6.
---------------------------------------------------------------
CellListMap (parallel):
---------------------------------------------------------------
BenchmarkTools.Trial: 1434 samples with 1 evaluation.
Range (min β¦ max): 1.656 ms β¦ 6.553 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 2.566 ms β GC (median): 0.00%
Time (mean Β± Ο): 2.618 ms Β± 380.093 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββ
ββ
βββ
β
βββ
ββ
β
ββββ
βββββββββββββ
ββββββββββββββββββββββββββ
ββ
βββ
ββ
β
ββββββββββββ β
1.66 ms Histogram: frequency by time 3.74 ms <
Memory estimate: 8.25 KiB, allocs estimate: 88.
Testing for 20_000 beads:
---------------------------------------------------------------
Naive (if overlaps are found quickly):
---------------------------------------------------------------
BenchmarkTools.Trial: 1000 samples with 1 evaluation.
Range (min β¦ max): 76.000 ns β¦ 7.040 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 656.175 ΞΌs β GC (median): 0.00%
Time (mean Β± Ο): 987.480 ΞΌs Β± 1.009 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
βββ
β
β
β
βββ
ββββββββββββββββββ
βββ
β
βββββββββββββββββββββββββββββββββββββ β
76 ns Histogram: frequency by time 4.64 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2):
---------------------------------------------------------------
BenchmarkTools.Trial: 21 samples with 1 evaluation.
Range (min β¦ max): 234.876 ms β¦ 268.368 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 246.660 ms β GC (median): 0.00%
Time (mean Β± Ο): 247.130 ms Β± 7.663 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
ββ β β ββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
235 ms Histogram: frequency by time 268 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
---------------------------------------------------------------
Naive O(N^2) parallel:
---------------------------------------------------------------
BenchmarkTools.Trial: 53 samples with 1 evaluation.
Range (min β¦ max): 80.792 ms β¦ 116.894 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 89.168 ms β GC (median): 0.00%
Time (mean Β± Ο): 93.729 ms Β± 9.145 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
β
βββ
β
βββββββ
ββ
βββββββ
β
β
β
β
βββββββ
ββββββ
ββ
ββββ
β
βββββ
β
βββ
β
ββββββββ
ββ
β
80.8 ms Histogram: frequency by time 113 ms <
Memory estimate: 3.78 KiB, allocs estimate: 42.
---------------------------------------------------------------
CellListMap (serial):
---------------------------------------------------------------
BenchmarkTools.Trial: 516 samples with 1 evaluation.
Range (min β¦ max): 6.498 ms β¦ 13.965 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 8.748 ms β GC (median): 0.00%
Time (mean Β± Ο): 8.988 ms Β± 1.311 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
βββββ ββββββββββ β β β
βββ
βββββββββββββββββββββββββ
ββ
βββββββββββ
βββ
ββββββββββββββ β
6.5 ms Histogram: frequency by time 12.5 ms <
Memory estimate: 464 bytes, allocs estimate: 6.
---------------------------------------------------------------
CellListMap (parallel):
---------------------------------------------------------------
BenchmarkTools.Trial: 1225 samples with 1 evaluation.
Range (min β¦ max): 1.735 ms β¦ 9.451 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 2.997 ms β GC (median): 0.00%
Time (mean Β± Ο): 3.136 ms Β± 678.356 ΞΌs β GC (mean Β± Ο): 0.00% Β± 0.00%
ββββββββββββββ
ββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββ β
1.74 ms Histogram: frequency by time 5.53 ms <
Memory estimate: 8.25 KiB, allocs estimate: 88.
The system is not dense at all, most cells will be empty:
The code:
using BenchmarkTools
using Random
using CellListMap
using StaticArrays
function random_vector()
u, v, w = randn(3)
norm = sqrt(u*u + v*v + w*w)
return u/norm, v/norm, w/norm
end
function random_vectors(N)
x, y, z = zeros(N), zeros(N), zeros(N)
for i in 1:N
x[i], y[i], z[i] = random_vector()
end
return x, y, z
end
function random_chain(N)
x, y, z = zeros(N+1), zeros(N+ 1), zeros(N+1)
ux, uy, uz = random_vectors(N)
x[2:end] = cumsum(ux)
y[2:end] = cumsum(uy)
z[2:end] = cumsum(uz)
return Chain(N, x, y, z)
end
distance2(p) = sum(abs2, p)
is_colliding(p1, p2, r) = distance2(p1 .- p2) < 4*r^2 ? true : false
struct Chain{T<:Real}
N::Int
x::Vector{T}
y::Vector{T}
z::Vector{T}
end
function is_real_chain1(chain, radius)
for i in 1:chain.N
for j in (i+1):chain.N
if is_colliding(
(chain.x[i], chain.y[i], chain.z[i]),
(chain.x[j], chain.y[j], chain.z[j]),
radius
)
return false
end
end
end
return true
end
function is_real_chain_ON2(chain, radius)
real = true
for i in 1:chain.N
@inbounds @simd for j in (i+1):chain.N
if is_colliding(
(chain.x[i], chain.y[i], chain.z[i]),
(chain.x[j], chain.y[j], chain.z[j]),
radius
)
real = false
end
end
end
return real
end
function is_real_chain_ON2_parallel(chain, radius)
real = [ true for _ in Threads.nthreads() ]
Threads.@threads for i in 1:chain.N
it = Threads.threadid()
@inbounds @simd for j in (i+1):chain.N
if is_colliding(
(chain.x[i], chain.y[i], chain.z[i]),
(chain.x[j], chain.y[j], chain.z[j]),
radius
)
real[it] = false
end
end
end
return minimum(real) # returns false if there is any
end
"""
This is how the computation should be implemented, because you don't need
the neighbourlists at all. The box and cell lists will be updated on
every iteration of the "simulation".
The "cutoff" of the cell list cannot be the same as the cutoff of the
interaction, because this one is too short and this causes the number of
cells to be enormous. Thus, we have to test within the function
if the distance is smaller than the radius.
"""
function is_real_chain_cl_serial(chain,radius,box,cl)
box = Box(limits(chain),box.cutoff)
cl = UpdateCellList!(chain,box,cl,parallel=false)
real = map_pairwise!(
(x,y,i,j,d2,real) -> ifelse((!real || sqrt(d2) < radius), false, true),
true, box, cl, parallel=false
)
return real
end
function is_real_chain_cl_parallel(chain,radius,box,cl,aux)
box = Box(limits(chain),box.cutoff)
cl = UpdateCellList!(chain,box,cl,aux,parallel=true)
real = map_pairwise!(
(x,y,i,j,d2,real) -> ifelse((!real || sqrt(d2) < radius), false, true),
true, box, cl, parallel=true
)
return real
end
chain_as_vector(chain) = [SVector{3,Float64}(chain.x[i], chain.y[i], chain.z[i]) for i in 1:chain.N+1]
is_real_chain2(chainvec, radius) = isempty(CellListMap.neighbourlist(chainvec, 2*radius,parallel=false))
display(b) = show(stdout,MIME("text/plain"),b)
function test(N = 8192)
println("\n---------------------------------------------------------------")
println("Naive (if overlaps are found quickly):")
println("---------------------------------------------------------------")
display(@benchmark is_real_chain1(chain, 0.1) setup=(chain=random_chain(8192)) samples=1000 evals=1)
println("\n---------------------------------------------------------------")
println("Naive O(N^2):")
println("---------------------------------------------------------------")
display(@benchmark is_real_chain_ON2(chain, 0.1) setup=(chain=random_chain($N)) samples=1000 evals=1)
println("\n---------------------------------------------------------------")
println("Naive O(N^2) parallel:")
println("---------------------------------------------------------------")
display(@benchmark is_real_chain_ON2_parallel(chain, 0.1) setup=(chain=random_chain($N)) samples=1000 evals=1)
n_cell = 20 # This parameter can be tuned
println("\n---------------------------------------------------------------")
println("CellListMap (serial):")
println("---------------------------------------------------------------")
chainvec0 = chain_as_vector(random_chain(8192))
sizes = limits(chainvec0)
cell_cutoff = maximum(sizes.limits)/n_cell
box = Box(sizes,cell_cutoff)
cl = CellList(chainvec0,box,parallel=false)
display(@benchmark is_real_chain_cl_serial(chainvec, 0.1, $box, $cl) setup=(chainvec=chain_as_vector(random_chain(8192))))
println("\n---------------------------------------------------------------")
println("CellListMap (parallel):")
println("---------------------------------------------------------------")
chainvec0 = chain_as_vector(random_chain(8192))
sizes = limits(chainvec0)
cell_cutoff = maximum(sizes.limits)/n_cell
box = Box(sizes,cell_cutoff)
cl = CellList(chainvec0,box)
aux = CellListMap.AuxThreaded(cl)
display(@benchmark is_real_chain_cl_parallel(chainvec, 0.1, $box, $cl, $aux) setup=(chainvec=chain_as_vector(random_chain(8192))))
nothing
end