I am writing a package that needs to compute the minimum distance between sets of coordinates in space of atoms (ComplexMixtures). I am already using cell lists, but at the very end the one finishes with the problem of computing the minimum distance between an atom and a set of other atoms. The inner function is simple:
function minimumdistance(x,y) jatom = 0 dmin = +Inf @inbounds for j in axes(y,2) d = dsquare(x,@view(y[1:3,j])) if d < dmin jatom = j dmin = d end end return sqrt(dmin), jatom end
x is a vector of dimension 3 (coordiantes of one atom) and
y is an array of dimension
(3,N), the set of coordinates of the other atoms.
Note that I have not only to know the minimum distance between
x and the set of atoms in
y, but also which atom of
y corresponds to that minimum distance.
I have implemented a version of this function using StaticArrays, and that is nice, the code becomes about 30% faster with that.
However, If I try to use LoopVectorizations, it seems that I could get a much greater speedup. I am saying “it seems”, because I am not able to get that because of the
if dmin < d ... statement. Without that conditional the code, loop-vectorized, becomes 6 times faster than the faster alternative. However, because of that conditional, I cannot get anything from the loop vectorizations, I simply get:
if d < dmin #= /home/leandro/Drive/Work/JuliaPlay/mind.jl:40 =# jatom = j #= /home/leandro/Drive/Work/JuliaPlay/mind.jl:41 =# dmin = d end ERROR: LoadError: LoadError: "Don't know how to handle expression."
If anyone has any idea on how to accelerate this code, by using loop vectorizations or anything else, please share .
The code below is a complete running implementation of the options I have been exploring so far. I have taken the care of generating the set
y not contiguous in memory, because this is more realistic relative to what I have in my case (copying the data to contiguous vectors is not an option as far I have tested).
In the code below I have commented the conditional that impairs the use of the loop vectorizations, and I get this promising benchmark:
17.279 ns (0 allocations: 0 bytes) # common arrays 12.787 ns (0 allocations: 0 bytes) # static vectors 2.027 ns (0 allocations: 0 bytes) # vectorized, but without conditional
If I comment the conditional on the second function I get the same result as in the third, very likely because the loop gets vectorized automatically by the compiler (certainly is not because two assignments and one comparison of numbers takes 6 times the time of computing the squared distance of two vectors).
Thus, it seems that if that loop could take advantage of vectorization, with the conditional, a significant speedup could be possible.
using BenchmarkTools using StaticArrays using LoopVectorization @inline dsquare(x,y) = (x-y)^2 + (x-y)^2 + (x-y)^2 function minimumdistance(x,y) jatom = 0 dmin = +Inf @inbounds for j in axes(y,2) d = dsquare(x,@view(y[1:3,j])) if d < dmin jatom = j dmin = d end end return sqrt(dmin), jatom end function minimumdistance_static(x,y) jatom = 0 dmin = +Inf @inbounds for j in eachindex(y) d = dsquare(x,y[j]) if d < dmin jatom = j dmin = d end end return sqrt(dmin), jatom end function minimumdistance_loopvec(x,y) jatom = 0 dmin = +Inf @avx for j in eachindex(y) d = dsquare(x,y[j]) #if d < dmin # jatom = j # dmin = d #end end return sqrt(dmin), jatom end # "All" 1000 atomic coordinates x = rand(3) ; yall = rand(3,1000); # Random subsets (such that the coordinates are not contiguous) N = 5 ylist = rand(1:1000,N) y = @view(yall[1:3,ylist]) # Alternativelly, we could have arrays of SVectors: sx = SVector(x...) syall = [ SVector(yall[1:3,i]...) for i in 1:size(yall,2) ] # Subset of the same N atoms of y sy = @view(syall[ylist]) @btime minimumdistance($x,$y) @btime minimumdistance_static($sx,$sy) @btime minimumdistance_loopvec($sx,$sy)