# Accelerating the calculation of the minimum distance between points, while tracking which point attains it

I am writing a package (ComplexMixtures) that needs to compute the minimum distance between sets of atomic coordinates in space. I am already using cell lists, but at the very end one is left with the problem of computing the minimum distance between one atom and a set of other atoms. The inner function is simple:

```julia
function minimumdistance(x, y)
    jatom = 0
    dmin = +Inf
    @inbounds for j in axes(y, 2)
        d = dsquare(x, @view(y[1:3, j]))
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end
```

where `x` is a vector of dimension 3 (the coordinates of one atom) and `y` is an array of dimension `(3,N)`, the set of coordinates of the other atoms.

Note that I need not only the minimum distance between `x` and the set of atoms in `y`, but also which atom of `y` corresponds to that minimum distance.

I have implemented a version of this function using StaticArrays, and that is nice: the code becomes about 30% faster.

However, if I try to use LoopVectorization, it seems that I could get a much greater speedup. I say "it seems" because I am not able to get it, due to the `if d < dmin ...` statement. Without that conditional, the loop-vectorized code becomes 6 times faster than the fastest alternative. With the conditional, however, I get nothing from loop vectorization; I simply get:

```julia
if d < dmin
    #= /home/leandro/Drive/Work/JuliaPlay/mind.jl:40 =#
    jatom = j
    #= /home/leandro/Drive/Work/JuliaPlay/mind.jl:41 =#
    dmin = d
end
```

If anyone has any idea on how to accelerate this code, using LoopVectorization or anything else, please share.

The code below is a complete running implementation of the options I have been exploring so far. I have taken care to generate the set `y` non-contiguous in memory, because this is more realistic relative to my actual case (copying the data to contiguous vectors is not an option, as far as I have tested).

In the code below I have commented out the conditional that impairs the use of loop vectorization, and I get this promising benchmark:

```
  17.279 ns (0 allocations: 0 bytes)  # common arrays
  12.787 ns (0 allocations: 0 bytes)  # static vectors
   2.027 ns (0 allocations: 0 bytes)  # vectorized, but without the conditional
```

If I comment out the conditional in the second function I get the same result as in the third, very likely because the loop then gets vectorized automatically by the compiler (it is certainly not because two assignments and one comparison of numbers take 6 times as long as computing the squared distance of two vectors).

Thus, it seems that if that loop could take advantage of vectorization with the conditional in place, a significant speedup would be possible.

```julia
using BenchmarkTools
using StaticArrays
using LoopVectorization

@inline dsquare(x, y) = (x[1]-y[1])^2 + (x[2]-y[2])^2 + (x[3]-y[3])^2

function minimumdistance(x, y)
    jatom = 0
    dmin = +Inf
    @inbounds for j in axes(y, 2)
        d = dsquare(x, @view(y[1:3, j]))
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end

function minimumdistance_static(x, y)
    jatom = 0
    dmin = +Inf
    @inbounds for j in eachindex(y)
        d = dsquare(x, y[j])
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end

function minimumdistance_loopvec(x, y)
    jatom = 0
    dmin = +Inf
    @avx for j in eachindex(y)
        d = dsquare(x, y[j])
        #if d < dmin
        #    jatom = j
        #    dmin = d
        #end
    end
    return sqrt(dmin), jatom
end

# "All" 1000 atomic coordinates
x = rand(3); yall = rand(3, 1000);

# Random subsets (such that the coordinates are not contiguous)
N = 5
ylist = rand(1:1000, N)
y = @view(yall[1:3, ylist])

# Alternatively, we could have arrays of SVectors:
sx = SVector(x...)
syall = [SVector(yall[1:3, i]...) for i in 1:size(yall, 2)]

# Subset of the same N atoms of y
sy = @view(syall[ylist])

@btime minimumdistance($x, $y)
@btime minimumdistance_static($sx, $sy)
@btime minimumdistance_loopvec($sx, $sy)
```
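As an aside, the same computation can be written as a compact correctness baseline with `findmin(f, itr)` (available in Julia ≥ 1.7). It will not win any benchmark, but it is handy for checking the optimized variants. This is just a sketch; `minimumdistance_ref` and `dsq` are names of my own (`dsq` is the same squared distance as `dsquare` above):

```julia
# Hypothetical reference implementation via findmin(f, itr) (Julia >= 1.7).
dsq(x, y) = (x[1]-y[1])^2 + (x[2]-y[2])^2 + (x[3]-y[3])^2

function minimumdistance_ref(x, y)
    # findmin over the column indices returns (minimum f value, minimizing j)
    d2, jatom = findmin(j -> dsq(x, @view(y[:, j])), axes(y, 2))
    return sqrt(d2), jatom
end
```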

You are getting tricked by the compiler. Try benchmarking `minimumdistance_loopvec` with `syall`. The loop is getting elided, because it doesn't affect the outcome of the computation. It is basically the same as writing

```julia
function minimumdistance_loopvec(x, y)
    jatom = 0
    dmin = +Inf
    return sqrt(dmin), jatom
end
```

You can see that

```julia
julia> @btime sqrt(Inf)
  2.310 ns (0 allocations: 0 bytes)
Inf
```

You need to get the loop to calculate something to get a real benchmark.


If you change your function to this, you get a better idea about the possible performance:

```julia
function minimumdistance_loopvec(x, y)
    jatom = 0
    dmin = +Inf
    @avx for j in eachindex(y)
        d = dsquare(x, y[j])
        dmin = min(dmin, d)
    end
    return sqrt(dmin), jatom
end
```

This will give you the wrong index, but actually runs the inner loop.

Benchmark:

```julia
julia> @btime minimumdistance_loopvec($sx, $sy)
  12.926 ns (0 allocations: 0 bytes)
```

Very close to the other timings with StaticArrays.
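One way to also recover the index while keeping the hot loop branch-free is a two-pass scheme: a pure `min` reduction first (which is what SIMD handles well), then a second cheap pass that finds which `j` attained the minimum. A sketch with plain arrays, using hypothetical names of my own (`dsq` is the same squared distance as `dsquare`):

```julia
dsq(x, y, j) = (x[1]-y[1,j])^2 + (x[2]-y[2,j])^2 + (x[3]-y[3,j])^2

function minimumdistance_twopass(x, y)
    # Pass 1: branch-free min reduction, amenable to vectorization.
    dmin = Inf
    @inbounds @simd for j in axes(y, 2)
        dmin = min(dmin, dsq(x, y, j))
    end
    # Pass 2: scalar scan to recover which j attained the minimum.
    jatom = 0
    @inbounds for j in axes(y, 2)
        if dsq(x, y, j) == dmin
            jatom = j
            break
        end
    end
    return sqrt(dmin), jatom
end
```

Whether the second pass pays for itself depends on `N`; for very small subsets the single branchy loop may still win.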


I think taking a view is slowing things down:

```julia
julia> @btime minimumdistance($x, $y)
  22.236 ns (0 allocations: 0 bytes)
(0.26799890077104815, 1)

julia> @btime minimumdistance_static($sx, $sy)
  14.224 ns (0 allocations: 0 bytes)
(0.26799890077104815, 1)

julia> sy_get = syall[ylist];

julia> @btime minimumdistance_static($sx, $sy_get)
  9.488 ns (0 allocations: 0 bytes)
(0.3771984138347722, 4)

julia> y_get = yall[1:3, ylist];

julia> @btime minimumdistance($x, $y_get)
  13.722 ns (0 allocations: 0 bytes)
(0.3771984138347722, 4)
```

Edit: I thought I had an `@avx` version, but was mistaken.

`LoopVectorization.check_args(sy) === false`, so any use of it with that input is going to give the fallback which IIRC is just `@inbounds @fastmath`.

To get more out of it, I think you may want `permutedims(y)` so that it might vectorise along the `j` index.
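The layout change might look like the sketch below. With `y` stored as `(N, 3)`, consecutive `j` iterations touch contiguous memory in each column, which is the access pattern LoopVectorization prefers. `minimumdistance_t` is a hypothetical name of mine, and the loop is written with plain `@inbounds` so the sketch runs without the package; the body is what `@avx` would consume:

```julia
function minimumdistance_t(x, yt)   # yt has shape (N, 3)
    jatom = 0
    dmin = Inf
    @inbounds for j in axes(yt, 1)
        # Unrolled squared distance; each yt[j, k] access is stride-1 in j.
        d = (yt[j, 1] - x[1])^2 + (yt[j, 2] - x[2])^2 + (yt[j, 3] - x[3])^2
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end

y = rand(3, 1000)
yt = permutedims(y)   # one (N, 3) copy; pays off if reused for many atoms x
```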

BTW, this is more complicated than necessary. Try this:

```julia
syall = reinterpret(SVector{3, eltype(yall)}, yall);
```

It's 150,000 times faster, and requires zero allocations. (Well, I guess it basically takes zero time, more or less.)
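A quick check that the reinterpreted array really aliases the original data and involves no copy (a sketch assuming `Float64` coordinates; here I reinterpret `vec(yall)` to get a plain vector of `SVector`s rather than the 1×N matrix the matrix form produces):

```julia
using StaticArrays

yall = rand(3, 4)
syall = reinterpret(SVector{3, Float64}, vec(yall))  # length-4 vector, no copy
yall[1, 2] = -1.0
@assert syall[2][1] == -1.0   # mutation through yall is visible in syall
```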

```julia
jatom = ifelse(d < dmin, j, jatom)
dmin = min(dmin, d)
```

This could be replaced with

```julia
(dmin, jatom) = ifelse(d < dmin, (d, j), (dmin, jatom))
```

No performance difference, though.
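For completeness, the full loop with the tuple `ifelse` might look like this. A sketch with hypothetical names (`minimumdistance_ifelse`, `dsq`); `y` is assumed to be a vector of 3-element points such as `SVector`s or tuples:

```julia
dsq(p, q) = (p[1]-q[1])^2 + (p[2]-q[2])^2 + (p[3]-q[3])^2

function minimumdistance_ifelse(x, y)
    jatom = 0
    dmin = Inf
    @inbounds for j in eachindex(y)
        d = dsq(x, y[j])
        # single branch-free update of both the minimum and its index
        (dmin, jatom) = ifelse(d < dmin, (d, j), (dmin, jatom))
    end
    return sqrt(dmin), jatom
end
```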


There is a trick which can remove the conditional, but it makes everything slower:

```julia
function minimumdistance5(x, y)
    jatom = 0
    dmin = 1000000000.0
    @avx for j in eachindex(y)
        d = dsquare(x, y[j])
        b = Int(d < dmin)
        jatom = b * j + (1 - b) * jatom
        dmin = b * d + (1 - b) * dmin
    end
    return sqrt(dmin), jatom
end
```

There were issues with the multiplication of `Inf` and `Bool`, so I made some ugly changes. It can be done better, but since it runs slower it doesn't matter anyway.

But it looks like changing `@inbounds` to `@inbounds @simd` in `minimumdistance_static` gives a small increase; in my measurements I see a speedup from 13.463 ns to 12.672 ns.
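The change described is just the following (a sketch with hypothetical names; `@simd` only licenses reassociating the reduction, the compiler still decides whether the branch can be vectorized):

```julia
dsq(p, q) = (p[1]-q[1])^2 + (p[2]-q[2])^2 + (p[3]-q[3])^2

function minimumdistance_static_simd(x, y)
    jatom = 0
    dmin = Inf
    @inbounds @simd for j in eachindex(y)   # @simd added here
        d = dsq(x, y[j])
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end
```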

This should work:

```julia
b = d < dmin
jatom = b * j + !b * jatom
dmin = b * d + !b * dmin
```
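The reason the `Bool` version tolerates `dmin = Inf` while the `Int` version needed a finite sentinel: in Julia, `false` acts as a "strong zero" under multiplication, whereas an integer `0` first promotes to `0.0`:

```julia
@assert isnan(0 * Inf)        # Int 0 promotes to 0.0, and 0.0 * Inf is NaN
@assert false * Inf === 0.0   # Bool false is a strong zero: exactly 0.0
@assert true * Inf === Inf
```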

But if you just want a really big float that isnâ€™t `Inf`, you can use

```julia
julia> prevfloat(Inf)
1.7976931348623157e308
```

`prevfloat` is nice! It's good to know, thank you very much.

But anyway, this approach is too slow.

Of course… I hate myself…

Certainly, but that is on purpose. It reflects the kind of data that reaches this function in the package. I cannot avoid feeding it a non-contiguous array there (unless I were willing to copy a lot of data before calling it, but my tests show that is not an option).

Good to know about that (but that is only how I am generating the data in this MWE; the actual data is generated otherwise).