# Accelerating the calculation of the minimum distance between points, while tracking which point attains it

I am writing a package (ComplexMixtures) that needs to compute the minimum distance between sets of atomic coordinates in space. I am already using cell lists, but at the very end one is left with the problem of computing the minimum distance between one atom and a set of other atoms. The inner function is simple:

```julia
function minimumdistance(x, y)
    jatom = 0
    dmin = +Inf
    @inbounds for j in axes(y, 2)
        d = dsquare(x, @view(y[1:3, j]))
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end
```

where `x` is a vector of dimension 3 (the coordinates of one atom) and `y` is an array of dimension `(3,N)`, the set of coordinates of the other atoms.

Note that I need not only the minimum distance between `x` and the set of atoms in `y`, but also which atom of `y` corresponds to that minimum distance.

I have implemented a version of this function using StaticArrays, and that is nice: the code becomes about 30% faster.

However, if I try to use LoopVectorization, it seems that I could get a much greater speedup. I say "it seems" because I am not able to get it, due to the `if d < dmin ...` statement. Without that conditional, the loop-vectorized code becomes 6 times faster than the fastest alternative. With the conditional, however, I get nothing from loop vectorization; I simply get:

```julia
if d < dmin
    #= /home/leandro/Drive/Work/JuliaPlay/mind.jl:40 =#
    jatom = j
    #= /home/leandro/Drive/Work/JuliaPlay/mind.jl:41 =#
    dmin = d
end
```

If anyone has any idea on how to accelerate this code, using LoopVectorization or anything else, please share.

The code below is a complete running implementation of the options I have been exploring so far. I have taken care to generate the set `y` non-contiguous in memory, because this is more realistic relative to my actual case (copying the data to contiguous vectors is not an option, as far as I have tested).

In the code below I have commented out the conditional that impairs the use of loop vectorization, and I get this promising benchmark:

```
  17.279 ns (0 allocations: 0 bytes)  # common arrays
  12.787 ns (0 allocations: 0 bytes)  # static vectors
   2.027 ns (0 allocations: 0 bytes)  # vectorized, but without the conditional
```

If I comment out the conditional in the second function I get the same result as in the third, very likely because the loop then gets vectorized automatically by the compiler (it is certainly not because two assignments and one comparison of numbers take 6 times as long as computing the squared distance of two vectors).

Thus, it seems that if that loop could take advantage of vectorization with the conditional in place, a significant speedup would be possible.

```julia
using BenchmarkTools
using StaticArrays
using LoopVectorization

@inline dsquare(x, y) = (x[1]-y[1])^2 + (x[2]-y[2])^2 + (x[3]-y[3])^2

function minimumdistance(x, y)
    jatom = 0
    dmin = +Inf
    @inbounds for j in axes(y, 2)
        d = dsquare(x, @view(y[1:3, j]))
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end

function minimumdistance_static(x, y)
    jatom = 0
    dmin = +Inf
    @inbounds for j in eachindex(y)
        d = dsquare(x, y[j])
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end

function minimumdistance_loopvec(x, y)
    jatom = 0
    dmin = +Inf
    @avx for j in eachindex(y)
        d = dsquare(x, y[j])
        #if d < dmin
        #    jatom = j
        #    dmin = d
        #end
    end
    return sqrt(dmin), jatom
end

# "All" 1000 atomic coordinates
x = rand(3); yall = rand(3, 1000);

# Random subsets (such that the coordinates are not contiguous)
N = 5
ylist = rand(1:1000, N)
y = @view(yall[1:3, ylist])

# Alternatively, we could have arrays of SVectors:
sx = SVector(x...)
syall = [SVector(yall[1:3, i]...) for i in 1:size(yall, 2)]

# Subset of the same N atoms of y
sy = @view(syall[ylist])

@btime minimumdistance($x, $y)
@btime minimumdistance_static($sx, $sy)
@btime minimumdistance_loopvec($sx, $sy)
```
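As an aside, the same computation can be written as a compact correctness baseline with `findmin(f, itr)` (available in Julia ≥ 1.7). It will not win any benchmark, but it is handy for checking the optimized variants. This is just a sketch; `minimumdistance_ref` and `dsq` are names of my own (`dsq` is the same squared distance as `dsquare` above):

```julia
# Hypothetical reference implementation via findmin(f, itr) (Julia >= 1.7).
dsq(x, y) = (x[1]-y[1])^2 + (x[2]-y[2])^2 + (x[3]-y[3])^2

function minimumdistance_ref(x, y)
    # findmin over the column indices returns (minimum f value, minimizing j)
    d2, jatom = findmin(j -> dsq(x, @view(y[:, j])), axes(y, 2))
    return sqrt(d2), jatom
end
```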

You are getting tricked by the compiler. Try benchmarking `minimumdistance_loopvec` with `syall`. The loop is getting elided, because it doesn't affect the outcome of the computation. It is basically the same as writing

```julia
function minimumdistance_loopvec(x, y)
    jatom = 0
    dmin = +Inf
    return sqrt(dmin), jatom
end
```

You can see that

```julia
julia> @btime sqrt(Inf)
  2.310 ns (0 allocations: 0 bytes)
Inf
```

You need to get the loop to calculate something to get a real benchmark.


If you change your function to this, you get a better idea about the possible performance:

```julia
function minimumdistance_loopvec(x, y)
    jatom = 0
    dmin = +Inf
    @avx for j in eachindex(y)
        d = dsquare(x, y[j])
        dmin = min(dmin, d)
    end
    return sqrt(dmin), jatom
end
```

This will give you the wrong index, but actually runs the inner loop.

Benchmark:

```julia
julia> @btime minimumdistance_loopvec($sx, $sy)
  12.926 ns (0 allocations: 0 bytes)
```

Very close to the other timings with StaticArrays.
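One way to also recover the index while keeping the hot loop branch-free is a two-pass scheme: a pure `min` reduction first (which is what SIMD handles well), then a second cheap pass that finds which `j` attained the minimum. A sketch with plain arrays, using hypothetical names of my own (`dsq` is the same squared distance as `dsquare`):

```julia
dsq(x, y, j) = (x[1]-y[1,j])^2 + (x[2]-y[2,j])^2 + (x[3]-y[3,j])^2

function minimumdistance_twopass(x, y)
    # Pass 1: branch-free min reduction, amenable to vectorization.
    dmin = Inf
    @inbounds @simd for j in axes(y, 2)
        dmin = min(dmin, dsq(x, y, j))
    end
    # Pass 2: scalar scan to recover which j attained the minimum.
    jatom = 0
    @inbounds for j in axes(y, 2)
        if dsq(x, y, j) == dmin
            jatom = j
            break
        end
    end
    return sqrt(dmin), jatom
end
```

Whether the second pass pays for itself depends on `N`; for very small subsets the single branchy loop may still win.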


I think taking a view is slowing things down:

```julia
julia> @btime minimumdistance($x, $y)
  22.236 ns (0 allocations: 0 bytes)
(0.26799890077104815, 1)

julia> @btime minimumdistance_static($sx, $sy)
  14.224 ns (0 allocations: 0 bytes)
(0.26799890077104815, 1)

julia> sy_get = syall[ylist];

julia> @btime minimumdistance_static($sx, $sy_get)
  9.488 ns (0 allocations: 0 bytes)
(0.3771984138347722, 4)

julia> y_get = yall[1:3, ylist];

julia> @btime minimumdistance($x, $y_get)
  13.722 ns (0 allocations: 0 bytes)
(0.3771984138347722, 4)
```

Edit: I thought I had an `@avx` version, but was mistaken.

`LoopVectorization.check_args(sy) === false`, so any use of it with that input is going to give the fallback which IIRC is just `@inbounds @fastmath`.

To get more out of it, I think you may want `permutedims(y)` so that it might vectorise along the `j` index.
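The layout change might look like the sketch below. With `y` stored as `(N, 3)`, consecutive `j` iterations touch contiguous memory in each column, which is the access pattern LoopVectorization prefers. `minimumdistance_t` is a hypothetical name of mine, and the loop is written with plain `@inbounds` so the sketch runs without the package; the body is what `@avx` would consume:

```julia
function minimumdistance_t(x, yt)   # yt has shape (N, 3)
    jatom = 0
    dmin = Inf
    @inbounds for j in axes(yt, 1)
        # Unrolled squared distance; each yt[j, k] access is stride-1 in j.
        d = (yt[j, 1] - x[1])^2 + (yt[j, 2] - x[2])^2 + (yt[j, 3] - x[3])^2
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end

y = rand(3, 1000)
yt = permutedims(y)   # one (N, 3) copy; pays off if reused for many atoms x
```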

BTW, this is more complicated than necessary. Try this:

```julia
syall = reinterpret(SVector{3, eltype(yall)}, yall);
```

It's 150,000 times faster, and requires zero allocations. (Well, I guess it basically takes zero time, more or less.)
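A quick check that the reinterpreted array really aliases the original data and involves no copy (a sketch assuming `Float64` coordinates; here I reinterpret `vec(yall)` to get a plain vector of `SVector`s rather than the 1×N matrix the matrix form produces):

```julia
using StaticArrays

yall = rand(3, 4)
syall = reinterpret(SVector{3, Float64}, vec(yall))  # length-4 vector, no copy
yall[1, 2] = -1.0
@assert syall[2][1] == -1.0   # mutation through yall is visible in syall
```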

```julia
jatom = ifelse(d < dmin, j, jatom)
dmin = min(dmin, d)
```

This could be replaced with

```julia
(dmin, jatom) = ifelse(d < dmin, (d, j), (dmin, jatom))
```

No performance difference, though.
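For completeness, the full loop with the tuple `ifelse` might look like this. A sketch with hypothetical names (`minimumdistance_ifelse`, `dsq`); `y` is assumed to be a vector of 3-element points such as `SVector`s or tuples:

```julia
dsq(p, q) = (p[1]-q[1])^2 + (p[2]-q[2])^2 + (p[3]-q[3])^2

function minimumdistance_ifelse(x, y)
    jatom = 0
    dmin = Inf
    @inbounds for j in eachindex(y)
        d = dsq(x, y[j])
        # single branch-free update of both the minimum and its index
        (dmin, jatom) = ifelse(d < dmin, (d, j), (dmin, jatom))
    end
    return sqrt(dmin), jatom
end
```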


There is a trick which can remove the conditional, but it makes everything slower:

```julia
function minimumdistance5(x, y)
    jatom = 0
    dmin = 1000000000.0
    @avx for j in eachindex(y)
        d = dsquare(x, y[j])
        b = Int(d < dmin)
        jatom = b * j + (1 - b) * jatom
        dmin = b * d + (1 - b) * dmin
    end
    return sqrt(dmin), jatom
end
```

There were issues with the multiplication of `Inf` and `Bool`, so I made some ugly changes. It can be done better, but since it runs slower it doesn't matter anyway.

But it looks like changing `@inbounds` to `@inbounds @simd` in `minimumdistance_static` gives a small increase; in my measurements I see a speedup from 13.463 ns to 12.672 ns.
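The change described is just the following (a sketch with hypothetical names; `@simd` only licenses reassociating the reduction, the compiler still decides whether the branch can be vectorized):

```julia
dsq(p, q) = (p[1]-q[1])^2 + (p[2]-q[2])^2 + (p[3]-q[3])^2

function minimumdistance_static_simd(x, y)
    jatom = 0
    dmin = Inf
    @inbounds @simd for j in eachindex(y)   # @simd added here
        d = dsq(x, y[j])
        if d < dmin
            jatom = j
            dmin = d
        end
    end
    return sqrt(dmin), jatom
end
```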

This should work:

```julia
b = d < dmin
jatom = b * j + !b * jatom
dmin = b * d + !b * dmin
```
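The reason the `Bool` version tolerates `dmin = Inf` while the `Int` version needed a finite sentinel: in Julia, `false` acts as a "strong zero" under multiplication, whereas an integer `0` first promotes to `0.0`:

```julia
@assert isnan(0 * Inf)        # Int 0 promotes to 0.0, and 0.0 * Inf is NaN
@assert false * Inf === 0.0   # Bool false is a strong zero: exactly 0.0
@assert true * Inf === Inf
```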

But if you just want a really big float that isnâ€™t `Inf`, you can use

```julia
julia> prevfloat(Inf)
1.7976931348623157e308
```

`prevfloat` is nice! It's good to know, thank you very much.

But anyway, this approach is too slow.

Of course… I hate myself…

Certainly, but that is on purpose. It reflects the kind of data that reaches this function in the package. I cannot avoid feeding it a non-contiguous array there (unless I were willing to copy a lot of data before calling it, but my tests show that is not an option).

Good to know about that (but that is only how I am generating the data in this MWE; the actual data is generated otherwise).