Get distances of many points within a max range, any performance improvements?

Jon_Norberg · November 3, 2021, 9:40pm

I have a potentially very long (N) set of points W with three dimensions. I need to make distance based calculations between all points within a set max distance. Below is my best attempt so far. Any performance improvement suggestions?

The benchmark runs in 5.512ms om 1.6 (OSX M1) and in 3.514 ms on 1.7 (OSX M1). Impressive performance gain to 1.7 at least on OSX M1!

using NearestNeighbors, SparseArrays, BenchmarkTools
N=1000
W=rand(Float32,3,N) #points
minDistance=0.1

function getSparse(v,W)
    S=spzeros(Float32,N,N)
    for j in 1:N
        for i in v[j]
            if j<i
                S[i,j]=S[j,i]=sqrt(sum(abs2.(W[:,i].-W[:,j])))
            end
        end
    end
    S
end

@btime begin
    tree=KDTree(W,leafsize=1,reorder=true)
    v=inrange(tree,W,minDistance,true)
    getSparse(v,W)
end

lmiq · November 3, 2021, 9:41pm

Take a look at: CellListMap.jl

Jon_Norberg · November 3, 2021, 9:47pm

That does indeed seem to be what I need…I’ll check it out and compare. If there are any obvious performance improvements by tinkering with the above code I’d appreciate any suggestions for learning purposes. But will report back after some investigation of the suggested package and its src. Thanks!

Jon_Norberg · November 3, 2021, 10:11pm

Tried to run this function from the examples to start:

x = [ rand(3) for i in 1:10_000 ];

julia> CellListMap.neighbourlist(x,0.05)

but get ERROR: UndefVarError: neighbourlist not defined

Then also trying another example:

x = [ rand(3) for i in 1:1000 ];

y = [ rand(3) for i in 1:1000 ];

box = Box([1,1,1],0.02);

cl = CellList(x,y,box)

resulting in ERROR: MethodError: no method matching CellList(::Vector{Vector{Float64}}, ::Vector{Vector{Float64}}, ::Box{3, Float64})

Closest candidates are:

CellList(::AbstractArray{StaticArrays.SVector{N, T}, 1}, ::AbstractArray{StaticArrays.SVector{N, T}, 1}, ::Box; parallel) where {N, T}

The package seems really nice but getting stuck using the examples…

Jon_Norberg · November 3, 2021, 10:15pm

Strangely enough the tests check out tho… just not doing examples in REPL

lmiq · November 3, 2021, 11:02pm

The improvements of that original code I can see are:

set N inside the getSparse function, because there it is a global variable.
Use @views in the function, because the slices allocate new arrays.
add @inbounds to your assignments.

With those modifications your functions takes almost one third of the original time (shown below).

The most simple implementation of that with CellListMap is shown bellow, and is slightly faster in serial.

A working example is shown bellow, with the following results (I am not running parallel versions, because for 1000 points they do not seem to be worthwhile, and because the second options needs a custom reduction function for the sparse array. It will be more easy and useful if you have property to compute from the pairs directly).

julia> include("./norberg.jl")
Original:
  4.944 ms (64724 allocations: 2.09 MiB)
Original - tweaked:
  1.860 ms (4336 allocations: 522.50 KiB)
CellListMap with neighbourlist:
  1.244 ms (1766 allocations: 671.86 KiB)
CellListMap without neighbourlist:
  1.278 ms (1753 allocations: 575.41 KiB)

Code

using NearestNeighbors, SparseArrays, BenchmarkTools
using CellListMap
using StaticArrays
using Test

function getSparse(v,W)
    S=spzeros(Float32,N,N)
    for j in 1:N
        for i in v[j]
            if j<i
                S[i,j]=S[j,i]=sqrt(sum(abs2.(W[:,i].-W[:,j])))
            end
        end
    end
    S
end

@views function getSparse2(v,W)
    N = size(W,2)
    S=spzeros(Float32,N,N)
    for j in 1:N
        for i in v[j]
            if j<i
                @inbounds S[i,j]=S[j,i]=sqrt(sum(abs2.(W[:,i].-W[:,j])))
            end
        end
    end
    S
end

function nn(W,minDistance)
    tree=KDTree(W,leafsize=1,reorder=true)
    v=inrange(tree,W,minDistance,true)
    getSparse(v,W)
end

function nn2(W,minDistance)
    tree=KDTree(W,leafsize=1,reorder=true)
    v=inrange(tree,W,minDistance,true)
    getSparse2(v,W)
end

# Using the CellListMap.neighbourlist function

function cl_nn(W,minDistance)
    W_svector = reinterpret(reshape,SVector{3,Float32},W)
    pairs = CellListMap.neighbourlist(W_svector,minDistance,parallel=false) 
    N = size(W,2)
    S = spzeros(Float32,N,N)
    for (i,j,d) in pairs
        @inbounds S[i,j] = d
        @inbounds S[j,i] = d
    end
    S
end

# Now let us populate the sparse matrix without building the list

function cl_nn2(W,minDistance)
    N = size(W,2)
    S = spzeros(Float32,N,N) 
    W_svector = reinterpret(reshape,SVector{3,Float32},W)
    box = Box(limits(W_svector),minDistance)
    cl = CellList(W_svector,box,parallel=false)
    map_pairwise!(
        (x,y,i,j,d2,S) -> begin
            d = sqrt(d2)
            @inbounds S[i,j] = d
            @inbounds S[j,i] = d
            return S
        end,
        S,
        box, cl, parallel=false
    )
    return S
end

N=1000
W=rand(Float32,3,N) #points
minDistance=0.1

@test nn(W,minDistance) ≈ cl_nn(W,minDistance)
@test nn(W,minDistance) ≈ cl_nn2(W,minDistance)

println("Original:")
@btime nn($W,$minDistance)

println("Original - tweaked:")
@btime nn2($W,$minDistance)

println("CellListMap with neighbourlist:")
@btime cl_nn($W,$minDistance)

println("CellListMap without neighbourlist:")
@btime cl_nn2($W,$minDistance)

nothing

(note: that reinterpret(reshape... is only there because one auxiliary function is not accepting matrices as inputs. I will fix that tomorrow and that will no longer be necessary)

lmiq · November 3, 2021, 11:04pm

You seem to be using a very old version of CellListMap. Can you check that? Current version is 0.6.3

Jon_Norberg · November 3, 2021, 11:15pm

You are right, had an old version even though I just added it.

Now it works and is superfast! Your neighbourlist is running with 461.000 μs vs. my implementation with your suggestions at 1.042 ms on 1.7 OSX M1!

Thanks both for the suggestions for my code and for introducing your package!

lmiq · November 3, 2021, 11:56pm

Just one add on: parallelizing that exact calculation is not useful, because reducing the final sparse matrix takes more than computing which elements are different from zero. But if you want to compute a property that is dependent on the pairwise distances, everything is much faster and parallelization helps a lot. For example, if you wanted to compute the sum of 1/d for all pairs within the cutoff, we can easily go to the hundreds-of-thousands points:

function cl_nn2(W,minDistance;parallel=false)
    N = size(W,2)
    S = 0.
    W_svector = reinterpret(reshape,SVector{3,Float32},W)
    box = Box(limits(W_svector),minDistance)
    cl = CellList(W_svector,box,parallel=parallel)
    S = map_pairwise!(
        (x,y,i,j,d2,S) -> begin
            d = sqrt(d2)
            S += 1 / d
            return S
        end,
        S,
        box, cl,
        parallel = parallel,
    )
    return S
end

which leads to:


julia> N=100000
100000

julia> W=rand(Float32,3,N) #points
3×100000 Matrix{Float32}:
 0.799297  0.866944  0.242355  0.0832249  …  0.709172  0.433691   0.139978  0.696452
 0.25061   0.181402  0.831235  0.754416      0.503823  0.0272224  0.849605  0.0802288
 0.319469  0.345246  0.522793  0.524505      0.417869  0.282098   0.594832  0.762339

julia> @btime cl_nn2($W,$minDistance,parallel=false)
  340.045 ms (3334 allocations: 12.71 MiB)
2.8357939250906736e8

julia> @btime cl_nn2($W,$minDistance,parallel=true)
  70.876 ms (33143 allocations: 31.38 MiB)
2.83579392509103e8

Topic		Replies	Views
Help benchmarking small python code Offtopic python	0	377	August 29, 2021
Accelerating the calculation of minimum distance between points, annotating which is the point Performance	9	977	October 28, 2020
CellListMap: Go from list of pairs to list of indices Performance	8	271	December 26, 2022
Nearest Neighbor Search for points consisting of vectors/arrays Data question	4	1044	May 6, 2020
Cell list algorithm is slower than double for loop Performance	18	2267	April 2, 2020

Get distances of many points within a max range, any performance improvements?

Related topics