Alternative to computation in a loop for distances for fast performance

Hello Everyone,
I want to compute the distance (SqEuclidean) of a new point (newP) from the existing neighboring points in a specific order. I have the following code:

newP=[x y z]
for i in eachindex(order)
	Dist[i]=sum((Points[order[i], :]' - newP) .^ 2)
end

However, the above piece of code comparatively takes too much time and I would like to speed up this computation. Is there some way I can optimize this loop?
Thank you.

Please make an MWE and clarify what you mean by too much time.

Some potential low-hanging fruit could be removing allocations, using StaticArrays for your point vectors, and avoiding access to variables in global scope, but all of this is hard to tell without a working example.

1 Like

Here is an MWE

Points=rand(Float64, (6700, 3))
newP=[0.2 0.3 0.4]
order=collect(1:6700) #order can be different but for simple example I use regular sequence here
Dist=zeros(size(order,1),1)
for i in eachindex(order)
	Dist[i]=sum((Points[order[i], :]' - newP) .^ 2)
end

I want to find Dist (distances) for a lot of newP so I want to speed up this computation in for loop (or rather not use for loop for fast performance)

The page on performance tips is very important.

I put the code you’d like to time in a function, so that it can be timed.

Making the identifiers (variables) const, increases performance in this case by a factor of ten.

Using @inbounds improves performance further by a few percent

const Points = rand(Float64, (6700, 3))

const newP = [0.2 0.3 0.4]

const order = collect(1:6700) #order can be different but for simple example I use regular sequence here

const Dist = zeros(size(order,1),1)

function fillDist()
    @inbounds for i in eachindex(order)
	Dist[i]=sum((Points[order[i], :]' - newP) .^ 2)
    end
end

You can get another factor of two or more this way:

const newP2 = [0.2, 0.3, 0.4]

function fillDist2()
    for i in eachindex(order)
	Dist[i]=sum((Points[order[i], :] .- newP2) .^ 2)
    end
end

Here, I have avoided doing a transpose. Also the .- and .^ are “fused” into a single loop, resulting in fewer allocations.

You get another factor of two by replacing the line in the loop by

 Dist[i]=sum(( @view(Points[order[i], :]) .- newP2) .^ 2)
6 Likes

And instead of using excessive global variables, maybe pass them explicitly to the function.

4 Likes

Some ways to do this are:

pairwise2(x::AbstractMatrix, y::AbstractMatrix) = sum(abs2, x; dims=1)' .+ sum(abs2, y; dims=1) .- 2 .* (x'*y)  # written to use columns
Dist ≈ pairwise2(Points[order, :]', newP')

using Distances
Dist ≈ pairwise(SqEuclidean(), Points[order, :], newP; dims=1)  # dims=1 means rows here

using Tullio
Dist ≈ @tullio dist_t[i] := (Points[order[i], k] - newP[k])^2

Times:

julia> using BenchmarkTools

julia> @btime for i in eachindex($order)
               $Dist[i] = sum(($Points[$order[i], :]' - $newP) .^ 2)
       end
  min 514.084 μs, mean 577.836 μs (20100 allocations, 1.53 MiB)

julia> @btime pairwise2($Points[$order, :]', $newP');
  min 31.833 μs, mean 64.153 μs (11 allocations, 471.55 KiB)

julia> @btime pairwise(SqEuclidean(), $Points[$order, :], $newP; dims=1)
  min 27.459 μs, mean 45.095 μs (7 allocations, 262.02 KiB)

julia> @btime @tullio dist_t[i] := ($Points[$order[i], k] - $newP[k])^2;
  min 20.000 μs, mean 24.760 μs (2 allocations, 52.42 KiB)

julia> using LoopVectorization

julia> @btime @tullio dist_t[i] := ($Points[$order[i], k] - $newP[k])^2;
  min 13.375 μs, mean 17.343 μs (2 allocations, 52.42 KiB)
7 Likes

@mcabbott Hello Michael,
Thank you for your response.
I am not able to implement the script using Tullio. When I run this code, I get the output as false.

Points=rand(Float64, (6700, 3))
newP=[0.2 0.3 0.4]
order=collect(1:6700) 
Dist=zeros(size(order,1),1)
using Tullio
Dist ≈ @tullio dist_t[i] := (Points[order[i], k] - newP[k])^2

I am sorry but I don’t understand. Could you please help?

A few points:

Julia Arrays are column major, so make sure to work along columns, not rows:

Points=rand(Float64, (3, 6700))
Points[:, order[i]]

Wrap your code in functions(!)

A matter of style: it is good to distinguish between matrices and vectors, so instead of

write

Dist = zeros(size(order, 1))
1 Like

I think you’re testing here whether zero (from two lines up) is approximately equal to a result. My code above ought to return true if you have already run yours – it check that each method is getting the same result as your loop wrote into Dist.