# Alternative to computation in a loop for distances for fast performance

Hello Everyone,
I want to compute the distance (SqEuclidean) of a new point (newP) from the existing neighboring points in a specific order. I have the following code:

``````newP=[x y z]
for i in eachindex(order)
Dist[i]=sum((Points[order[i], :]' - newP) .^ 2)
end
``````

However, the above piece of code comparatively takes too much time and I would like to speed up this computation. Is there some way I can optimize this loop?
Thank you.

Please make an MWE and clarify what you mean by too much time.

Some potential low-hanging fruit could be removing allocations, using StaticArrays for your point vectors, and avoiding access to variables in global scope, but all of this is hard to tell without a working example.

1 Like

Here is an MWE

``````Points=rand(Float64, (6700, 3))
newP=[0.2 0.3 0.4]
order=collect(1:6700) #order can be different but for simple example I use regular sequence here
Dist=zeros(size(order,1),1)
for i in eachindex(order)
Dist[i]=sum((Points[order[i], :]' - newP) .^ 2)
end
``````

I want to find Dist (distances) for a lot of newP so I want to speed up this computation in for loop (or rather not use for loop for fast performance)

The page on performance tips is very important.

I put the code you’d like to time in a function, so that it can be timed.

Making the identifiers (variables) `const`, increases performance in this case by a factor of ten.

Using `@inbounds` improves performance further by a few percent

``````const Points = rand(Float64, (6700, 3))

const newP = [0.2 0.3 0.4]

const order = collect(1:6700) #order can be different but for simple example I use regular sequence here

const Dist = zeros(size(order,1),1)

function fillDist()
@inbounds for i in eachindex(order)
Dist[i]=sum((Points[order[i], :]' - newP) .^ 2)
end
end
``````

You can get another factor of two or more this way:

``````const newP2 = [0.2, 0.3, 0.4]

function fillDist2()
for i in eachindex(order)
Dist[i]=sum((Points[order[i], :] .- newP2) .^ 2)
end
end
``````

Here, I have avoided doing a transpose. Also the `.-` and `.^` are “fused” into a single loop, resulting in fewer allocations.

You get another factor of two by replacing the line in the loop by

`````` Dist[i]=sum(( @view(Points[order[i], :]) .- newP2) .^ 2)
``````
6 Likes

And instead of using excessive global variables, maybe pass them explicitly to the function.

4 Likes

Some ways to do this are:

``````pairwise2(x::AbstractMatrix, y::AbstractMatrix) = sum(abs2, x; dims=1)' .+ sum(abs2, y; dims=1) .- 2 .* (x'*y)  # written to use columns
Dist ≈ pairwise2(Points[order, :]', newP')

using Distances
Dist ≈ pairwise(SqEuclidean(), Points[order, :], newP; dims=1)  # dims=1 means rows here

using Tullio
Dist ≈ @tullio dist_t[i] := (Points[order[i], k] - newP[k])^2
``````

Times:

``````julia> using BenchmarkTools

julia> @btime for i in eachindex(\$order)
\$Dist[i] = sum((\$Points[\$order[i], :]' - \$newP) .^ 2)
end
min 514.084 μs, mean 577.836 μs (20100 allocations, 1.53 MiB)

julia> @btime pairwise2(\$Points[\$order, :]', \$newP');
min 31.833 μs, mean 64.153 μs (11 allocations, 471.55 KiB)

julia> @btime pairwise(SqEuclidean(), \$Points[\$order, :], \$newP; dims=1)
min 27.459 μs, mean 45.095 μs (7 allocations, 262.02 KiB)

julia> @btime @tullio dist_t[i] := (\$Points[\$order[i], k] - \$newP[k])^2;
min 20.000 μs, mean 24.760 μs (2 allocations, 52.42 KiB)

julia> using LoopVectorization

julia> @btime @tullio dist_t[i] := (\$Points[\$order[i], k] - \$newP[k])^2;
min 13.375 μs, mean 17.343 μs (2 allocations, 52.42 KiB)
``````
7 Likes

@mcabbott Hello Michael,
I am not able to implement the script using Tullio. When I run this code, I get the output as false.

``````Points=rand(Float64, (6700, 3))
newP=[0.2 0.3 0.4]
order=collect(1:6700)
Dist=zeros(size(order,1),1)
using Tullio
Dist ≈ @tullio dist_t[i] := (Points[order[i], k] - newP[k])^2
``````

A few points:

Julia Arrays are column major, so make sure to work along columns, not rows:

``````Points=rand(Float64, (3, 6700))
Points[:, order[i]]
``````

``````Dist = zeros(size(order, 1))
I think you’re testing here whether zero (from two lines up) is approximately equal to a result. My code above ought to return `true` if you have already run yours – it check that each method is getting the same result as your loop wrote into `Dist`.