Compute pairwise distances (Distances.jl)

Some attempts… note that I’ve made these all write into an existing array, as the times are otherwise very variable, for me, I guess that allocating a whole GB each run stresses my computer:

julia> out = zeros(10^4, 10^4);

julia> @btime pairwise!($out, Euclidean(), $arr, $arr, dims=2); # original, but in-place
  675.518 ms (4 allocations: 156.41 KiB)

julia> @btime pairwise!($out, Euclidean(), $arr, $arr2, dims=2); # with a copy of array
  311.783 ms (4 allocations: 156.41 KiB)

julia> using Tullio

julia> out ≈ @tullio out2[i,j] := (arr[μ,i] - arr[μ,j])^2 |> sqrt
true

julia> @btime @tullio $out[i,j] = ($arr[μ,i] - $arr[μ,j])^2 |> sqrt;
  171.028 ms (50 allocations: 4.81 KiB)

julia> using LoopVectorization

julia> @btime @tullio $out[i,j] = ($arr[μ,i] - $arr[μ,j])^2 |> sqrt;
  90.907 ms (50 allocations: 4.81 KiB)

julia> @btime @tullio $out[i,j] = sqrt(($arr[1,i] - $arr[1,j])^2 + ($arr[2,i] - $arr[2,j])^2 + ($arr[3,i] - $arr[3,j])^2);
  79.451 ms (44 allocations: 4.86 KiB)

This is multi-threaded (2 cores) which might count as cheating.

Also, note that transposing the problem makes it faster, at least when using LoopVectorization. It likes having a dimension longer than 3 to vectorise along:

julia> arr_t = permutedims(arr);

julia> @btime @tullio $out[i,j] = ($arr_t[i,μ] - $arr_t[j,μ])^2 |> sqrt;
  68.236 ms (50 allocations: 4.81 KiB)

What these versions can’t do is iterate only over half the space. For comparison, in-place versions of @lmiq’s SVector code above, on my computer:

julia> function mydistmat_self_symm!(out, x)
         n = length(x)
         for i in 1:n # faster without @inbounds, weirdly?
           for j in 1:i
             out[j,i] = d3(x[i],x[j])
           end
         end
         Symmetric(out)
       end;

julia> d3(x,y) = sqrt((x[1]-y[1])^2+ (x[2]-y[2])^2+ (x[3]-y[3])^2); # called d above

julia> x = rand(SVector{3,Float64},10_000);

julia> mydistmat_self_symm!(out, x) ≈ pairwise(Euclidean(), reduce(hcat,x), reduce(hcat,x), dims=2)
true

julia> @btime mydistmat_self_symm!($out, $x);
  87.738 ms (0 allocations: 0 bytes)

julia> @btime mydistmat_self_symm_threads!($out, $x); # version with Threads.@threads, but it's not dividing the triangle well
  54.583 ms (24 allocations: 5.22 KiB)

julia> @btime mydistmat_self!($out, $x); # fills the whole array, no Symmetric
  176.911 ms (0 allocations: 0 bytes)

julia> @btime mydistmat_self_threads!($out, $x);  # here closer to a factor of 2
  105.831 ms (21 allocations: 5.12 KiB)

It’s very odd that these get slower for me with @inbounds, for example:

julia> @btime mydistmat_self_symm!($out, $x);
  124.784 ms (0 allocations: 0 bytes)
3 Likes