# Pairwise distance using matrix or Vector{SVector}: The first is faster!

I am facing a really unexpected result: the pairwise distance computation using a matrix is actually *faster* than using a vector of static vectors, at least for small dimensions. Here is my simple setup:

```julia
using Distances, StaticArrays

D = 10
data = [rand(SVector{D}) for i in 1:1000]
metric = Chebyshev()

# copy the same data into a D×1000 matrix
datam = zeros(D, 1000)
for i in 1:1000
    datam[:, i] .= data[i]
end
```

Let's now use the `pairwise` implementation from `Distances`. For reference, the generic method has amazingly small source code:

```julia
function pairwise!(r::AbstractMatrix, metric::PreMetric, a::AbstractMatrix, b::AbstractMatrix)
    na = size(a, 2)
    nb = size(b, 2)
    size(r) == (na, nb) || throw(DimensionMismatch("Incorrect size of r."))
    @inbounds for j = 1:size(b, 2)
        bj = view(b, :, j)
        for i = 1:size(a, 2)
            r[i, j] = evaluate(metric, view(a, :, i), bj)
        end
    end
    r
end
```

and compare it with the just as simple:

```julia
const VV = Vector{<:SVector}

function pairwiseDS(x::VV, y::VV, metric::Metric)
    d = zeros(eltype(eltype(x)), length(x), length(y))
    for i in 1:length(x)
        for j in 1:length(y)
            @inbounds d[i, j] = evaluate(metric, x[i], y[j])
        end
    end
    return d
end
```

Benchmarks:

```julia
julia> @btime pairwiseDS($data, $data, $metric)
  47.305 ms (2 allocations: 7.63 MiB)

# method from Distances:
julia> @btime pairwise($metric, $datam, $datam)
  36.199 ms (2 allocations: 7.63 MiB)
```

Even though the actual distance evaluation is faster:

```julia
v = rand(D)
sv = SVector{D}(v)

@btime evaluate($metric, $v, $v)
  30.293 ns (0 allocations: 0 bytes)
@btime evaluate($metric, $sv, $sv)
  25.173 ns (0 allocations: 0 bytes)
```

Was it just an unreasonable expectation on my side that the `Vector{SVector}` method should be faster? But the individual distance evaluations *are* faster, so why isn't the entire computation faster?

I know that Distances uses BLAS operations for many of its pairwise distance computations, which is one of the reasons it is so much faster than the naive code you might write yourself. Is it possible that the package doesn't recognize that a vector of `SVector`s is memory-compatible with BLAS, and therefore falls back to a slower generic implementation instead of the clever BLAS-based one?
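(Editor's aside, not from the original thread: since the elements of a `Vector{SVector{D,Float64}}` are stored inline and contiguously, one way to probe this hypothesis is to reinterpret that vector as a `D×N` matrix without copying and hand it to the matrix-based method. A minimal sketch, assuming a concrete element type:)

```julia
using Distances, StaticArrays

D = 10
data = [rand(SVector{D,Float64}) for _ in 1:1000]

# SVectors are stored contiguously, so the vector of SVectors can be
# reinterpreted as a D×1000 matrix with no copy of the data:
datam = reshape(reinterpret(Float64, data), D, :)

# The reinterpreted view can be passed straight to the matrix method,
# using the same call signature as in the benchmarks above:
result = pairwise(Chebyshev(), datam, datam)
```

This only works because `SVector{D,Float64}` is an immutable, fixed-size type; for a vector of ordinary `Vector`s the elements are heap-allocated and no such zero-copy view exists.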

Chebyshev is not amenable to the BLAS optimization, but this is likely just a memory-access-order effect:

https://docs.julialang.org/en/v1/manual/performance-tips/#Access-arrays-in-memory-order,-along-columns-1
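(Editor's sketch, not from the thread: concretely, the tip above suggests swapping the two loops in `pairwiseDS` so that the inner loop writes `d[i, j]` down a column, matching Julia's column-major array layout. `pairwiseDS_colmajor` is a hypothetical name for the swapped variant:)

```julia
using Distances, StaticArrays

# Same algorithm as pairwiseDS, but with the loop order swapped:
# the inner loop now runs over i, so d[i, j] is filled column by
# column, in the order the matrix is laid out in memory.
function pairwiseDS_colmajor(x::Vector{<:SVector}, y::Vector{<:SVector}, metric::Metric)
    d = zeros(eltype(eltype(x)), length(x), length(y))
    @inbounds for j in eachindex(y)
        yj = y[j]
        for i in eachindex(x)
            d[i, j] = evaluate(metric, x[i], yj)
        end
    end
    return d
end

data = [rand(SVector{10,Float64}) for _ in 1:1000]
d = pairwiseDS_colmajor(data, data, Chebyshev())
```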


@kristoffer.carlsson I thought the same and was testing it. Indeed, putting the `j` loop first gives

```
  37.050 ms (2 allocations: 7.63 MiB)
```

versus the `Distances` timing:

```
  35.395 ms (2 allocations: 7.63 MiB)
```

The latter is still faster; I guess the reasons may not even be identifiable?

All reasons are identifiable if you're willing to work hard enough to figure out what's going on. The question is whether it's worth it to someone or not.
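(Editor's sketch, not from the thread: the usual starting point for "working hard enough" is Julia's built-in sampling profiler. The `pairwiseDS` here is a condensed one-line stand-in for the function defined earlier:)

```julia
using Distances, StaticArrays, Profile

D = 10
data = [rand(SVector{D,Float64}) for _ in 1:1000]
metric = Chebyshev()

# Condensed stand-in for the pairwiseDS defined above.
pairwiseDS(x, y, m) = [evaluate(m, xi, yj) for xi in x, yj in y]

# Warm up once so compilation time is excluded from the samples:
pairwiseDS(data, data, metric)

Profile.clear()
@profile for _ in 1:10
    pairwiseDS(data, data, metric)
end
Profile.print()   # text report; ProfileView.jl gives a flame graph
```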