# Non-allocating distance function

I have recently realised that `isapprox(a, b)` is, for the most part, equivalent to `norm(a - b) < tol`. In particular, when `a` and `b` are vectors, this implementation allocates the vector of the difference.
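
For reference, here is a simplified sketch of what the array method of `isapprox` boils down to (the name `approx_sketch` is made up, and the actual Base definition is more general):

``````
using LinearAlgebra

# Simplified sketch of the array method of isapprox; the relevant parts
# here are the relative tolerance and the allocating `a - b`.
approx_sketch(a, b; rtol = sqrt(eps(Float64)), atol = 0) =
    norm(a - b) <= max(atol, rtol * max(norm(a), norm(b)))  # a - b allocates
``````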

``````
julia> using BenchmarkTools

julia> a, b = rand(10), rand(10);

julia> @benchmark isapprox(a, b)
BenchmarkTools.Trial: 10000 samples with 916 evaluations.
 Range (min … max):  116.983 ns … 661.477 ns  ┊ GC (min … max): 0.00% … 78.26%
 Time  (median):     118.547 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   122.136 ns ±  25.514 ns  ┊ GC (mean ± σ):  1.02% ±  4.08%

 Memory estimate: 144 bytes, allocs estimate: 1.
``````

I was wondering whether it would be worthwhile to add a non-allocating distance function to `LinearAlgebra`, probably along the following lines.

``````
function dist(a::AbstractArray{T1}, b::AbstractArray{T2}) where {T1<:Number, T2<:Number}
    size(a) == size(b) || throw(DimensionMismatch("arrays must have the same size"))
    d = abs2(zero(promote_type(T1, T2)))  # real-valued accumulator, also for complex inputs
    @inbounds @simd for i in eachindex(a, b)
        d += abs2(a[i] - b[i])
    end
    return sqrt(d)
end
``````

``````
julia> @benchmark dist(a, b)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  17.107 ns …  1.151 μs  ┊ GC (min … max): 0.00% … 97.25%
 Time  (median):     17.825 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.304 ns ± 15.477 ns  ┊ GC (mean ± σ):  1.02% ±  1.37%

 Memory estimate: 16 bytes, allocs estimate: 1.
``````

Non-allocating versions of `norm(a - b)` and `opnorm(A - B)`, including for exponents other than two, could be added along similar lines, as sketched below.
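
For instance, a naive non-allocating p-norm distance might look like the following (a sketch only; the name `distp` is hypothetical, it assumes finite `p >= 1`, and like `dist` above it does nothing to guard against overflow or underflow):

``````
# Hypothetical sketch: non-allocating p-norm distance for finite p >= 1.
function distp(a::AbstractArray{T1}, b::AbstractArray{T2}, p::Real) where {T1<:Number, T2<:Number}
    size(a) == size(b) || throw(DimensionMismatch("arrays must have the same size"))
    d = abs(zero(promote_type(T1, T2)))  # real-valued accumulator
    for i in eachindex(a, b)
        d += abs(a[i] - b[i])^p
    end
    return d^(1/p)
end
``````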

I’ve debated before whether a “norm-of-difference” function would be worthwhile. Here’s a quick benchmark of hand-written methods versus generator-based ones:

``````
using LinearAlgebra, BenchmarkTools

# hand-written loop
function norm2diff1(x, y)
    s::eltype(x) = 0
    for i in eachindex(x, y)
        @inbounds s += abs2(x[i] - y[i])
    end
    return sqrt(s)
end

# hand-written loop with @simd
function norm2diff1a(x, y)
    s::eltype(x) = 0
    @simd for i in eachindex(x, y)
        @inbounds s += abs2(x[i] - y[i])
    end
    return sqrt(s)
end

# naive generator-based version
norm2diff2(x, y) = sqrt(sum(abs2(xx - yy) for (xx, yy) in zip(x, y)))

# norm over a generator (overflow-safe thanks to norm's internal rescaling)
norm2diff3(x, y) = norm(xx - yy for (xx, yy) in zip(x, y))

x = randn(100); y = randn(100);

@btime norm2diff1($x, $y)
@btime norm2diff1a($x, $y)
@btime norm2diff2($x, $y)
@btime norm2diff3($x, $y)
``````

yielding

``````
julia> @btime norm2diff1($x,$y)
  81.828 ns (0 allocations: 0 bytes)
13.747489415960985

julia> @btime norm2diff1a($x,$y)
  18.054 ns (0 allocations: 0 bytes)
13.747489415960983

julia> @btime norm2diff2($x,$y)
  84.703 ns (0 allocations: 0 bytes)
13.747489415960985

julia> @btime norm2diff3($x,$y)
  267.925 ns (0 allocations: 0 bytes)
13.747489415960985
``````

So while `norm` over a generator is slower than the others, this is entirely due to the internal mechanisms of `norm` (rescaling to prevent overflow, etc.), since a naive generator-based implementation performs identically to a hand-written naive version.

The only standout here is the `@simd` version, although it looks like the current implementation of `norm` could support `@simd` as well (a sketch of what that might look like is below). Perhaps that PR can be made, but a specialized version doesn’t appear necessary to achieve this benefit.
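
A hypothetical illustration of such a fast path (not the actual `norm` implementation, and without `norm`’s overflow protection):

``````
# Hypothetical: a @simd fast path for the unscaled 2-norm over indexable
# arrays. The real norm would still need rescaling for overflow safety
# and a generic fallback for non-indexable iterators.
function norm2_simd(x::AbstractArray{<:Number})
    s = abs2(zero(eltype(x)))  # real-valued accumulator
    @inbounds @simd for i in eachindex(x)
        s += abs2(x[i])
    end
    return sqrt(s)
end
``````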

So the takeaway is that there doesn’t appear to be any real performance benefit to a specialized “norm-of-difference” function over generator-based alternatives for element-wise inputs. As for `opnorm`, that function already allocates internally, so I don’t think there are significant savings to be had for a difference version.

Why? When is `isapprox` performance-critical? Or are you thinking of other applications (which?) in which this is a critical operation?

The only applications that I can think of involve lots of tiny arrays, e.g. pairwise distances between many 3-component vectors, but in such cases you are better off using `StaticArrays` anyway (in which `norm(a - b)` is already non-allocating).
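
For example, with `SVector`s the difference never touches the heap:

``````
using StaticArrays, LinearAlgebra

a = @SVector rand(3)
b = @SVector rand(3)

norm(a - b)  # a - b is a stack-allocated SVector, so no heap allocation
``````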

Note that all of these functions differ from `LinearAlgebra.norm(x - y)` in that they can suffer spurious overflow for large arguments, e.g.

``````
julia> x, y = [1e300], [-1e300];

julia> norm(x - y)
2.0e300

julia> sqrt(sum(abs2(xx-yy) for (xx,yy) in zip(x,y)))
Inf
``````
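
The usual fix is to rescale by the largest magnitude before squaring. A minimal non-allocating sketch (the name `dist_safe` is made up, and it pays for the safety with a second pass over the data):

``````
# Hypothetical overflow-safe variant: rescale by the largest |a[i] - b[i]|
# before squaring, at the cost of a second pass over the data.
function dist_safe(a, b)
    size(a) == size(b) || throw(DimensionMismatch("arrays must have the same size"))
    m = zero(real(promote_type(eltype(a), eltype(b))))
    for i in eachindex(a, b)
        m = max(m, abs(a[i] - b[i]))
    end
    (iszero(m) || !isfinite(m)) && return m  # handle 0, Inf, NaN up front
    s = zero(m)
    for i in eachindex(a, b)
        s += abs2((a[i] - b[i]) / m)
    end
    return m * sqrt(s)
end
``````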

In fact, not all of my examples suffer from this square-then-overflow problem: I explicitly included a `norm`-based one (`norm2diff3`) and noted that its comparative slowness was due to exactly this consideration.

But I think you missed my point. The point of the other benchmarks was to show that a generator producing the elements of `x - y` is no slower (modulo the apples-to-oranges use of `@simd`) than a hand-written function doing the same thing with two input arrays.


I was indeed thinking of applications to clustering or removing duplicates, but not necessarily on small vectors.

I realise that I directed my question towards allocations only, whereas I was also (mainly) wondering why `norm` does not use `@simd`. I understand that the current implementation is general enough to handle iterators that may not be indexable, but wouldn’t it make sense to use `@simd` whenever possible?

PR 43256 was going to rewrite `norm` to be careful about overflow only when necessary. Used in the style of `norm2diff3`, this is 5x faster. I’m not sure about SIMD on arrays; perhaps it could be further improved.
