Fast offset matrix subtraction

For example, the following function is 20x faster than your original code on my machine (thanks in a large part to the LoopVectorization.jl package), is arguably clearer (it’s a lot closer to your original formula), and performs no allocations:

using LoopVectorization
function m(h)
    nn, nt = size(h)
    s = zero(eltype(h))
    @turbo for i = 1:nn
        s += (h[i,1] - h[i,nt])^2
    end
    @turbo for t = 2:nt, i = 1:nn
        s += (h[i,t] - h[i,t-1])^2
    end
    return s / nn
end

Benchmark with:

using BenchmarkTools
@btime m($h);

(Even if you remove LoopVectorization’s @turbo annotation, it is still more than 2x faster than your original code, and still allocates no memory.)

4 Likes