Utilizing @turbo / @tturbo in Performance Critical Code

I’d try something like:

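using LoopVectorization

# i (the column of r2 being compared) and orows come from the surrounding code
@turbo for j = 1:orows
    distj = zero(eltype(dist))
    for k = axes(r1,1)
        distj += (r2[k,i] - r1[k,j])^2  # accumulate the squared distance over k
    end
    dist[j] = 1/distj  # store the inverse squared distance
end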

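FWIW, it’d probably be faster if r1 and r2’s memory layouts were transposed. Julia arrays are column-major, so with the j index first (r1[j,k] instead of r1[k,j]) consecutive j values sit next to each other in memory, and @turbo can vectorize over j with contiguous loads instead of strided ones.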
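
To make the layout point concrete, here is a rough sketch of what the transposed version might look like. The function name invsqdist_t!, the array sizes, and passing the i-th column of r2 as a plain vector (r2i) are my assumptions for illustration, not something from the original code, and I haven’t benchmarked it:

```julia
using LoopVectorization

# Hypothetical helper (not from the original code).
# r1t holds r1 in transposed layout, i.e. r1t[j,k] == r1[k,j],
# so j runs along the contiguous (first) dimension.
# r2i is the i-th column of r2, pulled out as a vector.
function invsqdist_t!(dist, r1t, r2i)
    @turbo for j in axes(r1t, 1)
        distj = zero(eltype(dist))
        for k in axes(r1t, 2)
            distj += (r2i[k] - r1t[j, k])^2
        end
        dist[j] = 1 / distj
    end
    return dist
end

# Made-up sizes, just to exercise the function.
r1 = rand(3, 1000)
r2 = rand(3, 10)
r1t = permutedims(r1)            # materialize the transposed layout once
dist = similar(r1, size(r1, 2))
invsqdist_t!(dist, r1t, r2[:, 1])
```

Note that permutedims copies the data, so this only pays off if the arrays can be stored in that layout up front or the transposed copy is reused across many i.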
