Understanding multi-dimensional array indexing performance

Just in order to demonstrate that the performance issue is simd-related:

function mysum(r)
    s = zero(eltype(r))
    for i in eachindex(r)
        s += r[i]
    end
    return s
end

function mysum_simd(r)
    s = zero(eltype(r))
    @simd for i in eachindex(r)
        s += r[i]
    end
    return s
end

Benchmark:

julia> r = rand(100,100);

julia> @btime sum($r)
  854.386 ns (0 allocations: 0 bytes)
5025.939581928298

julia> @btime mysum($r)
  8.900 μs (0 allocations: 0 bytes)
5025.939581928294

julia> @btime mysum_simd($r)
  568.108 ns (0 allocations: 0 bytes)
5025.939581928298

Note that the result is also more accurate for the simd-sum, in this case at least (after some more tests, I see this varies a bit, but the simd version is mostly more accurate):

julia> sum(big, r) - mysum(r)
4.055866753560621873475611209869384765625e-12

julia> sum(big, r) - mysum_simd(r)
-4.91606755304019316099584102630615234375e-13
2 Likes