A safe inbounds use with great performance effect

As a follow-up to this:

Matt has shown that enumerate(eachindex(a)) is quite fast:

julia> function f!(A)
           for (count, idx) in enumerate(eachindex(A))
               A[idx] = count*2
           end
       end

julia> function g!(A)
           i = 0
           for idx in eachindex(A)
               i += 1
               A[idx] = i*2
           end
       end

julia> using BenchmarkTools

julia> a = rand(1:10,10^5);

julia> @btime f!($a)
  31.433 μs (0 allocations: 0 bytes)

julia> @btime g!($a)
  75.225 μs (0 allocations: 0 bytes)

But the difference disappears entirely with @inbounds, which speeds up both functions considerably:

julia> function f2!(A)
           for (count, idx) in enumerate(eachindex(A))
               @inbounds A[idx] = count*2
           end
       end
f2! (generic function with 1 method)

julia> @btime f2!($a)
  18.272 μs (0 allocations: 0 bytes)

julia> function g2!(A)
           i = 0
           for idx in eachindex(A)
               i += 1
               @inbounds A[idx] = i*2
           end
       end
g2! (generic function with 1 method)

julia> @btime g2!($a)
  18.063 μs (0 allocations: 0 bytes)
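
Before comparing timings, it may be worth confirming that the variants really write the same values. The following is a minimal sketch, not part of the original benchmark: it redefines f! and g2! as above and uses two small scratch arrays, x and y, which are names introduced here only for illustration.

```julia
# Bounds-checked version using enumerate, as defined above.
function f!(A)
    for (count, idx) in enumerate(eachindex(A))
        A[idx] = count*2
    end
end

# Manual-counter version with @inbounds, as defined above.
function g2!(A)
    i = 0
    for idx in eachindex(A)
        i += 1
        @inbounds A[idx] = i*2
    end
end

# x and y are illustrative scratch arrays, not from the thread.
x = zeros(Int, 1000)
y = zeros(Int, 1000)
f!(x)
g2!(y)

# Both functions mutate their argument and return nothing,
# so the arrays themselves are compared.
same_result = x == y
```

If same_result is true, any timing difference comes from how the loop is compiled rather than from the work it does.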

This seems to be a case where disabling bounds checking is perfectly safe, since every index comes from eachindex(A). (The LLVM code for the two functions does not appear to be identical, though.)

In both cases the compiler could deduce that the accesses are in bounds. Shouldn’t it?
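
The safety argument can be checked directly: every index produced by eachindex(A) should pass the same test that a bounds-checked A[idx] performs. Here is a rough sketch using Base's checkbounds(Bool, A, idx). The helper all_inbounds and the arrays m and v are hypothetical names introduced for illustration, and the argument assumes the array is not resized inside the loop.

```julia
# Hypothetical helper (not from the thread): true if every index yielded by
# eachindex(A) is a valid index into A.
all_inbounds(A) = all(idx -> checkbounds(Bool, A, idx), eachindex(A))

a = rand(1:10, 10^5)      # Vector: eachindex yields linear indices
m = rand(3, 4)            # Matrix: eachindex also yields linear indices
v = view(m, 1:2, 2:3)     # non-contiguous view: eachindex yields Cartesian indices

results = (all_inbounds(a), all_inbounds(m), all_inbounds(v))
```

The guarantee only covers the array that eachindex was called on. Indexing a second array B with indices taken from eachindex(A) is not covered, which is why eachindex(A, B) exists.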


It should! And it does on the v1.8-beta. You can see this because removing the bounds checks enables SIMD — and that’s why there’s such a big 2x (or more) speedup.

julia> @code_llvm debuginfo=:none f!(a)
# ... skipping ...
vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %vec.ind = phi <4 x i64> [ <i64 1, i64 2, i64 3, i64 4>, %vector.ph ], [ %vec.ind.next, %vector.body ]
# ... skipping ...

That vector.body and those <4 x i64>s mean SIMD.
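
If you would rather not scan the IR by eye, the same check can be scripted. This is only a sketch: looks_vectorized is a hypothetical helper, and searching for the "vector.body" label is a heuristic, since the result depends on the Julia version, the LLVM version, and the CPU.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

function f!(A)
    for (count, idx) in enumerate(eachindex(A))
        A[idx] = count*2
    end
end

# Hypothetical helper: capture the optimized LLVM IR as a string and look for
# the block label that LLVM's loop vectorizer emits.
function looks_vectorized(f, argtypes)
    ir = sprint(io -> code_llvm(io, f, argtypes; debuginfo=:none))
    return occursin("vector.body", ir)
end

vectorized = looks_vectorized(f!, Tuple{Vector{Int}})
```

A false result on an older release would be consistent with the bounds check still being present in the loop body, though other things can block vectorization too.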
