Which block does `@inbounds` actually apply to when skipping bounds checks?

For example, with this simple function:

function f(vec1,vec2)
  for i in 2:length(vec1)
    vec1[i] = vec1[i] - vec2[i-1]
  end
  nothing
end

I get:

julia> @btime f($vec1,$vec2)
  759.284 ns (0 allocations: 0 bytes)

If I add @inbounds to the inner loop, I get:

julia> @btime f($vec1,$vec2)
  107.009 ns (0 allocations: 0 bytes)

Code:
function f(vec1,vec2)
  @inbounds for i in 2:length(vec1)
    vec1[i] = vec1[i] - vec2[i-1]
  end
  nothing
end

But if I add @inbounds to the function declaration as a whole, I get the slow performance again:

julia> @btime f($vec1,$vec2)
  758.388 ns (0 allocations: 0 bytes)

Code:
@inbounds function f(vec1,vec2)
  for i in 2:length(vec1)
    vec1[i] = vec1[i] - vec2[i-1]
  end
  nothing
end

It is not clear to me which block @inbounds actually acts upon. I tried Base.@propagate_inbounds but the result is the same. I could not figure out from the manual how @inbounds does, or does not, propagate.

This also makes me wonder why the example of its use is something like:

      for i = 1:length(A)
          @inbounds r += A[i]
      end

and not just adding it to the loop as a whole. I have seen this pattern often, but never figured out why one would do that instead of annotating the whole loop.

We clearly need to document this better, as it’s snagged some of the best of us.

@inbounds is effectively a runtime macro, not a recursive syntax transform. It turns off bounds checking, then runs the wrapped code, then turns it back on again. When you do @inbounds function f() ... end, you’re removing bounds checking during the definition of the function, not inside its body when it is later called. Within the function, though, it doesn’t matter if you write:

@inbounds for ...
end
# or
for ...
    @inbounds r += A[i]
end
# or
@inbounds begin
    for ...
    end
end
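To make the three placements concrete, here is a minimal sketch. The function names (sum_loop, sum_stmt, sum_block) are made up for this illustration, and the input is assumed to be an ordinary 1-based Vector. All three put @inbounds inside the function body, so each should skip the bounds checks and return the same sum. Timings are left out since they depend on the machine and Julia version.

```julia
# Hypothetical helper names; each places @inbounds differently inside the body.
function sum_loop(A)
    r = zero(eltype(A))
    @inbounds for i in 1:length(A)   # annotate the whole loop
        r += A[i]
    end
    return r
end

function sum_stmt(A)
    r = zero(eltype(A))
    for i in 1:length(A)
        @inbounds r += A[i]          # annotate only the indexing statement
    end
    return r
end

function sum_block(A)
    r = zero(eltype(A))
    @inbounds begin                  # annotate an enclosing begin block
        for i in 1:length(A)
            r += A[i]
        end
    end
    return r
end

A = collect(1.0:10.0)
results = (sum_loop(A), sum_stmt(A), sum_block(A))
```

If you want to compare them on your own data, something like @btime sum_loop($A) from BenchmarkTools on each variant should do.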

I like scoping it as narrowly as possible if it’s convenient to do so since I might add more things into a for loop or an arbitrary block of code and forget that @inbounds is applied.
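As a rough sketch of what narrow scoping can look like for the function in the question: the name shifted_diff! and the up-front length check are my own additions for illustration, not something from the original code, and the inputs are assumed to be ordinary 1-based Vectors.

```julia
# Hypothetical example: only the one statement known to be safe is annotated.
function shifted_diff!(vec1, vec2)
    # Check once up front that every index used in the loop is valid.
    length(vec2) >= length(vec1) - 1 || throw(DimensionMismatch("vec2 is too short"))
    for i in 2:length(vec1)
        @inbounds vec1[i] = vec1[i] - vec2[i-1]   # bounds checks skipped here only
        # Anything added to the loop later keeps its bounds checks.
    end
    return nothing
end

v1 = [10.0, 20.0, 30.0]
v2 = [1.0, 2.0]
shifted_diff!(v1, v2)
```

With the annotation on a single statement, a line added to the loop later is still bounds checked unless it gets its own @inbounds.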

For me, it’s aesthetic reasons. I never liked that macros mess up the alignment of blocks with end. Putting @inbounds inside the loop, when there’s just a single applicable expression for it, just looks cleaner.

Macros look much like Python decorators, but decorators can be put above the block they apply to.

Sometimes I wish macros would consume the next non-empty expression, or something (maybe discard leading whitespace?), then I could put @inbounds, @threads, etc. above the block. It would look nice and also be easy to comment out.