How to propagate @inbounds properly

I have a bunch of more or less complex functions that have arrays and indices as arguments. I’d like to specify @inbounds once at the top-level function and propagate this down through all nested and standalone functions.

@propagate_inbounds seems to work fine for standalone functions. When I use it for expressions calling lambdas and nested functions, this seems to break however.

I assume I am missing some @propagate_inbounds application somewhere, which completes the chain.

What do I need to do, to make remove the bounds checks.

Example:

julia> function foo(a)
           aa = axes(a, 1)

           @propagate_inbounds function f(i)
               @propagate_inbounds p(j) = log(exp(a[i+1]) + exp(a[j+1]))
               sum(p(j) for j in aa)
           end

       @inbounds sum(i -> a[i] * f(i), aa)
       end
foo (generic function with 1 method)

julia> a = 1:5
1:5

julia> foo(a)
ERROR: BoundsError: attempt to access 6-element Array{Int64,1} at index [7]
Stacktrace:
 [1] getindex at ./array.jl:809 [inlined]
 [2] p at ./REPL[56]:6 [inlined]
 [3] MappingRF at ./reduce.jl:93 [inlined]
 [4] _foldl_impl(::Base.MappingRF{var"#p#13"{Int64,Array{Int64,1}},Base.BottomRF{typeof(Base.add_sum)}}, ::Base._InitialValue, ::Base.OneTo{Int64}) at ./redu
ce.jl:62
 [5] foldl_impl at ./reduce.jl:48 [inlined]
 [6] mapfoldl_impl at ./reduce.jl:44 [inlined]
 [7] #mapfoldl#204 at ./reduce.jl:160 [inlined]
 [8] mapfoldl at ./reduce.jl:160 [inlined]
 [9] #mapreduce#208 at ./reduce.jl:287 [inlined]
 [10] mapreduce at ./reduce.jl:287 [inlined]
 [11] sum at ./reduce.jl:494 [inlined]
 [12] sum at ./reduce.jl:511 [inlined]
 [13] f at ./REPL[56]:7 [inlined]
 [14] #11 at ./REPL[56]:10 [inlined]
 [15] _mapreduce(::var"#11#14"{Array{Int64,1},var"#f#12"{Array{Int64,1},Base.OneTo{Int64}}}, ::typeof(Base.add_sum), ::IndexLinear, ::Base.OneTo{Int64}) at .
/reduce.jl:408
 [16] _mapreduce_dim at ./reducedim.jl:318 [inlined]
 [17] #mapreduce#620 at ./reducedim.jl:310 [inlined]
 [18] mapreduce at ./reducedim.jl:310 [inlined]
 [19] _sum at ./reducedim.jl:727 [inlined]
 [20] #sum#628 at ./reducedim.jl:723 [inlined]
 [21] sum at ./reducedim.jl:723 [inlined]
 [22] foo(::Array{Int64,1}) at ./REPL[56]:10
 [23] top-level scope at REPL[58]:1
 [24] run_repl(::REPL.AbstractREPL, ::Any) at /tmp/portage/dev-lang/julia-1.5.3/work/julia-1.5.3/usr/share/julia/stdlib/v1.5/REPL/src/REPL.jl:288

N.B.: Yes, I’ve actually profiled and getindex takes up quite a portion of time.

Hm, strange. I’d guess this might be the closure capture type instability bug and the failure to @inbounds and profiling results are just symptoms of that larger problem. Removing bounds checking (in general) requires inlining which requires type stability.

Oop, that’s wrong. It’s that Generators and sum itself are adding that function boundary that you’re missing. That is:

julia> function f(i, a, aa)
           @propagate_inbounds p(j) = log(exp(a[i+1]) + exp(a[j+1]))
           @inbounds sum(p(j) for j in aa)
       end
f (generic function with 1 method)

julia> f(5, 1:5, axes(1:5,1))
ERROR: BoundsError: attempt to access 5-element UnitRange{Int64} at index [6]

If you move that @inbounds inside the generator, then you’re ok. I’m not sure if we have a way to tag a Generator as propagating inbounds, particularly if it’s getting called from within a higher-order function.

julia> function f2(i, a, aa)
           @propagate_inbounds p(j) = log(exp(a[i+1]) + exp(a[j+1]))
           sum(@inbounds p(j) for j in aa)
       end
f2 (generic function with 1 method)

julia> f2(5, 1:5, axes(1:5,1))
31.200074158612694
4 Likes

I see, thanks.
But this basically means, that I have to specify @inbounds in downstream functions, where I don’t necessarily know, if everything will actually be in bounds. By explicitely marking the sum’s Generator inside f(i) as @inbounds, I lose bounds checking for uses of f(i) where I can’t guarantee that everything will be in bounds.

Is there a way to do some sort of explicit @inbounds propagation?

Propagation also seems to break with broadcasting. So I will need manual propagation for lambdas, Generators and inside broadcasted functions?

Example:

using LinearAlgebra
function foo(a)
    aa = axes(a, 1)

    @propagate_inbounds function f(i, x)
        @propagate_inbounds p(j) = log(exp(a[i+1]) + exp(a[j+1]))
        log(a[i+1] * sum(@inbounds p(j+1) for j in aa) - x)
    end

    sum(i -> (@inbounds a[i+1] * dot([1,2,3,4,5], f.(i+1, [1,2,3,4,5]))), aa)

    # Same result with this instead of sum:
    #ret = zero(eltype(a))
    #@inbounds for i in a
    #    ret += a[i+1] * dot([1,2,3,4,5], f.(i+1, [1,2,3,4,5]))
    #end
end

still throws a BoundsError because of the a[i+1] factor in front of the sum.
When I replace the line with log(@inbounds a[i+1] sum... it seems to work.

Yeah, in general, we don’t want higher order functions to @propagate_inbounds because we don’t know if it’s ok and intended to elide bounds checks within the passed function and because we don’t really want functions like sum to inline. While you can mark anonymous functions with @propagate_inbounds, too, I think you’re just going to end up tying yourself in knots and running into walls. You’re trying to get inbounds propagation to do something it was not really intended to do, so you’ll keep running into cases like this.

That is, @propagate_inbounds was built for indexing-infrastructure sorts of patterns where you have full control of the call stack and you’re really depending upon inlining in any case. Higher order functions, broadcasting, generators, etc., are often best served if they don’t inline.

3 Likes

Interesting. I’m not yet entirely sure, how bounds checking relates to inlining.

How is


function foo(a)
    ret = zero(eltype(a))
    for i in axes(a,1)
        ret += f(a, i)
    end
    ret
end

different from

function foo(a)
    sum(f(a, i) for i in axes(a, 1))
end

for some

@propagate_inbounds f(a, i) = ...

?

It’s just the implementation, driven by the intended use-cases. Bounds-check removal requires changing the code in a function you’re calling. It’s a spooky action at a distance. Instead of building a full second function body, the compiler does the transform as part of the inlining step — it’s already plopping a copy of the function body right into the caller, so it can do a little transform in the process.

In all three cases, I’d argue that you want @inbounds f(a, i), because it’s at that level that you know i is valid for a given how f works. That’s where the for loop was defined.

3 Likes

Ok, makes sense. And I guess disabling the propagation for actual library code (like generators or broadcasts) and enabling the propagation for the code inside the lambda/generator/broadcast is too complex / not worth the effort.