Help to understand the performance gain when using Base.throw_boundserror

Hi! I want to understand some behavior that seems strange to me. We are building a lower triangular storage for gravity models. If I use the following getindex:

function Base.getindex(
    L::LowerTriangularStorage{T},
    i::Int,
    j::Int
) where T<:AbstractIcgemCoefficient
    @inline
    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && throw(BoundsError(L, (i, j)))

    # For the upper triangular part, return zero.
    j > i && return zero(T)

    return L.data[_ij_to_lt_index(i, j)]
end

I see a major performance hit compared to a regular matrix. In this case, it takes about 30% more time to fetch an element (the absolute values are very small, 2.5 ns vs 3.3 ns, but we need to fetch a lot of coefficients). However, if I change it to:

function Base.getindex(
    L::LowerTriangularStorage{T},
    i::Int,
    j::Int
) where T<:AbstractIcgemCoefficient
    @inline
    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && Base.throw_boundserror(L, (i, j))

    # For the upper triangular part, return zero.
    j > i && return zero(T)

    return L.data[_ij_to_lt_index(i, j)]
end

Everything is fine! I think it might be related to throw_boundserror being marked @noinline, but I have no idea. Can anyone explain, please?

Maybe instruction cache misses? How does @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && @noinline throw(BoundsError(L, (i, j))) do relative to these 2?

This version also performs badly (the same as the version without the @noinline).

However, creating and calling a lambda function leads to the good performance:

    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) &&
        (() -> (@noinline; throw(BoundsError(L, (i, j)))))()

This seems to be relevant;

Yeah, this is a known effect that crops up all the time. In any sort of small, performance-sensitive function, it’s often greatly advantageous to put errors behind @noinline function boundaries.

I see, but what is the explanation? I have no idea why this happens.

Because throw has a ton of complicated crap in it that gets put into the function body. When you lock it behind a @noinline function, the optimizer can ignore it, and the function body itself doesn’t take as much room in the instruction cache (because the function contains fewer instructions).
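To make the pattern concrete, here is a minimal sketch of a getindex that keeps the error path behind a @noinline helper. The names LowerTri, lt_index and throw_lt_boundserror are made up for illustration, and the packed index formula is an assumption, since the thread does not show _ij_to_lt_index.

```julia
# Hypothetical stand-in for the thread's LowerTriangularStorage.
struct LowerTri{T}
    data::Vector{T}  # packed lower triangle, row by row
    n::Int           # matrix dimension
end

# Convenience constructor: n x n storage filled with zeros.
LowerTri{T}(n::Int) where {T} = LowerTri{T}(zeros(T, (n * (n + 1)) ÷ 2), n)

# Assumed row-major packing of the lower triangle (valid for j <= i).
lt_index(i::Int, j::Int) = (i * (i - 1)) ÷ 2 + j

# Error path kept out of line: building the BoundsError and throwing it
# both happen inside this helper, not in the body of getindex.
@noinline throw_lt_boundserror(L, i, j) = throw(BoundsError(L, (i, j)))

@inline function Base.getindex(L::LowerTri{T}, i::Int, j::Int) where {T}
    @boundscheck (1 <= i <= L.n && 1 <= j <= L.n) || throw_lt_boundserror(L, i, j)

    # For the upper triangular part, return zero.
    j > i && return zero(T)

    return L.data[lt_index(i, j)]
end
```

With this layout the hot path only contains a comparison and a call instruction for the failure case. Whether that recovers the full 2.5 ns figure on a given machine would need to be benchmarked.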

Awesome! Thanks @Mason !

Why does it take a @noinline line inside a separate error-throwing method’s body to have this effect though? That is, a @noinline taking the throw call directly didn’t work here. Are there different inlining heuristics there?

In this scenario, I think it has more to do with BoundsError than throw, since BoundsError has an ::Any field, so constructing one requires allocations.

This means that there’s a bunch of allocation code that never gets called but still gets inlined into the function body, and that’s what’s causing problems, as far as I understand.
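One rough way to check this yourself is to look at the field types of BoundsError and compare how much LLVM IR the two styles generate. The sketch below uses made-up helper names (get_direct, get_barrier, throw_barrier, ir_line_count) and counts IR lines only as a crude proxy for code size. The numbers depend on the Julia version, and on some versions the two may come out close.

```julia
using InteractiveUtils  # stdlib; provides code_llvm

# Out-of-line error path (hypothetical helper).
@noinline throw_barrier(v, i) = throw(BoundsError(v, i))

# Variant 1: construct and throw the error directly in the function body.
function get_direct(v::Vector{Float64}, i::Int)
    (1 <= i <= length(v)) || throw(BoundsError(v, i))
    return @inbounds v[i]
end

# Variant 2: same check, but the error lives behind a function barrier.
function get_barrier(v::Vector{Float64}, i::Int)
    (1 <= i <= length(v)) || throw_barrier(v, i)
    return @inbounds v[i]
end

# Crude size proxy: number of lines in the printed LLVM IR.
ir_line_count(f, types) = count(==('\n'), sprint(code_llvm, f, types))

println("BoundsError field types: ", fieldtypes(BoundsError))
println("direct:  ", ir_line_count(get_direct, (Vector{Float64}, Int)))
println("barrier: ", ir_line_count(get_barrier, (Vector{Float64}, Int)))
```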

Explains why Base.throw_boundserror exists as a separate method from its callee throw: a function barrier for the BoundsError instantiation too.

Is there a way we can make this happen more automatically? Would adding @noinline to the constructors for some of the subtypes(Exception) (e.g., ones with non-concrete elements) get us most of the way there?

I.e., does @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && @noinline throw(@noinline BoundsError(L, (i, j))) (added a @noinline on BoundsError) do any better (with or without the @noinline on throw)?

If not, is there anything we could do towards this end with a @throw macro?
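For what it's worth, a macro along those lines could simply wrap the lambda trick from earlier in the thread. This is only a sketch: @throw_outlined is a hypothetical name, not something in Base, and I have not measured whether it gives the same speedup as Base.throw_boundserror.

```julia
# Hypothetical macro: builds the exception and throws it inside a
# zero-argument closure whose body is marked @noinline, so neither the
# constructor call nor the throw ends up in the caller's body.
macro throw_outlined(ex)
    return :((() -> (@noinline; throw($(esc(ex)))))())
end

# Example use in a small checked function.
function checked_first(v::Vector{Int})
    isempty(v) && @throw_outlined(ArgumentError("vector must not be empty"))
    return v[1]
end
```

The closure captures whatever local variables the exception expression mentions, so this is probably best kept to simple argument lists.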

I would assume that it’s fine for exceptionally executed branches (try-catch as routine control flow is much rarer, error does @noinline for ErrorException), but there’s probably a debate on whether that’s generally good. I also just found out that throw is a builtin, so I’m not actually sure how inlining works for those.

I assumed from the docstring that the first @noinline would’ve applied to the BoundsError call as well, but evidently not based on the need for Base.throw_boundserror or a similar method. I’d think that either needs an implementation fix or a docstring clarification.

  @noinline block

  Give a hint to the compiler that it should not inline the calls within block.

  # The compiler will try to not inline `f`
  @noinline f(...)

  # The compiler will try to not inline `f`, `g` and `+`
  @noinline f(...) + g(...)

Note that this is separate from the zero-argument @noinline; that affects the surrounding method body.
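A small sketch of the two forms side by side, in case it helps. The function names are made up for illustration, and the call-site form assumes Julia 1.8 or newer.

```julia
# Zero-argument form: sits inside the body and applies to this method.
function throw_negative(x)
    @noinline
    throw(DomainError(x, "expected a non-negative value"))
end

# Plain helper with no inlining annotation of its own.
throw_negative_plain(x) = throw(DomainError(x, "expected a non-negative value"))

# Relies on the callee itself being marked @noinline.
checked_sqrt_a(x::Float64) = x < 0 ? throw_negative(x) : sqrt(x)

# Call-site form: asks the compiler not to inline this particular call.
checked_sqrt_b(x::Float64) = x < 0 ? (@noinline throw_negative_plain(x)) : sqrt(x)
```

In both variants the DomainError is constructed inside the helper, which is the part that mattered for BoundsError in this thread.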