Hi! I want to understand a behavior that seems strange to me. We are building a lower-triangular storage type for gravity models. If I use the following getindex:
function Base.getindex(
    L::LowerTriangularStorage{T},
    i::Int,
    j::Int
) where T<:AbstractIcgemCoefficient
    @inline
    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && throw(BoundsError(L, (i, j)))
    # For the upper triangular part, return zero.
    j > i && return zero(T)
    return L.data[_ij_to_lt_index(i, j)]
end
I see a significant slowdown compared to a regular Matrix: fetching an element takes about 30% longer (the absolute values are tiny, 2.5 ns vs 3.3 ns, but we need to fetch a lot of coefficients). However, if I change to:
function Base.getindex(
    L::LowerTriangularStorage{T},
    i::Int,
    j::Int
) where T<:AbstractIcgemCoefficient
    @inline
    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && Base.throw_boundserror(L, (i, j))
    # For the upper triangular part, return zero.
    j > i && return zero(T)
    return L.data[_ij_to_lt_index(i, j)]
end
Everything is fine! I think it might be related to throw_boundserror being marked as @noinline, but I have no idea. Can anyone explain this to me, please?
Maybe instruction cache misses? How does @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && @noinline throw(BoundsError(L, (i, j))) perform relative to these two?
Yeah, this is a known effect that crops up all the time. In any sort of small, performance-sensitive function, it’s often greatly advantageous to put errors behind @noinline function boundaries.
That’s because throw has a ton of complicated crap in it that gets put into the function body. When you lock it behind a @noinline function, the optimizer can ignore it, and the function body itself doesn’t take up as much room in the instruction cache (because the function contains fewer instructions).
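For anyone who wants to try the pattern without reaching for Base.throw_boundserror, here is a minimal sketch of hiding the error path behind a @noinline helper. The type PackedLowerTriangular, the helper _throw_packed_bounds, and the row-major index formula in _packed_index are all made up for illustration; they are not the types or the _ij_to_lt_index from the original post.

```julia
# Hypothetical stand-in for the storage type discussed above.
struct PackedLowerTriangular{T}
    data::Vector{T}   # lower triangle stored row by row: (1,1), (2,1), (2,2), (3,1), ...
    n::Int            # matrix dimension
end

# Assumed packing formula (not taken from the original post).
_packed_index(i::Int, j::Int) = (i * (i - 1)) ÷ 2 + j

# The exception is constructed and thrown inside this helper, so the
# construction code stays out of the body of getindex.
@noinline _throw_packed_bounds(L, i, j) = throw(BoundsError(L, (i, j)))

function Base.getindex(L::PackedLowerTriangular{T}, i::Int, j::Int) where {T}
    @boundscheck (1 <= i <= L.n && 1 <= j <= L.n) || _throw_packed_bounds(L, i, j)
    # For the upper triangular part, return zero.
    j > i && return zero(T)
    return L.data[_packed_index(i, j)]
end

L = PackedLowerTriangular([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3)
L[3, 2]   # 5.0, stored at packed index 3*2÷2 + 2 = 5
L[1, 3]   # 0.0, upper triangle
```

Whether this recovers the full difference on a given machine is something to measure rather than assume.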
Why does it take a @noinline inside a separate error-throwing method’s body to have this effect though? That is, applying @noinline directly to the throw call didn’t work here. Are there different inlining heuristics there?
In this scenario, I think it has more to do with BoundsError than throw, since BoundsError has an ::Any field, so constructing one requires allocations.
This means there’s a bunch of allocation code that never gets called but still gets inlined into the function body, and that’s what’s causing problems, as far as I understand.
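The field types are easy to check with reflection. This is only a quick inspection; it shows that the fields are abstractly typed, not that this is what causes the slowdown.

```julia
# Names and declared types of the fields of BoundsError.
bounds_fields = fieldnames(BoundsError)
bounds_types = fieldtypes(BoundsError)

# With Any-typed fields, the struct is not a plain bits type.
bounds_isbits = isbitstype(BoundsError)

println(bounds_fields, " :: ", bounds_types, ", isbits = ", bounds_isbits)
```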
Is there a way we can make this happen more automatically? Would adding @noinline to the constructors for some of the subtypes(Exception) (e.g., ones with non-concrete elements) get us most of the way there?
I.e., does @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && @noinline throw(@noinline BoundsError(L, (i, j))) (added a @noinline on BoundsError) do any better (with or without the @noinline on throw)?
If not, is there anything we could do towards this end with a @throw macro?
I would assume that it’s fine for exceptionally executed branches (try-catch as routine control flow is much rarer, error does @noinline for ErrorException), but there’s probably a debate on whether that’s generally good. I also just found out that throw is a builtin, so I’m not actually sure how inlining works for those.
I assumed from the docstring that the first @noinline would’ve applied to the BoundsError call as well, but evidently not based on the need for Base.throw_boundserror or a similar method. I’d think that either needs an implementation fix or a docstring clarification.
@noinline block
Give a hint to the compiler that it should not inline the calls within block.
# The compiler will try to not inline `f`
@noinline f(...)
# The compiler will try to not inline `f`, `g` and `+`
@noinline f(...) + g(...)
Note that this is separate from the zero-argument @noinline; that affects the surrounding method body.
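To make the distinction concrete, here is a small sketch of the two forms side by side. The function names are made up for illustration, and as far as I know both forms need Julia 1.8 or later.

```julia
add_one(x) = x + 1

# Call-site form: hints that this particular call to `add_one` should not be inlined.
call_site_form(x) = @noinline add_one(x)

# Zero-argument form: marks the surrounding method itself as not to be inlined
# into its callers.
function zero_arg_form(x)
    @noinline
    return x + 1
end

call_site_form(1)   # 2
zero_arg_form(1)    # 2
```

Both return the same values; the only difference is which inlining decision the hint applies to.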