Help to understand the performance gain when using Base.throw_boundserror

Ronis_BR · October 1, 2025, 10:20am

Hi! I want to understand a behavior that seems strange to me. We are building a lower triangular storage for gravity models. If I use the following getindex:

function Base.getindex(
    L::LowerTriangularStorage{T},
    i::Int,
    j::Int
) where T<:AbstractIcgemCoefficient
    @inline
    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && throw(BoundsError(L, (i, j)))

    # For the upper triangular part, return zero.
    j > i && return zero(T)

    return L.data[_ij_to_lt_index(i, j)]
end

I see a significant performance penalty compared to a regular matrix. In this case, fetching an element takes about 30% more time (the absolute values are very low, 2.5 ns vs 3.3 ns, but we need to fetch a lot of coefficients). However, if I change to:

function Base.getindex(
    L::LowerTriangularStorage{T},
    i::Int,
    j::Int
) where T<:AbstractIcgemCoefficient
    @inline
    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && Base.throw_boundserror(L, (i, j))

    # For the upper triangular part, return zero.
    j > i && return zero(T)

    return L.data[_ij_to_lt_index(i, j)]
end

Everything is fine! I think it might be related to throw_boundserror being marked as @noinline, but I have no idea why. Can anyone explain this to me, please?

Benny · October 1, 2025, 10:27am

Maybe instruction cache misses? How does @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && @noinline throw(BoundsError(L, (i, j))) do relative to these 2?

Ronis_BR · October 1, 2025, 10:49am

This version also performs badly (the same as the version without the @noinline).

Ronis_BR · October 1, 2025, 10:52am

However, creating and calling a lambda function leads to the good performance:

    @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) &&
        (() -> (@noinline; throw(BoundsError(L, (i, j)))))()

matthias314 · October 1, 2025, 12:18pm

This seems to be relevant:

github.com/JuliaLang/julia

mere presence of `throw` in function body makes function much slower

opened 06:28PM - 21 Jun 21 UTC

closed 09:37PM - 28 Jun 21 UTC

matthias314

I have noticed that the run-time of a function may significantly depend on whether the function body contains `throw`, even if no exception is thrown. The performance tips in the manual don't mention this. I therefore wonder whether it is a bug.

The following MWE is adapted from the method `typed_hvcat(::Type{T}, rows::Tuple{Vararg{Int}}, xs::Number...) where T` in `abstractarray.jl`.

```
@noinline throw_arg_err(msg) = throw(ArgumentError(msg))

function hv_cat(rows::Tuple{Vararg{Int}}, xs::Int...)
    nr = length(rows)
    nc = rows[1]
    for i = 2:nr
        if nc != rows[i]
            throw(ArgumentError("mismatch")) # SLOW
            # throw_arg_err("mismatch") # FAST
        end
    end
    # Base.hvcat_fill(Matrix{Int}(undef, nr, nc), xs) # v1.6.1
    Base.hvcat_fill!(Matrix{Int}(undef, nr, nc), xs) # master
end
```

The SLOW variant gives

```
julia> @btime hv_cat((1,1,1), 1,2,3);
  107.800 ns (2 allocations: 144 bytes)
```

The FAST variant is 3x faster (and makes fewer allocations):

```
julia> @btime hv_cat((1,1,1), 1,2,3);
  34.530 ns (1 allocation: 112 bytes)
```

Note that the code differs only in a branch that is not executed. I've tried to replace the call to `Base.hvcat_fill!` at the end with some other code, but then the difference disappeared. It also disappears if one changes the arguments to `hv_cat((1,1), 1,2)`.

```
Julia Version 1.8.0-DEV.61
Commit 7553ca13cc (2021-06-21 17:18 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i3-10110U CPU @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.0 (ORCJIT, skylake)
```

The example also works for Julia 1.6.1 with the slight change indicated in the code.

Mason · October 1, 2025, 12:26pm

Yeah, this is a known effect that crops up all the time. In any sort of small, performance-sensitive function, it’s often greatly advantageous to put errors behind @noinline function boundaries.

Ronis_BR · October 1, 2025, 12:43pm

I see, but what is the explanation? I have no idea why this happens.

Mason · October 1, 2025, 12:55pm

Because throw has a ton of complicated crap in it that gets put into the function body. When you lock it behind a @noinline function, the optimizer can ignore it, and the function body itself doesn’t take as much room in the instruction cache (because the function contains fewer instructions).

Ronis_BR · October 1, 2025, 1:08pm

Awesome! Thanks @Mason !

Benny · October 1, 2025, 7:44pm

Why does it take a @noinline inside a separate error-throwing method’s body to have this effect, though? That is, applying @noinline directly to the throw call didn’t work here. Are there different inlining heuristics there?

Mason · October 1, 2025, 8:34pm

In this scenario, I think it has more to do with BoundsError than throw, since BoundsError has an ::Any field, so constructing one requires allocations.

This means that there’s a bunch of allocation code that never gets called but still gets inlined into the function body, and that’s what’s causing the problems, as far as I understand.
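
A quick way to look at the field types from the REPL, as a rough check rather than a full explanation. It only uses the reflection functions `fieldnames` and `fieldtypes`; both fields should come back as `Any`, which is why an instance cannot be built without boxing its arguments:

```julia
# Inspect how BoundsError is laid out.
# Untyped (Any) fields mean the array and the index must be stored as
# boxed values when an instance is constructed.
names = fieldnames(BoundsError)
types = fieldtypes(BoundsError)

for (n, t) in zip(names, types)
    println(n, " :: ", t)
end
```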

Benny · October 1, 2025, 8:52pm

Explains why Base.throw_boundserror exists as a separate method from its callee throw: a function barrier for the BoundsError instantiation too.
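
For a custom type, the same barrier can be written by hand. This is only a minimal sketch: `LTStorage`, `_lt_index` and `throw_lt_boundserror` are hypothetical stand-ins (the real LowerTriangularStorage and _ij_to_lt_index are not shown in this thread), and the packed row-major layout is an assumption. It needs Julia 1.8 or later for the `@inline` inside the method body.

```julia
# Hypothetical stand-in for the storage type discussed in this thread.
struct LTStorage{T}
    data::Vector{T}  # packed lower triangle, row by row (assumed layout)
    n::Int           # matrix dimension
end

# Assumed mapping from (i, j) with j <= i to a linear index into `data`.
_lt_index(i::Int, j::Int) = div(i * (i - 1), 2) + j

# The error path lives behind a function barrier: both the BoundsError
# construction and the call to throw happen here, not in the caller's body.
@noinline throw_lt_boundserror(L, I) = throw(BoundsError(L, I))

function Base.getindex(L::LTStorage{T}, i::Int, j::Int) where T
    @inline
    @boundscheck (1 <= i <= L.n && 1 <= j <= L.n) || throw_lt_boundserror(L, (i, j))

    # For the upper triangular part, return zero.
    j > i && return zero(T)

    return L.data[_lt_index(i, j)]
end

L = LTStorage([1.0, 2.0, 3.0], 2)
L[2, 1]  # element stored in the lower triangle
L[1, 2]  # upper triangle, returns zero(Float64)
```

Whether the hand-written helper matches Base.throw_boundserror in speed would still need to be benchmarked on the real type.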

mikmoore · October 1, 2025, 9:14pm

Is there a way we can make this happen more automatically? Would adding @noinline to the constructors for some of the subtypes(Exception) (e.g., ones with non-concrete elements) get us most of the way there?

I.e., does @boundscheck (i > L.n || j > L.n || i < 1 || j < 1) && @noinline throw(@noinline BoundsError(L, (i, j))) (added a @noinline on BoundsError) do any better (with or without the @noinline on throw)?

If not, is there anything we could do towards this end with a @throw macro?

Benny · October 1, 2025, 10:08pm

I would assume that it’s fine for exceptionally executed branches (try-catch as routine control flow is much rarer, error does @noinline for ErrorException), but there’s probably a debate on whether that’s generally good. I also just found out that throw is a builtin, so I’m not actually sure how inlining works for those.

I assumed from the docstring that the first @noinline would’ve applied to the BoundsError call as well, but evidently not based on the need for Base.throw_boundserror or a similar method. I’d think that either needs an implementation fix or a docstring clarification.

  @noinline block

  Give a hint to the compiler that it should not inline the calls within block.

  # The compiler will try to not inline `f`
  @noinline f(...)

  # The compiler will try to not inline `f`, `g` and `+`
  @noinline f(...) + g(...)

Note that this is separate from the zero-argument @noinline; that affects the surrounding method body.
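
To make the two forms concrete, a small sketch with made-up functions `f`, `g` and `h` (both forms assume Julia 1.8 or later):

```julia
f(x) = x + 1

# Call-site form: a hint not to inline this particular call to `f`.
g(x) = @noinline f(x)

# Zero-argument form inside a body: marks the enclosing method `h` itself
# as one that callers should not inline.
function h(x)
    @noinline
    return x + 1
end
```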
