Removing bounds checking for HPC - `--check-bounds=unsafe`?

A much more appropriate approach, imo, would be to use the interpreter at compile time:

  1. If the computation of g(1) takes a very long time, like approximately forever, then we want to be able to abort (say, recursive non-memoized Fibonacci of 100); that is a side-effect.
  2. Side effects! We can just run g(1) in the interpreter until we see a side effect, then abort and decide that this cannot be speculatively run.
  3. UB. UB is a side-effect. Suppose the computation of g(1) contains incorrect @inbounds or funny pointer arithmetic etc that corrupts the runtime or pops a shell. The user asked for a shell to be popped and we shall do so, but not speculatively!

What you’re describing is roughly how Julia worked prior to 1.9. The big downside of an approach like this is that it’s really slow (100-1000x slower than using the compiler). Julia 1.9 introduced the “effect system”, which tracks all of the possible types of effects (consistency, non-termination, UB, side-effect freedom, and a couple extras). This makes running code at compile time (when safe to do so) much faster, as well as allowing for some other optimizations (like dead code elimination) which you can’t perform simply via interpretation.
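To make this concrete, here is a small illustration (a sketch only: Base.infer_effects is an internal, unexported API, and its printed output varies across Julia versions):

# Ask the compiler which effects it can prove for a given signature.
# (Base.infer_effects is internal; its printed format varies by version.)
g(n) = n <= 0 ? 1 : g(n-1) + g(n-2)
Base.infer_effects(g, (Int,))     # termination is not provable for this recursion
Base.infer_effects(abs, (Int,))   # by contrast, a simple, provably terminating call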

2 Likes

This does seem questionable – consider

julia> fib(n) = n < 0 ? 1 : fib(n-1) + fib(n-2)
julia> fub(x) = x ? fib(5) : 0

We want that speculatively executed!

julia> fib(n) = n < 0 ? 1 : fib(n-1) + fib(n-2)
julia> fub(x) = x ? fib(1000) : 0

No way do we want to execute that speculatively, that wouldn’t terminate until heat death!

For speculative execution, we really want to specialize the effects on constants. We don’t care about the effects of fib(n::Int) (which arguably correctly infers as maybe-not-terminating), we only care about the effects of fib(5) (and we only care about that until we spot the first side-effect / inconsistency!).


That is a very valid point: it can be fine to have code containing bounds-check violations that are intended, caught, and handled, à la

julia> function foo(a)
           try
               return a[100]
           catch
               0
           end
       end
foo (generic function with 1 method)

julia> foo([])
0

Similar to how the global --fast-math option had to be removed.

If I understand the digression on speculative execution right, then the issue is that Core.Compiler has now, due to internal implementation details, become the kind of code that gets miscompiled under a naive --check-bounds=no regime.

So the change is that before, only a subset/dialect of Julia ran correctly under --check-bounds=no (albeit with changed semantics), and this subset has now become empty?

For that, I would consider a system that allows specific modules to declare that their bounds-checks must not be removed even under --check-bounds=no. In order to make the subset/dialect of Julia that is --check-bounds=no-compatible nonempty, it would be sufficient to hard-code Core.Compiler? (But I guess a compile-time option in some header file would be more convenient, or even a @insist_on_boundschecks module ... end macro.)

As long as --check-bounds=yes remains part of the language, the obvious consequence of removing --check-bounds=no is that everybody is incentivized to add @inbounds everywhere, and to document that their package should be run under --check-bounds=yes for testing.

This is stupid!

2 Likes

As long as --check-bounds=yes remains part of the language, the obvious consequence of removing --check-bounds=no is that everybody is incentivized to add @inbounds everywhere, and to document that their package should be run under --check-bounds=yes for testing.

Yes, precisely. And I think that’s great - the best possible situation, honestly:

  • @inbounds is only used where the author has explicitly, and locally, validated it to be safe. This greatly reduces the risk of segfaults, miscompilations and other hard-to-diagnose errors. It has to be put all over the place, because it needs to be locally checked by the author at each of these places.
  • When testing, these hard-to-diagnose errors are easily caught using --check-bounds=yes. Note that Pkg.test already runs with this option set, for precisely this reason (see the snippet below).
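For reference, the flag can also be passed explicitly when testing (a sketch: Pkg.test accepts a julia_args keyword, and bounds checking is already its default; “MyPackage” is a placeholder name):

import Pkg
# Run a package's tests with bounds checking forced on
Pkg.test("MyPackage"; julia_args=["--check-bounds=yes"])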

I don’t think the main argument is that code may rely on intentionally triggering a BoundsError (thus causing miscompilation if it’s removed). Rather, it’s that people unavoidably make mistakes, and it’s much easier to check that one local indexing operation is valid than to validate entire modules.

It’s analogous to how you could technically wrap an entire Rust source file in unsafe {}, but that would be a terrible idea and completely defeat the purpose of unsafe-blocks, which is to locally validate each instance of potentially unsafe behaviour.

Regarding the speculative execution, I seriously doubt it’s possible to determine at compile time that fib(5) will terminate quickly without trying to run it. Certainly it will be impossible to determine for arbitrary code.

3 Likes

Slightly tangential to the discussion, but I wanted to briefly mention Base.@propagate_inbounds in this context.

Combined with @inbounds, I found this a very convenient way of declaring exactly what I can prove to be in-bounds without covering too much (as might be the case in a fully recursive @inbounds version) nor too little (just putting @inbounds around the top-level call). I agree with @jakobnissen that the improved correctness from forcing code authors to choose a (slightly) more verbose path outweighs the convenience gain in some special use cases.

An example where I’ve used this quite happily is custom structs which support indexing or some kind of access to internal indexable objects. I cannot know if someone calls my own getindex method with a correct index, but if I generated the indices myself in another function and use them in the same place, I do know that they are correct (granted, the code becomes a bit verbose, but you can also write aliases to shorten the macros).

Something like this:

# Just shorten the name a bit
using Base: @propagate_inbounds
const var"@pi" = var"@propagate_inbounds"

# Can't use @inbounds here directly
# and don't want to put it around every single instance of `container[i]`
# Instead, `@propagate_inbounds` can cover more ground in the end
@pi Base.getindex(container::MyContainer, i) = container.data[i]

# Slightly cumbersome: callers of my `getindex` method all need to be annotated as well
# (if I understand this macro correctly)
@pi modify_containers!(containers, indices) = ...

function do_the_thing()
    # I create the containers and know that the accesses will be in-bounds
    # This function is really the only place I can and want to put `@inbounds`
    containers = ...
    indices = ...
    @inbounds modify_containers!(containers, indices)
    ...
end

(I’m not so sure, though, how smart this approach is when it comes to very large functions and inlining?)

2 Likes

Not quite. The current “standard” is:

  1. People add @inbounds if it’s locally validated to be safe
  2. When running with the default --check-bounds=auto, you get a nice mix: some performance is left on the table due to bounds-checks, and some safety is left on the table as well, since some @inbounds are erroneous.
  3. You can always run under --check-bounds=yes if you care about safety a lot, or if you want to validate/test, similar to various sanitizer options.
  4. You can always run under --check-bounds=no if your entire program is such that validation with --check-bounds=yes is expected to give good coverage.

This is very nice! If --check-bounds=no is removed from the language, we will end up with:

  1. Validation/testing is done with --check-bounds=yes, as is currently done.
  2. The “full performance” mode is only possible if you add @inbounds everywhere, regardless of whether that is “locally valid” or not.
  3. Hence, people do that! As they absolutely should.
  4. There are now only 2 modes of operation. The people who used to be happy with --check-bounds=auto are fucked.

PS: To give an example of a questionable @inbounds in Base:

julia> a=zeros(UInt,3); idxs=[1,2,3]; v=view(a, idxs); idxs[1]=0; @show pointer(a); collect(v)
pointer(a) = Ptr{UInt64}(0x0000746a96ac3860)
3-element Vector{UInt64}:
 0x0000746a96ac3860
 0x0000000000000000
 0x0000000000000000

I think that @inbounds is nice and makes sense, even though its correctness is arguable. But that is under the social contract that --check-bounds=auto is allowed to segfault and is a compromise between --check-bounds=yes and --check-bounds=no. If you remove --check-bounds=no, then all code will look like that!

4 Likes

Is there an argument here for array-access bounds checks being an important special case?

If you’re accessing an Array (or any AbstractArray that’s not a StaticArray), it’s heap-allocated at runtime - I assume that means there’s no hope of doing bounds checks at compile time (maybe it’s possible in principle in some functions, but how do you prove that there was no intermediate function with a side-effect that resized the array, etc.?). If that’s true, then all the arguments about optimization, etc. in the compiler do not apply to array accesses.

For myself (and I think I’m representing a reasonably large class of users), the bounds checks on array accesses are the only ones I really care about, because they’re the ones that happen inside a loop over grid points. In a code doing time evolution on a grid, the grid doesn’t change; if I’ve tested my indexing logic on several small grids, and a couple of timesteps on a large grid, then either bounds checks have caught my errors already, or they won’t catch the error in my production run anyway. So bounds checks after initial testing are just a waste of time, money, and CO2 emissions.

Proposal: have a special type of @boundscheck/@inbounds like @arrayboundscheck/@arrayinbounds that can be used by the Array interface (and other suitably similar types), and have --check-bounds=noarray that is intermediate between --check-bounds=no and --check-bounds=auto in that it only disables @arrayboundscheck checks. If that was a feature, at least for my use-case, I don’t care if --check-bounds=no was removed.

The InboundsArray.jl package goes some way towards this idea, but it would end up needing a lot of pass-through code to reach the Array-specialised implementations in many packages, so it is very high-maintenance as a general-purpose solution.

1 Like

This is fine for some types of packages, but for an application that is a piece of scientific simulation software, it is a nightmare - this is why you get so much push-back from those of us who develop this kind of software. If you talk to people who are used to developing simulation codes in Fortran, they think it is insane that removing --check-bounds=no is even being discussed. It is a top-priority feature, and if it disappears and is not replaced with something that doesn’t require boilerplate code, I think a lot of those people would go back from Julia to Fortran, or never start using Julia in the first place. I want to make that point strongly, because you won’t hear from those people: since they aren’t already invested in using Julia, you would just silently lose them. I think ‘replacing Fortran for scientific software’ is one of Julia’s reasons for existing - at least it’s essentially my reason for using it!

3 Likes

I’m not at all against what you want to have here.

But I believe the reason there isn’t that push-back in Fortran is that (AFAIK) there is no way to disable bounds checking explicitly for just a subset of the code. You can, at best, compile a subsection of the code without bounds checking, which would be the equivalent of adding @inbounds “everywhere”, but much more painful to do properly.

My experience is that, in Fortran or Julia, I find out about the errors in my indexing logic in the worst possible way: by getting users to show me an example where it failed. And I’m glad they are running with bounds checking instead of simply getting wrong results. Because of that, I use @inbounds where I’m sure it is safe, and only there. I find it hard to believe that that’s “everywhere” in any code, when restricted to the performance-critical points.

Yet, I’m all for you and others having the choice, if possible.

(PS: I’m not trying to start a debate here, just sharing my experience. For me, thinking that way led me to better software development practices and better code safety and maintainability, and my Julia codes are currently much faster than what I used to write in Fortran.)

That’s the thing - if I look at the code I’m currently developing, I’d need probably 100-ish @inbounds annotations. The majority of ‘features’ of the code are adding terms to the PDEs being solved, i.e. adding terms inside the ‘hot loops’. If those get missed, you’d lose a fraction of a percent here and there - maybe you’d argue that’s not a big deal, but taken in total they add up to something significant, and since each individual place is not a big cost, it’ll be next to impossible to find by profiling the code. So even if the ‘ideal’ solution is “put @inbounds everywhere it’s performance critical”, I’d still want --check-bounds=no (or possibly a safer version) as a comparison to check that I’m actually getting the peak performance*.

* On Julia-1.11 at the moment the ability to do this is broken already because PackageCompiler.jl segfaults with --check-bounds=no (Segfault when creating sysimage with `--check-bounds=no` under Julia-1.11 · Issue #1021 · JuliaLang/PackageCompiler.jl · GitHub), and compiling a system image is required for large parallel HPC runs (otherwise we’d run out of RAM).
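(For context, the sysimage workflow referred to here is roughly the following sketch; “MySimCode” is a placeholder package name:)

using PackageCompiler
# Build a custom system image containing our simulation package.
# Per the linked issue, combining this with --check-bounds=no
# currently segfaults on Julia 1.11.
create_sysimage(["MySimCode"]; sysimage_path="sim_sysimage.so")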

1 Like

This is tangential, but the reason my Julia codes are faster than my Fortran codes is that in Julia it is much easier to isolate and benchmark small sections of the code, so that we can tune those critical sections for performance much more precisely.

(I’m not continuing here to not derail the thread, which raises a fair point anyway).

1 Like

This is mixing up a few things. The heap allocation doesn’t have anything to do with bounds-check elimination; resizability does. Once it’s registered, FixedSizeArrays.jl will likely be faster for your purposes, since a non-resizable collection is a lot easier to bounds-check at compile time.
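To illustrate the distinction (a hedged sketch; the function name is made up): with a resizable Vector, any opaque call can change the length, so the compiler cannot carry a bounds check across it:

function first_after_callback(a::Vector{Int}, f)
    @boundscheck checkbounds(a, 1)
    f(a)          # f might call resize!(a, 0), invalidating the check above
    return a[1]   # so this access must still be checked at runtime
end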

3 Likes

For reasons, the indices used for each loop in my code (which are a subset of the complete indices of the array, depending on which of the parallel processes is executing the code) are stored in a struct that is populated during initialisation. So the code for some ‘hot loop’ would look like this (see the sketch after the list):

  1. look up index ranges for x, y, and z dimensions from some struct passed to the function
  2. nested loop over an array, for those index ranges
  3. in the body of the loop, do some stuff
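In code, that might look roughly like this (a hedged sketch; the struct and field names are invented for illustration):

struct LoopRanges
    xrange::UnitRange{Int}
    yrange::UnitRange{Int}
    zrange::UnitRange{Int}
end

function apply_terms!(out, a, r::LoopRanges)
    # The ranges are runtime values read from the struct, so the compiler
    # cannot know at compile time whether they fit within axes(a).
    for k in r.zrange, j in r.yrange, i in r.xrange
        out[i, j, k] = 2 * a[i, j, k]   # stand-in for "do some stuff"
    end
    return out
end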

If the compiler could bounds-check that pattern at compile time, I would be stunned. I also don’t think it’s a particularly uncommon type of pattern. A similar alternative would be that the indices to be accessed are calculated by some fairly complicated arithmetic inside the function.

Edit: or to put it another way, to the best of my understanding, at compile time the only information the compiler could conceivably access is 1) here’s an array of size (n1,n2,...) and 2) here are some index ranges (imin1:imax1,imin2:imax2...). It will only get the values of n1,n2,..., imin1,..., imax1,... at run time, so it can’t prove anything useful.

Doesn’t this very topic demonstrate the challenges with --check-bounds=no? The two linked issues in the OP are that:

  • --check-bounds=no incurs a runtime performance penalty
  • --check-bounds=no leads to segfaults at compile time

Either the compiler pessimizes how it works with all these unsafe indexing expressions, or it might do something unsafe. I get the core request — why can’t it just work like it used to!? — but I also want a smarter compiler that can reason about code more effectively.

2 Likes

In cases like this, it’s probably a good idea to add @boundscheck checkbounds(A, imin1:imax1,imin2:imax2...). By adding this bound-check outside of the loop, you make it much easier for the compiler to remove the bounds-checks from the loops (since it should be able to prove that you’ve already checked those bounds), while not sacrificing safety in case someone made a typo or logic error such that you actually are indexing out of bounds.
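Concretely, that pattern might look something like this (a sketch; sum_block and its argument names are made up):

function sum_block(a::AbstractMatrix, irange, jrange)
    # One up-front check for the whole block of indices...
    @boundscheck checkbounds(a, irange, jrange)
    s = zero(eltype(a))
    # ...lets us skip the per-element checks inside the loop.
    @inbounds for j in jrange, i in irange
        s += a[i, j]
    end
    return s
end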

The way I’ve understood it, what’s meant by “compile-time boundschecks” is usually a form of loop unswitching. What the compiler is doing is rewriting your code from

for i in imin:istep:imax
    # do stuff with a[i]
    # bounds checked at each access
end

to (ish)

@assert firstindex(a) <= imin <= lastindex(a)
@assert firstindex(a) <= imax <= lastindex(a)
for i in imin:istep:imax
    # do stuff with a[i]
    # no bounds checks because the whole
    # range was validated before the loop
end

So it’s not fully proving that all accesses are inbounds at compile time, but it can prove that validating the first and last access before the loop is sufficient, and rewrite the code accordingly.

1 Like

I don’t mean to second-guess your conclusion (or maybe I do, but only in a friendly way), but have you verified with benchmarks that you actually need @inbounds for performance at all of these locations? I would not recommend tallying all the indexing expressions in your code and using that as the number of necessary @inbounds - I have often found in my own code that adding @inbounds makes no meaningful performance difference one way or the other.
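For instance, a per-site comparison might look like this (a sketch assuming BenchmarkTools.jl):

using BenchmarkTools

function sum_checked(a)
    s = zero(eltype(a))
    for i in eachindex(a)
        s += a[i]
    end
    return s
end

function sum_inbounds(a)
    s = zero(eltype(a))
    @inbounds for i in eachindex(a)
        s += a[i]
    end
    return s
end

a = rand(10^6)
@btime sum_checked($a)    # with eachindex, the checks are often elided anyway...
@btime sum_inbounds($a)   # ...so the two versions may well time the same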

If I use --check-bounds=no, or have @inbounds on all array accesses, then the code is ~25% faster than --check-bounds=auto with no @inbounds. I haven’t checked each site individually.

@Oscar_Smith @danielwe thanks, I’m starting to see how you expect this to work. I’ll refrain from going down the rabbit hole of trying to make my indexing convoluted enough to break the compiler’s logic!

I would say, though, that even if it worked perfectly, being forced to write a @boundscheck checkbounds(A, imin1:imax1,imin2:imax2...) that’s good enough to allow the compiler to correctly infer the bounds in every loop in the code is even more work than just putting @inbounds everywhere. So pragmatically I’m not going to do it, and more importantly I don’t want to require every new PhD student, etc. to have to know how to do it before they can have a pull request accepted.

I recommend carefully reading @Sevi’s post about Base.@propagate_inbounds above: Removing bounds checking for HPC - `--check-bounds=unsafe`? - #25 by Sevi. Perhaps you can reduce the number of @inbounds by punting them a level or two up the call stack? Ideally, every @inbounds will be placed right at the point where it’s most obvious from the code that the indices are actually in bounds, and then Base.@propagate_inbounds will make sure it propagates down to the actual indexing expression.