Removing bounds checking for HPC - `--check-bounds=unsafe`?

One of Julia’s main purposes is to enable high-performance computing at scale - it’s on the front page (Ecosystem → Parallel Computing at The Julia Programming Language)! However, apparently --check-bounds=no is becoming ‘defunct’ (Performance regression up to 20% when updating from Julia v1.10.4 to v1.11.0-rc1 · Issue #55009 · JuliaLang/julia · GitHub, Segfault when creating sysimage with `--check-bounds=no` under Julia-1.11 · Issue #1021 · JuliaLang/PackageCompiler.jl · GitHub).

Sorry this post is a bit long. In the discussions of --check-bounds=no I have seen recently, I feel like people who use and want --check-bounds=no are being told that they’re just wrong and shouldn’t use it, so I want to defend this particular use case in detail.

Issues:

  • Many scientific HPC codes have a small number of developers (1-5), who are primarily domain-specialist scientists (often PhD students with a lot else to learn already), not programmers or computer scientists.
  • HPC codes can routinely use millions or hundreds of millions of CPU hours per run, with a significant cost in both money and carbon emissions.

--check-bounds=no was a major selling point: it made it easy to develop and test code safely, while removing the cost of bounds checking in large-scale runs. Even a 10% cost from bounds checking is significant, and it can be more like 50%.

A common use-case in this scientific-HPC domain is to time-evolve some system of PDEs. In that case we repeat essentially identical operations millions or hundreds of millions of times, with different data in the arrays but identical array indexing. It is insane to pay $100,000s to bounds-check repeated, identical (apart from numerical values) operations. For this kind of operation, if we check the correctness of the indexing on a reasonably large number of small grids, we can be confident enough that it is correct when the only difference is that we use more grid points and/or more timesteps. An ideal solution might be to bounds-check only the first few timesteps and then disable it, but that is probably too much trouble to implement, and we can do it manually by just doing a short run with bounds checking (admittedly probably only after we notice a problem).

The suggested solution seems to be ‘use @inbounds where it is important’. Say 50% of my code is for setup of the problem, and 50% calculates the time-derivative. Then 50% of the code is essentially ‘in the hot loop’, and this is the part most likely to be under active development. To expect everyone to think carefully about where to put @inbounds every time they write new code is not reasonable for the kind of project teams we have.
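For concreteness, this is roughly what ‘use @inbounds where it is important’ looks like for a typical kernel (a rough sketch with made-up names, assuming a simple second-order finite-difference term):

function dudt!(dudt, u, dx)
    n = length(u)
    # Hand-placed @inbounds: the developer, not the compiler, asserts that
    # the range 2:n-1 together with the i-1/i+1 offsets stays inside both arrays.
    @inbounds for i in 2:n-1
        dudt[i] = (u[i-1] - 2 * u[i] + u[i+1]) / dx^2
    end
    return dudt
end

Every kernel like this, in every new piece of physics code, needs that annotation added (and kept correct) by hand, which is exactly the maintenance burden described above.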

The argument against --check-bounds=no (and even against @inbounds), if I’ve understood it, is that const-propagation and similar optimizations that the compiler might be able to do are harder without bounds checking. If we accept that out-of-bounds access is undefined behaviour, which might cause subtly wrong results, segfaults (or even set fire to your computer!), why should there be a problem? The compiler has been told that it is allowed to assume that there is no out-of-bounds access, and to optimize on that basis. This is standard practice in HPC. If that is too unsafe for many/most domains where Julia is used, fine, but could we not have something like --check-bounds=unsafe for HPC, where the user genuinely takes responsibility for ensuring that there are no out-of-bounds array accesses, on pain of undefined behaviour? I’m not a compiler developer, so I have no idea how much work this would be, but isn’t it just a short-cut in a few places, where the compiler no longer has to try to prove in-bounds-ness, which seems (to my uninformed mind) fairly simple?

7 Likes

The main problem with --check-bounds=no is that it’s a global setting which affects the entire session, including the Julia runtime and compiler: messing with that can cause miscompilations and other unexpected errors that you really don’t want to deal with.

@inbounds is a much better approach because it’s local and you can use it where you believe it’s safe to do so. Also, the compiler can already remove bounds checks in many cases where it can prove it’s safe to do so, with the result that you often don’t need to add @inbounds manually in the first place; it isn’t as if you have to sprinkle @inbounds everywhere for the sake of it.
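For example, in a loop like the first sketch below the compiler can typically prove the accesses safe and drop the checks, while in the second, where the indices come from another array, it generally cannot (a rough sketch, not a guarantee for any particular case):

# eachindex(x) is guaranteed to produce valid indices for x, so the
# compiler can usually elide the bounds checks here without any @inbounds:
function sum_all(x)
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Here the indices come from a second array, so in general the checks
# cannot be proven away and stay in unless you add @inbounds yourself:
function sum_selected(x, idx)
    s = zero(eltype(x))
    for i in idx
        s += x[i]
    end
    return s
end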

The main problem with --check-bounds=no is that it’s a global setting which affects the entire session, including the Julia runtime and compiler: messing with that can cause miscompilations and other unexpected errors that you really don’t want to deal with.

Then how was it working at least up until julia-1.9.x (or julia-1.10.x for some use cases)? People (myself included) have been using this flag routinely! If this is the reason, give us the ability to @inbounds an entire package or module - but I remember seeing hostility to that when someone suggested it on another thread.

@inbounds is a much better approach because it’s local and you can use it where you believe it’s safe to do so. Also, the compiler can already remove bounds checks in many cases where it can prove it’s safe to do so, with the result that you often don’t need to add @inbounds manually in the first place; it isn’t as if you have to sprinkle @inbounds everywhere for the sake of it.

The performance improvements observed when using --check-bounds=no suggest that this doesn’t happen reliably enough for me to depend on it! So I would have to put @inbounds in dozens of functions at least.

For my code (and I imagine many scientific codes) we iterate through arrays in funky ways - e.g. my finite-element method requires operations on overlapping sub-blocks like 1:5, 5:9, 9:14, etc. I would be amazed if any compiler would ever prove that these things are safe!
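For concreteness, a rough sketch of the kind of loop I mean (all names and sizes are made up). The index arithmetic is easy for a human to check against the array sizes, but it is exactly the sort of thing a compiler will rarely prove safe, so every such loop needs its own hand-written @inbounds:

# Element-by-element assembly over overlapping sub-blocks like 1:5, 5:9, ...
# (neighbouring elements share one grid point). `result` and `coefs` are
# assumed to have length nelem * (ngrid - 1) + 1.
function assemble!(result, coefs, nelem, ngrid)
    for ielem in 1:nelem
        imin = (ielem - 1) * (ngrid - 1) + 1
        imax = imin + ngrid - 1
        @inbounds for i in imin:imax
            result[i] += coefs[i]
        end
    end
    return result
end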

2 Likes

I would be amazed if any compiler would ever prove that these things are safe!

…and even if it could (sometimes), I’ve seen enough issues/discussions about performance regressions due to things like const-propagation going wrong that such a complex optimization is not something I would be comfortable relying on to maintain the performance of my code. We want to just write simple loops, not constantly massage them to make the compiler happier (see e.g. Performance regression up to 20% when updating from Julia v1.10.4 to v1.11.0-rc1 · Issue #55009 · JuliaLang/julia · GitHub for a recent example of this kind of thing).

@giordano also thank you for taking the time to read and reply! I don’t like to complain, but this issue has the potential to be enough of a developer-overhead cost that it might be better for me to just use C++ or Fortran (at least for my next project) rather than worrying about whether I can get the performance from Julia. That would be a pain, because Julia is so much nicer in so many ways! So I’d rather keep pushing the point within Julia that this specific large-scale HPC scientific simulation community has a particular need here, which was well addressed in the past (up to julia-1.9.x) but seems to be getting worse at the moment.

The main problem with --check-bounds=no is that it’s a global setting which affects the entire session, including the Julia runtime and compiler: messing with that can cause miscompilations and other unexpected errors that you really don’t want to deal with.

I guess I assumed that any bounds-checking violation would cause an (unhandled) error. Is that not the case? If it is, then if a program compiles and runs with bounds checking turned on, an identical run without bounds checking couldn’t have any problems (at least if the compiler was really assuming that there are no out-of-bounds accesses, so it does not refuse to optimize code in case there might be), could it?

No:

% julia --check-bounds=auto -e 'f(x) = try x[4]; catch; println("failed"); else println("all good: ", x[4]); end; f([1])'
failed
% julia --check-bounds=no -e 'f(x) = try x[4]; catch; println("failed"); else println("all good: ", x[4]); end; f([1])'
all good: 4540714448

And the compiler itself may rely on similar try/catch patterns: if you force it down the wrong paths, it may well end up giving you garbage code to run.

Again, the problem is that --check-bounds doesn’t just affect the code you write, with however much care you put into it, but also code in the compiler and code in third-party packages that may have been written with much less care.

2 Likes

@giordano yes, I feel like people shouldn’t use exception handling like that, but I take the point. My issue is that you’re then telling me the only solution is to put @inbounds everywhere, and I want to raise the point that, from a code-maintenance point of view for scientific developers, that’s a pretty awful solution. Can we please have a solution that’s as simple, and works as well, as --check-bounds=no does (or at least used to)?

The problem here is that the compiler wants to be able to run your functions at compile time with values that you might not have run them with. Consider the following code:

function f(x, y)
    if p(x)
        return g(y)
    else
        return 1
    end
end

h(x) = f(x, 1)

To make h faster, the compiler wants to evaluate g(1) at compile time, but this is illegal to do if g contains undefined behaviour, since at run time the caller might only ever pass x such that p(x) is false, so g would never actually execute. The thing that changed in 1.11 is that we significantly expanded the compiler’s ability to do this type of optimization.

2 Likes

I’m hopeful that a package like FixedSizeArrays.jl can make the compiler’s life easier, as it can potentially see more bounds at compile-time (“potentially” doing a lot of work here).

2 Likes

@Oscar_Smith thanks, that example makes it clearer why this is hard in general.

So that leaves me with the question: is there really no design that gives us as ‘end-developers’ a better experience than having to put @inbounds in hundreds of places?

Even a recursive version would be an improvement: an @inbounds_recursive (or some better name…) block that applies @inbounds to every function called inside it (only functions within the same module, I guess), instead of having to mark absolutely every place.

The other option would be to accept doubled compile times when running with --check-bounds=no, compiling both the bounds-checked and the unchecked version of each function (so that the bounds-checked version could be used at compile time).

1 Like

Presumably you could write a macro to do this for you. For example,

using MacroTools: @capture, postwalk

function add_inbounds_everywhere(ex)
	postwalk(ex) do x
		@capture(x, a_[b__]) || return x
		return :(@inbounds $x)
	end
end

macro add_inbounds_everywhere(ex)
    # esc so the user's own variables are not mangled by macro hygiene
    return esc(add_inbounds_everywhere(ex))
end

Of course, it’s not quite that easy, as you only want to add @inbounds on getindex, not on setindex!; e.g. you don’t want to end up with @inbounds(x[1]) = 0. So you need corrections like:

function move_assignment_inbounds(ex)
	return postwalk(ex) do x
		@capture(x, @inbounds(a_[b__]) = c__) || return x
		return :(@inbounds($a[$(b...)] = $(c...)))
	end
end

function move_broadcasted_assignment_inbounds(ex)
	return postwalk(ex) do x
		@capture(x, @inbounds(a_[b__]) .= c__) || return x
		return :(@inbounds($a[$(b...)] .= $(c...)))
	end
end

function move_view_in_front_of_inbounds(ex)
	return postwalk(ex) do x
		@capture(x, @view(@inbounds(a_))) || return x
		return :(@inbounds(@view($a)))
	end
end

carefully_add_inbounds = move_view_in_front_of_inbounds ∘ move_broadcasted_assignment_inbounds ∘ move_assignment_inbounds ∘ add_inbounds_everywhere

Presumably there are plenty of cases still missing, but if you

include(carefully_add_inbounds, "script.jl")

for a file script.jl containing

function f(x)
	x[4] = -100
	sum(x[i] for i = 1:4)  
	# Note that @inbounds sum(x[i] for i = 1:4) would not work;
	# you need sum(@inbounds x[i] for i = 1:4).
end

println(f(rand(3)))

function g(x, y)
	x[1:5] .= @view y[1:5]
end

g(zeros(4), rand(10))

you get something like

-99.202880576623
5-element view(::Vector{Float64}, 1:5) with eltype Float64:
 0.1550523932721809
 0.16183646229915816
 0.6185234401813514
 0.6716336573001286
 0.12471934509214788

as output, without BoundsErrors.

I’m away from my computer now and can’t check myself, but wouldn’t wrapping your whole file in @inbounds begin ... end work?

Typically no. @inbounds on the outside of a function definition will just try to turn off bounds checking while the method is being defined, not inside its body. And since @inbounds only propagates through one level of inlining (or into callees that opt in with Base.@propagate_inbounds), it won’t progress far down the call stack to do anything meaningful.
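A small sketch of why (hypothetical names; the behaviour is as documented for @inbounds and Base.@propagate_inbounds):

getfourth(x) = x[4]                  # ordinary function: the indexing is still bounds-checked
caller(x) = @inbounds getfourth(x)   # this @inbounds does NOT reach inside getfourth
# caller([1, 2, 3]) still throws a BoundsError

# The callee has to opt in for the caller's @inbounds to apply to it:
Base.@propagate_inbounds getfourth_prop(x) = x[4]
caller_prop(x) = @inbounds getfourth_prop(x)
# caller_prop([1, 2, 3]) is now undefined behaviour: the check is elided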

3 Likes

This is another one of those cases where the “correct” solution is to write idiomatic, compiler-friendly code and rely on compiler optimizations, but there’s no simple way to guarantee that the desired optimizations took place. As in many such cases, just having a simple macro to turn every bounds check within a scope/block into an error would probably go a long way.

So, I trust we’ll see @MilesCranmer release InboundsDoctor.jl in 3… 2… 1…

7 Likes

Maybe someone can develop an InboundsArray type whose setindex! and getindex methods are defined with @inbounds operations on the wrapped array, as a drop-in replacement for the Base Array. Then using InboundsArray is guaranteed to affect only user code, not Julia internals. This could be more ergonomic than using the @inbounds macro and chasing its propagation. (Except, of course, you’ll need to understand the danger.)
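A minimal sketch of that idea (illustrative only; a real implementation would need much more of the AbstractArray interface, e.g. similar, views and broadcasting):

struct InboundsArray{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    a::A
end

Base.size(x::InboundsArray) = size(x.a)
Base.IndexStyle(::Type{InboundsArray{T, N, A}}) where {T, N, A} = IndexStyle(A)

# No @boundscheck block here at all: accesses through the wrapper always skip
# the check on the wrapped array. (Running with --check-bounds=yes still
# re-enables checking, because that setting ignores @inbounds.)
Base.@inline Base.getindex(x::InboundsArray, i::Int) = @inbounds x.a[i]
Base.@inline Base.setindex!(x::InboundsArray, v, i::Int) = @inbounds x.a[i] = v

Wrapping the arrays once at setup time would then be enough for all the user-level indexing to skip the checks, while code inside Base and the compiler is untouched.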

3 Likes

I find this issue problematic. I do understand the reasons why globally switching off bounds checking may have some unintended consequences.

However, an ordinary user, typically a domain specialist in some science, will have little patience with these points. It’s more like “I write a lot of x[i]. A lot. When developing, I want the compiler to insert bounds checks for my code. In production runs I don’t want it to do so.” How hard can it be? It exists in other languages like Fortran.

As we’ve seen, it can be hard, but for reasons that are incomprehensible unless you understand the bowels of the Julia compiler.

It could, however, be possible to write a couple of macros, e.g. a @nocheck which is a replacement for @inbounds. Instead of removing it textually when developing, the macro would check some global (i.e. Main- or module-level) state which can be switched on or off by some other macro (e.g. @docheck and @dontcheck) before everything else.

1 Like

I also write unstructured-mesh computations with indirect indexing for which there is little hope that the compiler can prove that accesses are inbounds.
I usually write a macro @fast that either does nothing or prepends @inbounds. So @inbounds can be activated for a specific module, either manually or based on an environment variable.
This kind of works but it would be good to have a robust way of doing something similar.
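For reference, a rough sketch of such a @fast macro, assuming the switch is an environment variable read once when the module is loaded (all names here are made up):

module FastIndexing

# Decided once, at load/precompile time; with precompilation the choice is
# baked into the cached code, so changing it means re-precompiling (or using
# Preferences.jl instead of ENV for a more robust switch).
const USE_INBOUNDS = get(ENV, "MYCODE_INBOUNDS", "false") == "true"

macro fast(ex)
    return USE_INBOUNDS ? esc(:(@inbounds $ex)) : esc(ex)
end

end # module

so that kernels are written once as

@fast for i in eachindex(y)
    y[i] = x[i]^2
end

and the bounds checks come back automatically in development runs.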

A complementary approach could work at the type level, i.e. a wrapper array type that would check bounds or not based on a type parameter.

1 Like

I liked @greatpet’s suggestion (thank you!), so I started working on a package, InboundsArrays.

It seems promising. My simulation code needs MPI, so I hacked in support for that too (I’ll add it to the repo once I package it up as an extension), but with that, on a small test, this package seems to let me recover the performance I got from --check-bounds=no while using --check-bounds=auto:

  • --check-bounds=no, original code 3.42 minutes
  • --check-bounds=auto, original code 4.32 minutes
  • --check-bounds=auto, updated code using InboundsArrays 3.38 minutes

This was run on julia-1.10.7, where --check-bounds=no still works for me. I haven’t tested extensively (just a single run in each configuration), but this level of performance would be good enough for me: it recovers the ~25% slow-down that losing --check-bounds=no would otherwise cost me.

4 Likes