I agree that the process you describe would result in the ‘best code’ in some sense. I think we have a bit of a cultural disconnect here though. Pragmatically, given limited time (and possibly limited Julia-coding expertise), I’m aiming for good-enough code that is as easy as possible to write and maintain (and teach students and other new users to contribute to). For me, and I think many HPC-using scientists, ‘good enough’ does include ‘the best performance we can get’, even if that involves compromises on safety.
My point is that this pragmatic, good-enough experience for an HPC scientist has been pretty good in Julia, but the ongoing failure of --check-bounds=no is making it worse, and requiring boilerplate code around every performance-critical function makes the experience worse in a different way (in terms of developer productivity and learning curve rather than code performance), but in a way that is, for me (us?), very significant.
No, I totally understand. It’s a boilerplate-heavy solution and an especially tough sell in an environment where you’re continually onboarding students who need to get up to speed quickly. I don’t see a perfect solution, but hopefully the compiler will keep getting smarter and make @inbounds less relevant, and perhaps it’s possible to make linting clever enough to emit warnings like “the compiler will struggle to optimize this loop; click this link for 3 simple steps to writing fast loops” (with an emphasis on simple, and where the first two steps are things that often make @inbounds irrelevant, while the third is to use @inbounds).
Just to add some more motivation: while I was testing various options, I found another type of run with our simulation code where removing bounds checks results in a 2x speed-up. We really can’t do without that!
Extremely noob question: I see a lot of commentary along the lines of “removing bounds checking before compilation makes the compilation more challenging and does weird things”,
but is it fair to say that bounds-checking operations are mostly self-contained and recognizable? What would be the difficulties involved in a process like:
1. compile the code with all bounds checks on, as per default, ignoring @inbounds
2. afterwards, go through the generated code (maybe at the LLVM level?) and delete any lines corresponding to bounds checks
3. accept the fact that OOB access may happen and crash
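For a concrete sense of what those recognizable bounds-check lines look like, here is a minimal sketch (the function names are just illustrative) of inspecting the generated LLVM IR, where the check shows up as a compare-and-branch to an out-of-bounds error block — the part such a stripping pass would have to find and delete:

```julia
# Minimal sketch: what a bounds check looks like in generated code.
# In the IR for `getfirst` the check is a compare-and-branch to a block
# that throws BoundsError; in `getfirst_unsafe` that branch is gone.
getfirst(a::Vector{Float64}) = a[1]
getfirst_unsafe(a::Vector{Float64}) = @inbounds a[1]

using InteractiveUtils  # provides @code_llvm in a script (already loaded in the REPL)
@code_llvm getfirst(rand(3))         # contains the bounds-check branch
@code_llvm getfirst_unsafe(rand(3))  # bounds-check branch elided
```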
I think this is probably what we should do. It wouldn’t be too hard, but it would roughly double compilation time (since you need to compile all the code with checkbounds=true first).
I don’t think either of these is acceptable; I’d rather keep a convenient compiler flag to use after I’ve put in safeguards that a call-wise compiler might miss. It’s also worth pointing out that --check-bounds=no is not equivalent to removing boundschecking from compiled code; see Consider removing --check-bounds=no? · Issue #48245 · JuliaLang/julia, where the OP has continued the discussion. The performance gains or losses of --check-bounds=no or @inbounds have also been observed to be platform-dependent. Granted, it’s not difficult to make a Julia program that indexes arrays so much that --check-bounds=no speeds it up across all platforms.
Indeed, you would need eachindex or axes to give the compiler the opportunity to prove the indices are in bounds, which basically serves to move boundschecking out of hot loops. The compiler also stops at function barriers, so it can’t elide boundschecking for something as simple as @noinline index1(a::Array) = a[1].
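A minimal sketch of both points (function names are just illustrative): with eachindex the indices are provably valid so the per-iteration checks can be hoisted or elided, while across a @noinline call boundary the check has to stay:

```julia
# Loop written with eachindex: the compiler can prove every i is a valid
# index into a, so the per-iteration bounds checks can be elided/hoisted.
function sumall(a::Vector{Float64})
    s = 0.0
    for i in eachindex(a)
        s += a[i]
    end
    return s
end

# Function barrier: optimization stops at the @noinline call, so the
# bounds check inside index1 cannot be elided by its callers.
@noinline index1(a::Array) = a[1]

function first_of_each(v::Vector{Vector{Float64}})
    return [index1(a) for a in v]  # each call still pays the check
end
```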
That can’t do the same thing as a compiler flag like --check-bounds=no, as imported modules and packages without @inbounds would still be compiled and precompiled with boundschecking. If you want to look into that anyway, several macros including @inbounds can’t take module expressions as inputs because they assign the input code to a temporary variable. You also can’t do module Name @inbounds begin ... end end, because that at best only elides boundschecking for code executed within the module expression, which it currently doesn’t seem to do at the global scope anyway. @inbounds doesn’t happen to make this easy; we’d need a macro that adds @inbounds inside the methods themselves.
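To make that concrete, here is a purely illustrative snippet (module and function names are made up): wrapping a module body in @inbounds parses and runs, but it only wraps the evaluation of the definitions, not the method bodies at call time, so the checks remain:

```julia
# Illustrative only: the @inbounds block surrounds the *definition* of f as it
# is evaluated, not the body of f when it is later called, so calls to f are
# still bounds-checked. A hypothetical module-wide macro would instead have to
# walk the expressions and insert @inbounds inside each method body.
module Name
@inbounds begin
    f(a) = a[1]
end
end

Name.f([1, 2, 3])  # still bounds-checked
```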
But if it’s opt-in, why not? If some users want to use a flag like --posthoc-strip-out-all-boundschecks-at-the-risk-of-segfaults-and-long-compile-times, why shouldn’t that be an available feature?
I provided my reasons for preferring --check-bounds=no, so I’m not sure what you’d like me to clarify. I’ll hazard a guess that you’re trying to argue that risky code should be an option, but the absence or removal of boundschecking typically isn’t intended to make memory-unsafe code, just to eliminate unnecessary overhead after ensuring in-bounds access in other ways. That applies just as much to --check-bounds=no. Silent errors, vulnerabilities, or crashes upon unvalidated inputs are considered bugs, and it’d be especially terrible for HPC.
> Silent errors, vulnerabilities, or crashes upon unvalidated inputs are considered bugs

Is it really a bug if it comes with a huge disclaimer in the docs?
I am not defending specifically the existence of --check-bounds=no; I don’t know enough about the implementation details to have a strong opinion. But in more general terms, a tool that takes compiled code and excises everything that looks like a boundscheck seems like a pretty reasonable thing to want.
You can argue that it isn’t, but again, it’s irrelevant to my preference for --check-bounds=no.
As mentioned, the --check-bounds flag affecting other compiler optimizations is the problem. I don’t know why --check-bounds=no must obstruct constant folding, and this could be LLVM behavior that Julia can’t do much about. Consider the flip side: assuming in-bounds access also opens up compiler optimizations like SIMD. Retroactively removing boundschecking from compiled code doesn’t add those optimizations, so we’d have to truly recompile anyway.
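As a minimal sketch of that flip side (the function is just illustrative, loosely modelled on an axpy kernel): once the bounds checks are gone the loop body has no throwing branches, so LLVM is free to vectorize it, which is exactly the kind of optimization a post-hoc stripping pass would be too late to recover:

```julia
# Bounds checks introduce branches that can throw, which blocks
# auto-vectorization of the loop. Eliding them (here via @inbounds, or by
# the compiler proving the accesses safe) lets LLVM emit SIMD code.
function axpy!(y::Vector{Float64}, a::Float64, x::Vector{Float64})
    @assert length(y) == length(x)      # validate once, up front
    @inbounds @simd for i in eachindex(x, y)
        y[i] = a * x[i] + y[i]
    end
    return y
end

# Compare the generated code with and without the @inbounds/@simd annotations:
# @code_llvm axpy!(rand(100), 2.0, rand(100))
```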
That’s probably an acceptable compromise; it would be for me. I’m OK with assuming that external packages (e.g. LinearAlgebra) would optimise appropriately and use @inbounds themselves when it is likely to improve performance. Being able to @inbounds all the code in my own project is what I want. [If it were possible to use @inbounds on an entire module, and a package could, at the moment, only achieve its best performance with --check-bounds=no, then that package should @inbounds itself and warn users to test with --check-bounds=yes. I guess that would be an unusual case though.]
This is true, but dealing with the fact that an out-of-bounds array access would be undefined behaviour is a very well-established and accepted part of working in HPC. We are all limited by the compute-time budget we have on whatever HPC cluster(s) we are using, so we will spend the time to check the correctness of indexing and to verify inputs sufficiently to avoid out-of-bounds accesses during development and testing, before deploying the large simulations that provide scientific output. It’s for those large simulations that paying the cost of bounds-checking is not worth it, so it is worth doing the work up front to not need bounds-checking.