Turning on --check-bounds=no on current master can actually result in significantly worse performance, because it removes inference’s ability to do concrete evaluation (constant folding).
In the whole --check-bounds removal discussion, we had assumed that we did not want to keep two copies of all code just for the purposes of --check-bounds, which thus required us to disable constant propagation.
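To make the quoted claim concrete, here is a hedged sketch of the kind of code affected (the names are made up, and whether the call actually folds depends on the compiler version and flags):

# Hypothetical illustration. With default settings, inference can often prove
# that the indexing below never throws and fold f() to the constant 2; per the
# quote above, running with --check-bounds=no can block that concrete evaluation.
getsecond(t) = t[2]
f() = getsecond((1, 2, 3))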
I still don’t understand why constant propagation would need to be disabled where @inbounds is used. My first guess was that one wouldn’t want to disable bounds checking during constant propagation, so as to prevent an internal compiler error caused by an out-of-bounds access? But, in that case, wouldn’t it be better to just disable @inbounds during constant propagation (i.e., run all bounds checks during constant propagation, as with --check-bounds=yes)?
That shouldn’t even be a concern, IMO. If a user writes down @inbounds, they should accept any risk, from segfaults to compiler errors, if an out-of-bounds access does happen.
But, in that case, wouldn’t it be better to just disable @inbounds during constant propagation (i.e., run all bounds checks during constant propagation, as with --check-bounds=yes).
This changes the behaviour of the code and maybe wouldn’t be valid; it would also mean that all code that has @inbounds would have to be compiled twice.
But it’s the user’s own fault if this happens? @inbounds already is an unsafe technique meant for getting better performance, so I think the change would be justified.
I think this would be justified, too. If someone doesn’t like the increased compilation costs, just don’t use @inbounds. Getting better run time but worse compilation time when @inbounds is explicitly used sounds like a very good deal.
Also, in the current situation users already need to have two different definitions of a function in some cases: one with @inbounds that is faster at run time but can’t get constant folded, and one without that can get constant folded. So it’s better to pass that work on to the compiler (automated) than to leave it as manual work for the users.
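Here is a hedged sketch of that manual workaround (sum3 / sum3_fast are hypothetical names, and whether the checked version actually folds depends on the compiler):

# Hypothetical "two definitions" workaround described above.
sum3(v)      = v[1] + v[2] + v[3]             # checked: a candidate for constant folding
sum3_fast(v) = @inbounds v[1] + v[2] + v[3]   # unchecked: faster in hot loops, but
                                              # may block concrete evaluation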
The problem is that @inbounds promises that the function is only run with in-bounds accesses, while the compiler needs to know that all possible calls to the function will result in in-bounds accesses, which is a much stronger condition.
TBH I don’t understand how this is related. Why can’t the compiler simply pretend there’s no @inbounds (like with --check-bounds=yes) while doing constant propagation?
The problem isn’t constant propagation, it’s constant folding, which involves calling your functions with values that may or may not be made up by the compiler.
That makes sense, but I can’t understand why it’s possible for this assumption to cause miscompilations. I suppose you mean situations like this:
if runtime_branch
    a = 1
else
    a = 2
end
x[a]
Where, at runtime, it can be guaranteed that a = 2 happens ONLY in cases where x[2] is a valid index. In contrast, if the compiler wants to constant fold x[2], then it needs to check at compile time that x[2] is a valid index.
What I don’t understand is: why can’t the compiler simply constant fold x[2] with bounds checking enabled? Then the constant folding will raise a BoundsError - but this is not a problem, right? The fact that the compiler attempted to constant fold but failed is really no different from having a function correctly constant-fold to raising an error?
We’ve had this discussion before, but I’m pretty sure if the compiler were to do that, it’d end up being an illegal transform. It can’t just call random functions at compile time with values it’s not certain are an actual possibility at that point.
I’m pretty sure if the compiler were to do that, it’d end up being an illegal transform
It can and does do this, though. It just requires the results produced from those calls to not be used if the values are made up. For a simple example, consider:
if g_is_legal_to_call_with_x(x)
    return g(x)
end
The compiler is allowed to call g on inputs that violate the invariant the programmer set up, because in those cases the result of that call will not be observed.
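A concrete, made-up instance of that pattern (second_of and maybe_second are hypothetical names):

# The programmer's invariant is that second_of is only called when v has at
# least two elements. A compiler speculating past the guard might evaluate
# second_of with arguments it invented, as long as that speculative result is
# discarded on the branch where the guard is false.
second_of(v) = v[2]

function maybe_second(v)
    if length(v) >= 2          # invariant established by the programmer
        return second_of(v)
    end
    return nothing
end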
Yes - the compiler is only allowed to do that if the function is side-effect free. It most certainly isn’t allowed to do that on any arbitrary g.
That’s also the actual reason why the compiler can’t just turn bounds checking on/off at will - doing that changes the effects of the code, which can (and likely will) result in it no longer being legal to constant propagate/concretely evaluate in the first place.
That’s just the thing, though, isn’t it? The compiler can’t assume that must_not_get_1 doesn’t have catastrophic side-effects, so it can’t call that multiple times or out-of-order or with arbitrary inputs.
Indexing without bounds checks can also have catastrophic side-effects.
That’s not really relevant, though? My question was, why not just run as if with --check-bounds=yes during constant folding/propagation/whatever, so I’m not discussing indexing without bounds checks. Although I must say some of the previous messages in this thread have been going over my head, at least partially.
If the compiler infers one set of effects with bounds checking on, and another set of effects with bounds checking off, I would think it should perhaps merge the two sets of effects to get the best inference for each individual effect. If the result is wrong, it’s the user’s fault.
I’m not a compiler person, but the compiler has got to start by making no assumptions of the code itself. Whether it’s safe to make more assumptions or do something fancy — and how hard it’d be to do it — I don’t know.
Ignoring that this completely doubles the work the compiler has to do (effectively what @gbaraldi mentioned above), there is no clear “best”, though. Some code just inherently has side effects. Some code can (under some circumstances) get some of its effectful code eliminated, leading to “better” effects. Some effects have cascading impacts on other effects.
The situation here is similar to the wrongly named “fastmath” - if there were such an easy “fast math”, wouldn’t we use it all the time?
Blaming the user for something a compiler ultimately did of its own volition is very bad UX though. A compiler generally should strive for being easy to work with, as well as having predictable behavior. This doesn’t achieve that.
But is this not what the user asked for when they used an @inbounds?
As far as I understand, the current UX is actually the same? The current situation already is that the user, if they use @inbounds, has to test their code with --check-bounds=yes to check whether the behavior of their code depends on @inbounds. If the behavior depends on @inbounds, the user clearly made a mistake. This would stay the same with my proposal, as far as I understand?
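As a made-up example, this is the kind of @inbounds-dependent behaviour that testing with --check-bounds=yes is meant to catch:

# The index is always out of bounds here, so this throws a BoundsError under
# --check-bounds=yes but may silently read garbage (or crash) when the check
# is actually elided.
buggy_last(v) = @inbounds v[length(v) + 1]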
I was referring to your mixing & matching of effects.
@inbounds has horrible UX, yes. That’s exactly why people should not be encouraged to use it when they don’t know why their code is slow.
Sure, but that just means there’s very little reason to implement it. It’s not good to increase the number of things with bad UX. You perceive some benefit from mixing & matching effects from different bounds checking settings, but that benefit just isn’t there.
Effects work by assuming all code in a function has all “good” effects. The compiler then goes through the code, seeing things it knows have side effects, resulting in the function having those same side effects too (it’s more complicated in practice, but this is good enough as a mental model). There’s no simple “just merge effects” you can do here, because you can’t just merge the reasoning that led to one set of effects with the reasoning that led to the other. @inbounds is an issue for effects inference precisely because the compiler has no idea whether the annotation is safe & correct or not - if it’s wrong, the code has all the “bad” effects (it may segfault, after all).
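To illustrate that mental model (hedged: Base.infer_effects is an internal, unexported API and its output format varies across Julia versions), one can compare the effects inferred for a checked access and an @inbounds one:

# Comparing inferred effects for a checked vs. an @inbounds tuple access.
f_checked(t)  = t[2]
f_inbounds(t) = @inbounds t[2]

Base.infer_effects(f_checked,  (Tuple{Int,Int,Int},))
Base.infer_effects(f_inbounds, (Tuple{Int,Int,Int},))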
Julia doesn’t have such an assume, if I’m intuiting the meaning correctly here.