We have been considering this issue of overflow handling in FixedPointNumbers for several years.
Fixed-point numbers are somewhat closer to integers than to floating-point numbers, so there is a plan to (re)use the CheckedArithmetic.jl mechanism. (We are now moving toward using OverflowContexts.jl.)
As it pertains to this topic, `/` is also a candidate to be "checked". (For convenience, I have named the checked function for `/` `checked_fdiv`.)
If the overflow-handling framework is completed for integers and fixed-point numbers, we will be able to check for division by zero of floating-point numbers as well (at least in user code).
Because compilers are ignorant of hardware interrupts, they vectorize as though those interrupts don't exist. This is unlike software checks, which compilers will not vectorize across.
It does mean that hardware interrupts break semantics: when you get an error, which results actually ended up being stored depends on how much the compiler vectorized. So, with a vector width of 8:
```
julia> y .= NaN;

julia> x2[16] = 0.0;

julia> setfpexceptions(fpinvs!, FE_DIVBYZERO, y, x2)
ERROR: DivideError: integer division error
Stacktrace:
 [1] /
   @ ./float.jl:412 [inlined]
 [2] fpinvs!(y::Vector{Float64}, x::Vector{Float64})
   @ Main ./REPL[7]:3
 [3] setfpexceptions(::Function, ::UInt8, ::Vector{Float64}, ::Vector{Float64})
   @ Main ./REPL[6]:4
 [4] top-level scope
   @ REPL[19]:1

julia> y[1:16]'
1×16 adjoint(::Vector{Float64}) with eltype Float64:
 1.2321  5.48182  2.78002  1.43467  6.14824  8.84106  2.46354  2.39465  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
```
Even though it was element 16 that was invalid, because the hardware is working in batches of 8, all of elements 9-16 are NaN!
This might be surprising behavior (and is why compilers don’t vectorize when you have software exceptions).
In more complicated examples, someone could easily go down a rabbit hole under the assumption that x2[9] was the first bad element,
and tear their hair out trying to figure out why it looks fine on closer inspection, not realizing they need to validate the entire chunk that happened to land in the same SIMD vector.
So I get why this can be bad.
But I could live with it.
What's with the exceptions and the global flags anyway? From my naive perspective, wouldn't it be easier to specify values or sets of values for any pathological condition we need? There are millions of NaN values to spare.
The problem is that even if you specify what they do, you still need to specify what happens when you combine NaNs, and there is no good way to do that (especially if you want to preserve commutativity).
These ideas are really useful, especially when you are trying to track down divide-by-zeros, and zero divided by zero, in differentiated code, where it is often not one's first thought to put divide-by-zero protection on the derivative of a function as well. Maybe as this becomes standard practice, you learn to do this naturally as a programmer.
I'm used to MATLAB and searching for NaNs and Infs in the output. Julia's adherence to those "standards" is fine with me, because running fast and also not blowing chunks on an embedded system is what I prefer. Even in C and C++ we need to write the code so that it does not execute paths that can divide by zero under those conditions, or at least rescue it afterwards. The trick is getting it to not execute a derivative's divide by zero, which is not always obvious. This way the automatic differentiation tools work consistently. Is this an unusual opinion? I always learn so much from these forums.
The globalness is the other issue here. What if some subroutine produces a NaN that's fine, and is then handled? Now it throws before you can even get to the exception you actually want. But maybe that's fine; it's just totally non-composable.
The original idea for having so many NaNs was that you would encode an instruction pointer or other tag in the NaN payload, so that later, when you get a NaN result, you can say "Ick! Where did this come from?" and get an answer from the payload. Unfortunately the hardware never really committed to this approach, so it isn't workable. What I would want to make this actually work are the following features/guarantees:
- If some of the arguments to an instruction are NaN and the returned value is NaN, then the payload of the returned NaN should be the payload of one of the arguments, ideally the one that is the "nonstandard NaN", if any.
- Operations that can produce NaNs should take a "default" argument that specifies what value to return in the NaN case.
- Compilers should provide APIs to fill in the default value with something useful, like a NaN whose payload is a pointer to some line info.
But none of this is available, so NaN payloads are just a useless waste of space.
Yeah.
AVX512 supports exception control at the instruction level.
I'm not sure what all the options are, but you can at least do `vdivpd zmm1 {k1}{z}, zmm2, zmm3 {sae}` to suppress all exceptions.
So at least in theory, SAE could be applied by default, except for select instructions where you want to be able to enable an exception.
I seem to remember being at a talk where someone mentioned that they were using NaN boxing, and they kind of apologized for it as an ugly hack ... but William Kahan was in the audience and spoke up to say that this was exactly the sort of thing they had intended NaN payloads for. See also a nice 2019 talk on NaN boxing for JavaScript.
That's great! I looked into overflow and other flags (for integers): they are set and then potentially overwritten, i.e. you have to check right away, as I understood it. Are floating-point exceptions sticky? The Alpha had imprecise exceptions; otherwise they are a bitch for hardware implementors. [It only takes one good arch, e.g. x86, or Arm64 with good semantics.]
I'm thinking: if they are sticky for FP (or could hypothetically be, also for integers), could you just check them after loops? That would be OK in many cases, less so for e.g. calls to other functions, e.g. exp. You might have no idea where the overflow or divide by zero happened, but hopefully inlining comes to the rescue...
Yes, FP flags are sticky.
At least in C++, the overhead of checking them using `<cfenv>` is non-zero (or equivalently, in C with `<fenv.h>`).
There are non-inlined function calls to check the state, so it isn't free.
Setting them to trap has no overhead when they do not actually trap.
Turning trapping on and off does, of course, have overhead, probably similar to checking if a flag is set. But it has been a while since I have benchmarked.
Also, a correction regarding compilers assuming no-trapping math: that is of course configurable, but it is generally the preferred setting; otherwise they probably won't vectorize much.