Hey Julianners,
I just want to ask, since I have now run into this problem for the fourth time, even though I know what causes it.
Shouldn’t we make it behave correctly, the way it is expected to work in other languages?
So basically, if I understand correctly, the @fastmath version would always be correct. Anyone who wants to run it the other way could have an option like @nofastmath or something?
But yeah… of course I see the problem: all of Julia would have to be started with --math-mode=fast for it to behave as expected consistently. I don’t know, what do you guys think? Shouldn’t Julia maybe run with fast mode by default, so beginners don’t fall into this trap?
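(For anyone reading along, here is a minimal sketch of the behavior under discussion, assuming the trap is NaN propagating through min/max; the @fastmath result is not guaranteed and may differ by operand order and hardware.)

julia> min(NaN, 1.0)   # Base min propagates NaN (the "poison value" interpretation)
NaN

julia> min(1.0, NaN)   # operand order doesn't matter for Base min
NaN

julia> @fastmath min(NaN, 1.0)   # NaN handling here is unspecified; the result may be 1.0 or NaN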
Well, it is wrong in both cases without @fastmath. But fair point. It is still very bad with @fastmath too!
Omg… I didn’t know about this.
Ok… I have nothing to add. This is a very bad situation. I will just accept that there is no other way to handle it: I have to pay attention and handle NaN and Inf in each case.
That is because some code was written in a manner assuming IEEE math.
This code was broken under --math-mode=fast. sinpi and cospi are examples.
I do agree though that most code doesn’t assume it, and most of the code that does is written by experts in a deliberate manner, so IMO it isn’t totally unreasonable to at least generally have contract enabled.
It also seems silly that a::Float64 + 0.0 cannot be optimized away without @fastmath, but a + (-0.0) can.
But you should prefer 0.0 over -0.0 in cases where it cannot be optimized away, because the former is cheaper for the CPU to generate; so which one to choose depends on knowing whether it is likely to be optimized away in a given case… @fastmath is simpler in these cases, but it is also likely to pessimize generic code like ForwardDiff.Duals, which use fallback definitions for the fast functions, resulting in lots of code not getting inlined that otherwise should be.
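As a quick sketch of why the compiler may drop one but not the other: under IEEE semantics x + (-0.0) leaves every value unchanged, while x + 0.0 turns -0.0 into +0.0, so only the former is a true no-op.

julia> -0.0 + 0.0    # adding +0.0 turns -0.0 into +0.0, so `a + 0.0` can change `a`
0.0

julia> -0.0 + -0.0   # adding -0.0 leaves every input unchanged (NaN stays NaN too)
-0.0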
NaN propagating through operations is intended behavior, following IEEE 754.
You’ll have to thank Intel for this, who, in their infinite wisdom, decided that this is how it should behave in x86… See also this:
Compares the low single-precision floating-point values in the first source operand and the second source operand and returns the minimum value to the low doubleword of the destination operand.
If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).
If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN in either source operand be returned, the action of MINSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.
The situation is of course the same for Float64, i.e. double precision. Fixing this in x86 is, practically speaking, more or less impossible.
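Put differently (a rough sketch, not what Base.min does): the hardware rule is essentially "compare, and on any NaN fall through to the second operand", which a naive comparison reproduces, since every comparison involving NaN is false.

minsd_like(a, b) = a < b ? a : b   # naive min; any NaN makes the comparison false

minsd_like(NaN, 5.0)   # 5.0  -- a NaN in the first operand is silently dropped
minsd_like(5.0, NaN)   # NaN  -- a NaN in the second operand is returned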
I think Python. But I’m pretty surprised, since, as Sakura mentioned, it is the IEEE 754 standard, so… can it be that I didn’t even know about it when I was using Python? That would be crazy…
You’re right, Python doesn’t follow IEEE 754, and I believe this is how you use NaN without NumPy (so the syntax (and semantics), with or without NumPy, is non-ideal):
>>> min(5.0, float("nan"))
5.0
You implied to me that you used NumPy (though you didn’t, fully), but to be fair its function works:
>>> import numpy as np
>>> np.minimum(5.0, np.nan)
nan
>>> np.nanmin(np.array([5.0, np.nan])) # Here "ignoring any NaNs. When all-NaN slices are encountered a RuntimeWarning is raised and NaN is returned for that slice." and the order doesn't matter.
5.0
Yes, people claim e.g. that Python’s min and max are buggy (I would agree that IEEE should be followed, with some non-default way to ignore its rules, as in Julia, e.g. with NaNMath.jl), though not everyone agreed…:
That’s nice for them. In Python, we have min and max functions that are under no obligation to work like minNum and maxNum. In practical terms, either you care about NaNs or you don’t. If you don’t, filtering them out is trivial. […]
IEEE rules require that NaN not propagate through min and max unless both operands are NaN. See William Kahan’s remarks here. But also in that thread, the consensus was that the rule was surprising and perhaps should not be followed by Julia.
What justification do you have for why max(NaN, x) should be x, aside from appeal to the spec? The IEEE 754 recommendation of min and max functions ignoring NaNs is extremely questionable, as Kahan writes in the above response to Arch’s query. Under the “not a real” interpretation of NaN there’s never any justification of not propagating a NaN since it’s a permanent poison value. This is the most conservative interpretation of NaN and the one we use pretty much everywhere. Even under the more relaxed “unknown real” interpretation of NaN, max(NaN, x) == x can only be justified when x == Inf and otherwise the result must be NaN, which is the rule that Kahan suggests.
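As a rough sketch of that last rule (the “unknown real” reading: the unknown value can never exceed +Inf, so only then is the answer certain; kahan_max is just an illustrative name):

kahan_max(x, y) =
    isnan(x) ? (y == Inf ? Inf : NaN) :
    isnan(y) ? (x == Inf ? Inf : NaN) :
    max(x, y)

kahan_max(NaN, Inf)   # Inf  -- whatever the unknown real is, the max is Inf
kahan_max(NaN, 3.0)   # NaN  -- the unknown real could still exceed 3.0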
Mea culpa - I must confess that I do not possess a copy of IEEE 754 (the horror!) and can’t look it up myself.
Interestingly, some parts of that minNum/maxNum/fmin/fmax kerfuffle depend not only on the programming language, but also on which version of the standard the language implements (and possibly on the hardware, depending on whether it’s compiled properly):
Do we distinguish between quiet and signaling NaNs at all? I don’t think so.
Scrolling down through that thread, it seems Agner Fog also left us a comment with good reasoning for the existing min behavior: vector instructions can generate more than one NaN at a time, making it impossible to distinguish where the bad one occurred. Propagating thus makes more sense.
One real case for me is producing images out of grids, which is a common task in Geophysics. To map the floats of a grid onto integers (uint8) via a color table, one must know the array’s min/max. It is also common for those grids (arrays of floats) to have NaNs in them to indicate no-value numbers (for example, any grid of an oceanographic variable that also covers part of the land). For those cases I was forced to write custom min/max functions that ignore NaNs.
You may also remember that old Matlab versions used to have the Julia behavior, but they changed min/max to ignore NaNs a long time ago.
Isn’t it lucky that you can write custom min and max functions that will be fast enough? Ignoring NaNs seems convenient for working with images, but that’s a weak justification for a dangerous behavior in a function that everyone uses.
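For what it’s worth, a minimal sketch of such a custom reduction (the name nanextrema is mine; the NaNMath.jl package mentioned above is meant to cover exactly this kind of thing):

# NaN-skipping min/max over an array; returns (NaN, NaN) if every element is NaN.
function nanextrema(A::AbstractArray{<:AbstractFloat})
    lo, hi = Inf, -Inf
    for x in A
        isnan(x) && continue   # treat NaN as "no data" and skip it
        lo = min(lo, x)
        hi = max(hi, x)
    end
    return lo > hi ? (NaN, NaN) : (lo, hi)
end

nanextrema([5.0, NaN, -2.0])   # (-2.0, 5.0)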
I think that weakness depends on one’s viewpoint. I don’t remember seeing any complaints (though of course there must have been some) from Matlab users.