Possible bug in unique/Set

You are correct in pointing out the inconsistency between the IEEE handling of exceptions / out-of-domain errors and the rest of the language (where it is customary to throw a DomainError or ArgumentError). However, I am not sure that making this more consistent by moving towards the IEEE mechanisms is a good idea: they are less flexible, and you will end up with a hybrid anyway; the boundary will just be somewhere else.

Practically,

  1. for some people in some situations the options offered by IEEE are just great, because they mesh well with their algorithm or can be modified at little cost. Few people read the whole IEEE 754 though, or use more than a subset of the options.

  2. for others, the whole thing is a nuisance, and NaNs are particularly painful since they can creep through your algorithm silently, in which case you have to debug the whole thing to find out where this started. This can be tedious for algorithms with nontrivial complexity.
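As a minimal sketch of that failure mode (the pipeline and its step functions are invented for illustration):

```julia
# A NaN produced deep inside a pipeline poisons the final result silently.
# step1/step2 are made-up names, standing in for a longer algorithm.
step1(x) = sqrt(abs(x)) / x    # x == 0.0 gives 0.0/0.0, which is NaN
step2(x) = 2x + 1.0            # NaN propagates through ordinary arithmetic

result = sum(step2.(step1.([1.0, 0.0, 4.0])))
isnan(result)                  # true: one bad element poisoned the whole sum
```

Nothing in this code throws; the only evidence of the problem is the NaN at the very end, which is exactly why tracking down the origin can be tedious.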

I had read “Possible bug in unique/Set”; I intentionally referenced the phrase in that post about how the decision to use the IEEE standard was not irrevocable, but that reopening it would be a “can of worms”.

I wanted to cast light on casual users. Having different behaviors for integers and floats regarding +/-0 is a huge can of worms. The same applies to the differences between == and isequal() for finite numbers. These inconsistencies will certainly lead to many gotcha moments for new users who just expect things to work as expected.
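A few REPL probes of the current behavior make those gotchas concrete:

```julia
# Integers have one zero; floats have two, and == and isequal() disagree:
0 == -0               # true  (integer -0 is literally 0)
0.0 == -0.0           # true  (IEEE "normal" equality)
isequal(0.0, -0.0)    # false (isequal distinguishes the sign bit)

# Consequence: hashing-based functions treat the two zeros as distinct.
unique([0.0, -0.0])   # [0.0, -0.0] -- two elements, not one
haskey(Dict(0.0 => "zero"), -0.0)   # false
```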

I disagreed with the first statement that this issue is not about rounding. Sorting seems to be the key issue, since the requirement that -0.0 sort before 0.0 is causing problems (I think we agree that -0.0 == 0.0). It seems that the goal of the standard was to achieve higher precision around zero. This might matter if users are trying to “faithfully” sort numbers that are effectively underflowing, but I would argue this is not a big issue in Julia, since any desired level of precision is already easily available. Despite that, we seem to be deciding to support a standard that attempts to solve a problem Julia doesn’t have. This is why I think this is a small can of worms and definitely worth revisiting before the important milestone you are approaching.

My suggestion is to make signed floats behave as Ints currently do, and to abandon this part of the standard. If there is a cosmetic problem with -0 sorting randomly with 0s, then replace -0 with 0 since they are equivalent at this level of precision.

Practically nobody wants to catch and handle domain errors. They are nice during testing / development, you’re right. But once I am reasonably sure that they will never appear except for bugs, a NaN spoiling my computation or an unhandled domain error stopping my computation differ only in the quality of logs for debugging (and this only matters if I can’t reproduce and need to look at logs / stack traces / core dumps). But the speed differences are very relevant (no SIMD).

I mean, are you using any code that relies on domain errors being thrown, or on min(x,y) not using the obvious CPU instruction? Most people who expect negative numbers for their square roots check for them beforehand, instead of in a try block.
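For example, the usual pattern is to branch before the call rather than wrap it in try/catch (safe_root below is a made-up name for that pattern):

```julia
# By default, sqrt throws on negative real input:
caught = try
    sqrt(-2.0)
catch err
    err isa DomainError    # true
end

# so code that expects negative inputs typically checks beforehand;
# safe_root is an invented name illustrating the check-first style:
safe_root(x) = x >= 0 ? sqrt(x) : sqrt(complex(x))
safe_root(-2.0)            # 0.0 + 1.4142135623730951im
```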

Hence my wish for “implementation defined” default semantics of floating point: they should do whatever your CPU does, and if your program needs to handle edge cases correctly, then you have written explicit branches for all of them anyway; there is no real mental overhead in using Base.safe_sqrt or Base.safe_min if you are already thinking about NaN or negative numbers.

(Julia is focused on fast numerics; for security-relevant code / systems programming this could be very bad, but are those problems really Float-heavy?)

edit: I am not suggesting to move towards IEEE. I am suggesting to move towards “whatever your silicon does” (within bounds of reason), which happens to be IEEE most of the time.

For what it’s worth, even though Julia focuses on fast numerics, by default a = randn(4); a[5] will throw a bounds error, and bounds checking also turns off SIMD.
We have both @inbounds and a command-line option at startup to turn that off.

@fastmath is the same way. Safety by default, but the option to turn it off.

julia> using BenchmarkTools

julia> fmin(x,y)= ifelse(x<y, x, y);

julia> fmin2(x,y) = @fastmath min(x,y);

julia> fsqrt(x) = @fastmath sqrt(x);

julia> @code_native fmin2(2.3, 4.5)
	.text
; Function fmin2 {
; Location: REPL[3]:1
; Function min_fast; {
; Location: REPL[3]:1
	vminsd	%xmm1, %xmm0, %xmm0
;}
	retq
	nopw	%cs:(%rax,%rax)
;}

julia> x = randn(128); y = rand(128); z = min.(x,y);

julia> @btime $z .= min.($x,$y);
  260.047 ns (0 allocations: 0 bytes)

julia> @btime $z .= fmin.($x,$y);
  188.759 ns (0 allocations: 0 bytes)

julia> @btime $z .= fmin2.($x,$y);
  189.024 ns (0 allocations: 0 bytes)

julia> fsqrt(-2.3)
NaN

julia> @btime $x .= sqrt.($y);
  464.888 ns (0 allocations: 0 bytes)

julia> @btime $x .= fsqrt.($y);
  251.252 ns (0 allocations: 0 bytes)

julia> @code_native fsqrt(2.3)
	.text
; Function fsqrt {
; Location: REPL[4]:1
; Function sqrt_fast; {
; Location: REPL[4]:1
	vsqrtsd	%xmm0, %xmm0, %xmm0
;}
	retq
	nopw	%cs:(%rax,%rax)
;}

But I agree with you in principle.
I want to at least always have the option to easily opt out of safety in favor of speed.
Often it’s easy for us to know that y is guaranteed to be positive, or that pointers a and b do not actually alias.

EDIT:
Also worth pointing out that if someone is suddenly having issues tracking down a NaN from a fast square root, they can just start Julia with --math-mode=ieee (similar to
--check-bounds=yes for turning off @inbounds) to help find the problem without having to go through their code.


Given the very large amount of hardware that implements the IEEE standard and produces -0.0 as the result of floating point arithmetic operations, it’s unclear how to accomplish this without making all those floating point operations quite slow.


OK, I will give up, because I clearly don’t understand a lot of this. It seems like it is already perfectly well accomplished just by not using isequal for hashing or sorting; < and == already behave as expected. The only thing left to do is cosmetic. Thanks for considering.

The == function is not an equivalence relation so it can’t be used for hashing and the < function is not a total order so it can’t be used for sorting. That is the reason for the existence of isequal and isless in the first place: they are the versions of these operations that can be used for hashing and sorting.
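The difference between the two pairs is easy to verify at the REPL:

```julia
# == is not an equivalence relation: NaN == NaN fails reflexivity.
NaN == NaN                    # false
isequal(NaN, NaN)             # true

# < is not a total order: NaN is comparable to nothing.
NaN < 1.0 || 1.0 < NaN        # false -- neither direction holds
isless(1.0, NaN)              # true: isless sorts NaN after everything

sort([NaN, 1.0, 0.0, -0.0])   # [-0.0, 0.0, 1.0, NaN]
```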


If I am sure that they never appear then the question is irrelevant. But if I am not (it is hard to be 100% certain with reasonably complex software, even if tested thoroughly), then I would prefer to be alerted by the exception handling framework, because then I have a reasonable chance of catching the issue as soon as it occurs, getting an informative error message and stack trace (in the ideal case). Just to clarify, I am talking about bugs, ie edge cases I did not handle correctly. In practice, these happen all the time, even with careful attention to detail; some problems are just complex.

This would imply that my program may have a different flow on different CPUs (again, unless I covered all the edge cases perfectly). While I accept that floating point operations per se are not 100% portable between machines, this would add extra uncertainty that I am not sure is worth it.

That said, I find your suggestion may be reasonable in some contexts, but it is possible that we disagree on this because I am willing to trade some speed for some features I find convenient.


welcome to the wonderful world of floating point :wink:


I do believe I’m becoming obsessed with this question.

I’m not sure I understand why < is not a total order. If nz = -0.0 and pz = 0.0, then exactly one relation holds: nz == pz. Does that not suffice? I understand that the IEEE standard wants nz < pz, and I understand that I am binning nz and pz together. I think this is a minor point, because floating point representation always bins numbers together. Limited precision in floating point representation is a widely acknowledged issue, and so it is easy to accept.

I also read that == is not an equivalence relation (in Julia) because of the way it treats NaNs (or maybe it was because of the IEEE standard’s way of dealing with NaN). Mathematica treats that symptom by making the statement NaN == NaN return true. I don’t know if that would break a lot of things if it were implemented in Julia, but it seems very practical in my work and would avoid plenty of the frustration and workarounds that I deal with on a day-to-day basis. I have rarely (never, in my memory) relied on NaNs being non-comparable.

Thinking about this got me into a half day effort to see how other languages deal with signed zeros and NaNs.
I compared R, Matlab, Mathematica, Julia, and MSVC C++. I haven’t seen anyone do that here (I hope it isn’t discouraged – I didn’t see any warnings in the FAQ).

All these languages return the same results for the operators == and < when using signed zeros. For NaN == NaN they differ a bit: R says they are not comparable, and Mathematica has them as equal (as I said above); otherwise they are not equal. NaNs are an odd edge case of non-comparability, each language deals with them slightly differently, and most have options for changing the behavior, but the defaults are very reasonable.

One set of criteria/questions/issues that was brought up in the original discussion (thread #18485; @eschnett) was how issorted() and unique() (actually hash) should be handled. Of the languages I looked at, only Julia has resolved to define new behavior that is inconsistent with the user-facing functions == and < (at least with respect to signed zeros). All the languages I tested except Julia sort -0.0 intermixed with 0.0 (the sign bit is there but masked for display in some languages), and all except Julia hash -0.0 and 0.0 as a single key. NaN is not a valid key in Matlab, but I’m not sure about the others.

Welcome to the club, this is probably one of the things that we’ve collectively spilled the most time and thought on :crazy_face:

I’m not sure I understand why < is not total order.

It’s the usual culprit—NaN—with the following behavior mandated by IEEE 754 and implemented in most other programming languages:

julia> NaN <= NaN
false

julia> NaN < 0
false

julia> NaN > 0
false

Good times.

I further understand that IEEE standard wants nz < pz, and I further understand that I am binning nz and pz together.

IEEE defines two different orderings: a “normal ordering” and a “total ordering”. Julia’s == and < implement the normal ordering while isequal and isless roughly implement the total ordering (we deviate on NaNs, which isequal and isless treat as equal to each other and all larger than Inf, regardless of sign bit).
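That deviation on NaNs is easy to observe directly:

```julia
# isless puts every NaN after Inf, ignoring the NaN's sign bit
# (IEEE totalOrder would instead place -NaN before -Inf):
isless(Inf, NaN)          # true
isless(Inf, -NaN)         # true  (sign bit ignored)
isequal(NaN, -NaN)        # true  (all NaNs treated as one class)

# even though the representations really do differ:
signbit(NaN), signbit(-NaN)   # (false, true)
```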

I also read that == was not an equivalence operator (in Julia) because of the way it treated NaNs (or maybe it was because of the IEEE standard way of dealing with NaN).

Yes, this is part of the IEEE spec for the “normal” equality relation.

Mathematica treats that symptom by making the statement NaN == NaN return true.

That’s very tempting and I’ve proposed it myself sometimes. The big issue is that there are a lot of pre-existing numerical codes in C, Fortran and other languages out there which behave according to the IEEE standard and if we do this, porting that code will be a massive trap since it will silently do something very different than was intended.

If we were starting afresh and didn’t care about IEEE or the existing code in the world, it would certainly make sense to make == and < a total ordering. Not having NaN or -0.0 would also be better than having them, imo. There must be better ways to express the things they’re useful for than throwing the most basic properties of the real numbers as an ordering out the window.

I compared R, Matlab, Mathematica, Julia, and MSVC C++.

Julia, Matlab and C++ should all behave in the same fashion—as recommended by IEEE 754. R has an NA value in every type and mostly treats NaN as if it were NA, which means it behaves according to three-valued logic. Mathematica takes the “to hell with IEEE 754” approach and makes < and == a proper ordering on floating point (it probably takes a performance hit for this, as does R for its behavior).

Of the languages I looked at only Julia has resolved to define new behavior that is inconsistent with the user facing functions == and <

Most of them do not provide a generic, user-extensible mechanism for equality and sorting with built-in defaults for floats. If they did, they would face the same difficult choices as we do.

All except Julia hash -0.0 and 0.0 as a single key.

Most of these languages don’t have a default dict implementation, so it’s unclear to me how you’d determine that.

All the languages I tested except Julia sort -0.0 intermixed 0.0 (the sign bit is there but masked for display in some languages),

We could certainly do that and make isequal(-0.0, 0.0) true and make -0.0 and 0.0 hash the same. That would eliminate one discrepancy between isequal and == but unless we’re willing to throw IEEE out the door, we still can’t get rid of isequal and isless entirely. Is your goal here to get rid of the extra pair of functions or just to make negative and positive zero hash the same? Making -0.0 and 0.0 sort and hash the same would be relatively doable. Getting rid of isequal and isless would be much more disruptive. The single biggest issue would probably be that we’d suddenly have x ≤ NaN for every floating point x which would potentially introduce a lot of breakage.

I must say that’s really tempting. IIRC this wasn’t done because that would imply mixing -0.0 and 0.0 when sorting, but that sounds like a really minor thing compared to the confusion created by e.g. unique treating them as distinct.

(I agree other behaviors are good choices.)


Someone doing that (for the whole NaN mess) may be the next big leap in numerical computing :wink:

There is no NaN mess. In comparison to the situation before IEEE 754 and its adoption as the implementation standard in FPUs/CPUs (early ’90s), we are very well prepared for numerical calculations. It only gets messy if you treat NaN as a number, which it isn’t.

As many have elaborated above, there is more to this issue. "blueberry" is not a number either, but

julia> "blueberry" == 42
false

does not cause any problems. NaN is a sentinel value with semantics that are more or less orthogonal to well-defined operations on \mathbb{R}, yet they share operators, so the two semantics interact; 1.0 == 2.0 uses the same operator as NaN == NaN, but the two implement different concepts.

I wonder if we could just pretend that NaN == missing for all purposes.
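They are close but not the same; missing follows three-valued logic, while NaN comparisons simply return false:

```julia
# NaN comparisons return a plain Bool; missing uses three-valued logic:
NaN == NaN                   # false
missing == missing           # missing, not true
ismissing(missing == 1.0)    # true

# isequal collapses both to ordinary Bools, so hashing and sorting work:
isequal(NaN, NaN)            # true
isequal(missing, missing)    # true
```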

Could you please elaborate on the differences between

“It’s getting messy if you treat NaN as a number, which it isn’t.” and

The point about NaN is that it’s a marker, it’s sticky, and it should show you where to look again in your algorithm or your input data. If you treat it as a ‘missing value’ (which I learned is quite popular in the Matlab domain), you just misuse a concept.

Sorry, I was not clear. You are of course right that many problems can be prevented by simply not treating NaN as a number. But at the same time, the point of NaN is that you can frequently use it in place of a number, ie do some kind of error handling without branching explicitly. If you start testing for NaN all the time, you get very messy code.

I am not sure about this. NaNs can carry a payload, so they were explicitly designed to be used for custom purposes, which may include missing values. I am not saying this is a good idea though.
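For illustration, a payload can be written into a NaN’s mantissa bits by manipulating the raw representation (a sketch only; note that hardware arithmetic is not required to preserve payloads):

```julia
# Float64 layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.
# A NaN has all exponent bits set and a nonzero mantissa; the mantissa
# bits below the quiet bit are free to carry a payload.
payload = UInt64(42)
bits = reinterpret(UInt64, NaN) | payload    # NaN is 0x7ff8000000000000
x = reinterpret(Float64, bits)

isnan(x)                                     # true: still a NaN
reinterpret(UInt64, x) & 0x0007ffffffffffff  # 42, the payload we stored
```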

@JeffreySarnoff is very knowledgeable about the issues with NaN, and about NaN payloads

If we want code to be fast, then x86 offers a native comparison with the following semantics:

  1. The result is four-valued: less-than, greater-than, equal, incomparable.
  2. NaN is incomparable to everything, including itself. Everything else is comparable.
  3. -0.0 and 0.0 are equal.

Hence, it makes sense to make isequal(0.0, -0.0) true, and also to hash them to the same value. Tiny disadvantage: arithmetic is ill-defined modulo ==; that is expected. It now also becomes ill-defined modulo (typeof(x) == typeof(y)) && isequal(x, y).

If one builds the possibility of incomparability into the Base.Order API, then one could get a consistent sorting result between missing and NaN values (e.g. put them all to the right and, depending on the algorithm, do so stably), without causing extra comparisons. That is: the compare primitive on which Order is built could e.g. return a Tuple{Bool,Bool}, with (true,false): x<y, (true,true): x==y, (false,true): x>y, (false,false): incomparable (because missing or NaN).

If I understand IEEE 754-2008 correctly, this would be allowed? And the current comparison used for sorting is incompatible with the “total order” (section 5.10) spec anyway, because NaN comparison in Julia does not respect the sign bit, but it is also incompatible with the comparisons from section 5.11, because we distinguish -0.0 and +0.0, which is a problem for stable sorts?
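A sketch of such a four-valued primitive, using the tuple encoding proposed above (the function name fpcompare is invented here):

```julia
# Four-valued comparison encoded as a Tuple{Bool,Bool}:
#   (true,  false) => x < y
#   (true,  true)  => x == y      (note: -0.0 and 0.0 land here)
#   (false, true)  => x > y
#   (false, false) => incomparable (NaN)
function fpcompare(x::Float64, y::Float64)
    (isnan(x) || isnan(y)) && return (false, false)
    x < y && return (true, false)
    x > y && return (false, true)
    return (true, true)
end

fpcompare(1.0, 2.0)     # (true, false)
fpcompare(-0.0, 0.0)    # (true, true)
fpcompare(NaN, 1.0)     # (false, false)
```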


The only change that’s really plausible would be hashing and sorting -0.0 and 0.0 the same. Any other changes are interesting to discuss but academic at this point. Unfortunately, that includes getting rid of isequal and isless altogether, but we simply cannot change the behavior of NaN at this point—far too much would break.
