Welcome to the club, this is probably one of the things that we’ve collectively spilled the most time and thought on …
I’m not sure I understand why < is not total order.
It’s the usual culprit, NaN, with the following behavior mandated by IEEE 754 and implemented in most other programming languages:
julia> NaN <= NaN
false
julia> NaN < 0
false
julia> NaN > 0
false
Good times.
I further understand that IEEE standard wants nz < pz, and I further understand that I am binning nz and pz together.
IEEE defines two different orderings: a “normal ordering” and a “total ordering”. Julia’s == and < implement the normal ordering while isequal and isless roughly implement the total ordering (we deviate on NaNs, which isequal and isless treat as equal to each other and all larger than Inf, regardless of sign bit).
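To make the difference between the two pairs concrete, here is a small sketch you can paste into a Julia session. It only uses Base functions; the variable name xs is just for illustration.

```julia
# "Normal" IEEE comparison: == and <
@show NaN == NaN           # false
@show -0.0 == 0.0          # true
@show -0.0 < 0.0           # false

# Total-order-like comparison: isequal and isless
@show isequal(NaN, NaN)    # true
@show isequal(NaN, -NaN)   # true: the sign bit of a NaN is ignored
@show isequal(-0.0, 0.0)   # false
@show isless(-0.0, 0.0)    # true
@show isless(Inf, NaN)     # true: NaN sorts after Inf

# sort uses isless by default, so zeros are ordered by sign and NaN goes last
xs = sort([NaN, 0.0, -0.0, Inf, -Inf, 1.0])
@show xs
```

The last line is why sorting a float array is well defined at all: sort needs a total order, and < does not provide one once a NaN is in the array.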
I also read that == was not an equivalence operator (in Julia) because of the way it treated NaNs (or maybe it was because of the IEEE standard way of dealing with NaN).
Yes, this is part of the IEEE spec for the “normal” equality relation.
Mathematica treats that symptom by making the statement NaN == NaN return true.
That’s very tempting and I’ve proposed it myself sometimes. The big issue is that there is a lot of pre-existing numerical code in C, Fortran and other languages out there that behaves according to the IEEE standard. If we do this, porting that code becomes a massive trap, since it will silently do something very different from what was intended.
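As an illustration of the kind of ported code that depends on IEEE semantics, here is a sketch of two common idioms. The function names isnan_idiom and exceeds_or_invalid are made up for this example and are not from any particular code base.

```julia
# Classic idiom: NaN is the only float that is not equal to itself.
isnan_idiom(x) = x != x

# Guard written so that NaN lands in the rejecting branch:
# x <= limit is false for NaN, so the negation is true.
exceeds_or_invalid(x, limit) = !(x <= limit)

@show isnan_idiom(NaN)               # true
@show isnan_idiom(1.0)               # false
@show exceeds_or_invalid(NaN, 10.0)  # true
@show exceeds_or_invalid(3.0, 10.0)  # false
```

If == treated NaN as equal to itself, the first function would return false for every input, and code relying on it would keep running without any error.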
If we were starting afresh and didn’t care about IEEE or the existing code in the world, it would certainly make sense to make == and < a total ordering. Not having NaN or -0.0 would also be better than having them, imo. There must be better ways to express the things they’re useful for than throwing the most basic properties of the real numbers as an ordering out the window.
I compared R, Matlab, Mathematica, Julia, and MSVC C++.
Julia, Matlab and C++ should all behave in the same fashion, as recommended by IEEE 754. R has an NA value in every type and mostly treats NaN as if it were NA, which means it behaves according to three-valued logic. Mathematica takes the “to hell with IEEE 754” approach and makes < and == a proper ordering on floating-point (it probably takes a performance hit for this, as does R for its behavior).
Of the languages I looked at only Julia has resolved to define new behavior that is inconsistent with the user facing functions == and <
Most of them do not provide a generic, user-extensible mechanism for equality and sorting with built-in defaults for floats. If they did, they would face the same difficult choices as we do.
All except Julia hash -0.0 and 0.0 as a single key.
Most of these languages don’t have a default dict implementation, so it’s unclear to me how you’d determine that.
All the languages I tested except Julia sort -0.0 intermixed with 0.0 (the sign bit is there but masked for display in some languages),
We could certainly do that and make isequal(-0.0, 0.0) true and make -0.0 and 0.0 hash the same. That would eliminate one discrepancy between isequal and == but unless we’re willing to throw IEEE out the door, we still can’t get rid of isequal and isless entirely. Is your goal here to get rid of the extra pair of functions or just to make negative and positive zero hash the same? Making -0.0 and 0.0 sort and hash the same would be relatively doable. Getting rid of isequal and isless would be much more disruptive. The single biggest issue would probably be that we’d suddenly have x ≤ NaN for every floating point x which would potentially introduce a lot of breakage.
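For reference, this is roughly how the current behavior shows up with Dict, which looks keys up by hash and isequal. A minimal sketch using only Base; the variable names are illustrative.

```julia
# -0.0 and 0.0 are == but not isequal, so they are two distinct keys
zero_keys = Dict(0.0 => "positive zero", -0.0 => "negative zero")
@show 0.0 == -0.0          # true
@show length(zero_keys)    # 2

# All NaNs are isequal to each other, so they collapse into one key
nan_keys = Dict(NaN => 1, -NaN => 2)
@show length(nan_keys)     # 1
```

Making the zeros hash and compare the same under isequal would change the first result to 1 without touching == or <.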