My data often contain NaNs, -0.0, and 0.0. IMO, this has been a major headache. Right now, I have to replace my NaNs with a special number and replace my -0.0 with 0.00001, etc., to avoid these issues. It would have caused far fewer headaches if the following were true:
julia> NaN == NaN
true
julia> isequal(NaN, NaN)
true
julia> -0.0 == 0.0
true
julia> isequal(-0.0, 0.0)
true
Does anyone know what advantages the current design offers?
Basically, this is the way == works in the IEEE 754 floating-point standard.
=== is intentionally a much stricter operation: it’s only true if the computer can’t tell the difference, and +0.0 can be separated from -0.0 using the underlying data.
But the true answer is: a choice needs to be made, and this is the choice that was made.
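For reference, here is a small sketch of what the three comparisons return in current Julia for the values in question. It assumes plain Float64 literals; the comments give the result of each expression.

```julia
# Comparisons on NaN
NaN == NaN            # false: IEEE comparison, NaN is unequal to everything
isequal(NaN, NaN)     # true:  isequal treats NaNs as equal to each other
NaN === NaN           # true:  the two literals have identical bits

# Comparisons on signed zeros
-0.0 == 0.0           # true:  IEEE comparison ignores the sign of zero
isequal(-0.0, 0.0)    # false: isequal distinguishes the two zeros
-0.0 === 0.0          # false: the sign bit differs
```

So == and isequal disagree in exactly the two cases from the original question, just in opposite directions.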
The thing is that we often do not know whether an existing function or package uses == or isequal until errors are spotted, sometimes the hard way. Even Julia’s own unique function suffers from this issue, as was discussed previously.
Whether there are usage errors is of no consequence. The important question is: would changing it reduce the number of errors?
I’d argue no. The current system is pretty simple, and any edge cases are the same as would be in any other system. For a majority of cases remembering the rules “==: equals, ===: is identical to” will suffice.
That was a bug in unique. The behavior you’re seeing here, as explained, is specified by the IEEE standard; any language that uses IEEE floating-point numbers has the same behavior (or should, if it wants to be compliant with IEEE…).
If you really need that behavior to not comply with the standard in your package, define your internal comparing function, such as:
julia> function compare(x, y)
           # Treat two NaNs as equal; otherwise defer to isequal
           if isnan(x) && isnan(y)
               return true
           else
               return isequal(x, y)
           end
       end
compare (generic function with 1 method)
julia> compare(NaN,NaN)
true
julia> compare(0.0,0.0)
true
(This is a bad example, since you could just use isequal everywhere and get the same result. But the idea holds if you need any other specific behavior: if you expect to handle data that needs special treatment in this respect, define your own function for it.)
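A custom function that actually differs from both == and isequal might look like the sketch below, which gives the behavior asked for in the original question. The name loose_equal is hypothetical, not part of Julia or any package, and the Real type restriction is an assumption about the data.

```julia
# Hypothetical helper: two NaNs compare equal, and -0.0 compares equal to 0.0.
# Everything else falls through to the ordinary IEEE comparison (==).
loose_equal(x::Real, y::Real) = (isnan(x) && isnan(y)) || x == y

loose_equal(NaN, NaN)    # true
loose_equal(-0.0, 0.0)   # true
loose_equal(1.0, NaN)    # false
loose_equal(1.0, 1.0)    # true
```

This only helps in code you control; functions in other packages will keep using whichever comparison they were written with.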
It sounds like you are looking for sentinel values, as opposed to actual floating-point data. It’s probably better to deal with sentinels directly rather than to replace them with other sentinels such as 0.00001. As for NaN, the direct approach is isnan. You do indeed have a problem not knowing whether an existing package’s someequalitycheck(x,y) gets tripped up on these things, but there are two issues here. First, using any equality check with float data is slightly suspect, and could potentially be re-thought (e.g., isapprox). Second, assuming equality is a valid thing done by the function, don’t assume it can be fed NaN unless designed expressly to do what you want.
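As a rough illustration of handling these values directly, the sketch below uses isnan to deal with NaN entries and isapprox for a tolerance-based comparison. The data vector is made up for the example.

```julia
# Made-up sample data containing a NaN and both signed zeros
data = [1.5, NaN, -0.0, 0.0, 2.5]

# Handle NaN explicitly with isnan instead of an equality check
n_nan = count(isnan, data)       # number of NaN entries
clean = filter(!isnan, data)     # copy of data without the NaN entries

# For computed floats, exact equality is fragile; isapprox uses a tolerance
exact  = (0.1 + 0.2 == 0.3)          # false: rounding error in the sum
approx = isapprox(0.1 + 0.2, 0.3)    # true with the default relative tolerance
```

Here the NaN is detected and dropped without replacing it by another magic number.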
Yes, the IEEE standard has advantages, and this report by Bill Kahan provides some context. One can debate the merits of IEEE, but I really think your first issue is dealing with sentinels appropriately.