Why is Julia designed this way? NaN != NaN but -0.0 == 0.0

I’m reading Julia’s documentation, and this is how things actually behave:

julia> NaN == NaN
false

julia> isequal(NaN, NaN)
true

julia> -0.0 == 0.0
true

julia> isequal(-0.0, 0.0)
false

My data often contain NaNs, -0.0, and 0.0. IMO, this has been a major headache. Right now, I have to replace my NaNs with a special number, replace my -0.0 with 0.00001, etc. in order to avoid these issues. It would have caused so much less of a headache if the following were true:


julia> NaN == NaN
true

julia> isequal(NaN, NaN)
true

julia> -0.0 == 0.0
true

julia> isequal(-0.0, 0.0)
true

Does anyone know what advantages the current design offers?


I believe it has to do with comparing value vs comparing representation in memory.

NaN does not have the same value as NaN, so NaN == NaN is false, but NaN has the same representation in memory as NaN so isequal(NaN, NaN) is true.

-0.0 and 0.0 have the same value, but not the same representation in memory, so there it is reversed.
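The value-versus-representation picture can be checked directly by looking at the bits. The sketch below is only an illustration: `bits` is a hypothetical helper defined here, not a Base function, and `negnan` is a NaN built by hand with the sign bit set. It also shows one caveat: NaN is not a single bit pattern, and isequal treats every NaN as equal regardless of its bits, so "same representation" is only an approximate description of isequal. It is === that compares the bits.

```julia
# Hypothetical helper: view the raw 64-bit pattern of a Float64.
bits(x::Float64) = reinterpret(UInt64, x)

# Signed zeros: equal in value, different in the sign bit.
bits(0.0) == bits(-0.0)     # false
0.0 == -0.0                 # true
isequal(0.0, -0.0)          # false
0.0 === -0.0                # false

# A second NaN with the sign bit set (constructed by hand for illustration).
negnan = reinterpret(Float64, 0xfff8000000000000)

isnan(negnan)               # true
bits(NaN) == bits(negnan)   # false: two different bit patterns
NaN == negnan               # false: NaN never compares == to anything
isequal(NaN, negnan)        # true:  isequal treats all NaNs alike
NaN === negnan              # false: === distinguishes the bits
```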


This is according to the IEEE specification. Not really specific to Julia.


Part of the answer is here:

https://docs.julialang.org/en/v1/base/math/#Base.:==

Basically, this is the way == works in the float standard.

=== is intentionally a much stricter operation: it’s only true if the computer can’t tell the difference, and +0.0 can be separated from -0.0 using the underlying data.

But the true answer is: a choice needs to be made, and this is the choice that was made.


Thanks all for the replies!

The thing is that we often do not know whether an existing function or package uses == or isequal until errors surface, sometimes the hard way. Even Julia’s own unique function suffered from this issue, as was discussed previously.
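For a few Base functions the choice can be probed with a small experiment. The sketch below uses a made-up vector `data`; it is not a survey of packages, only a way to see which comparison a given call follows.

```julia
data = [NaN, NaN, 0.0, -0.0]    # made-up sample values

# unique and Set are hash-based and follow isequal:
unique(data)                    # three elements: NaN, 0.0, -0.0
length(Set(data))               # 3

# `in` on an array follows ==, but on a Set it follows isequal:
NaN in data                     # false
NaN in Set(data)                # true
-0.0 in [0.0]                   # true
-0.0 in Set([0.0])              # false

# Passing the predicate explicitly makes the choice visible at the call site:
findfirst(==(NaN), data)        # nothing
findfirst(isequal(NaN), data)   # 1
```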

Whether usage errors exist is beside the point. The important question is: would changing it reduce the number of errors?

I’d argue no. The current system is pretty simple, and any edge cases are the same as they would be in any other system. In the majority of cases, remembering the rules “==: equals, ===: is identical to” will suffice.

As a side note, if you are comparing floating point numbers, isapprox is probably the better choice.
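A short sketch of how isapprox behaves on the values discussed in this thread. The tolerance `atol=1e-9` is an arbitrary value picked for the example, not a recommendation.

```julia
0.1 + 0.2 == 0.3                  # false: rounding error
isapprox(0.1 + 0.2, 0.3)          # true, using the default relative tolerance
0.1 + 0.2 ≈ 0.3                   # true, same call in infix form

isapprox(-0.0, 0.0)               # true
isapprox(NaN, NaN)                # false by default
isapprox(NaN, NaN; nans=true)     # true: opt in to treating NaNs as equal

# The default absolute tolerance is zero, so comparing against 0.0
# needs an explicit atol.
isapprox(1e-12, 0.0)              # false
isapprox(1e-12, 0.0; atol=1e-9)   # true
```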


That was a bug in unique. The behavior you’re seeing here, as explained, is specified by the IEEE standard; any language that uses IEEE floating-point numbers has the same behavior (or should, if it wants to be compliant with IEEE…).


If you really need behavior that departs from the standard in your package, define your own internal comparison function, such as:

julia> function compare(x,y)
           if isnan(x) && isnan(y)
               return true
           else
               return isequal(x,y)
           end
       end
compare (generic function with 1 method)

julia> compare(NaN,NaN)
true

julia> compare(0.0,0.0)
true

(This is a bad example, since you could just use isequal everywhere and get the same result. But the idea holds if you need any other specific behavior: if you expect to deal with data that needs special attention in this respect, define your own function to handle it.)
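As one possible variation on that idea, the comparison table asked for in the original question (NaN equal to NaN, and -0.0 equal to 0.0) can be sketched by combining the two built-in checks. `looseequal` is a hypothetical name, it is not part of Base, and it is only meant for real numbers.

```julia
# Hypothetical helper: true if the arguments are equal under either
# isequal (NaN matches NaN) or == (-0.0 matches 0.0).
looseequal(x::Real, y::Real) = isequal(x, y) || x == y

looseequal(NaN, NaN)     # true
looseequal(-0.0, 0.0)    # true
looseequal(1.0, 1.0)     # true
looseequal(NaN, 0.0)     # false
looseequal(1.0, 2.0)     # false

# Used as an explicit predicate:
findfirst(y -> looseequal(y, NaN), [1.0, NaN])   # 2
```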


Understood!

Just curious why IEEE made the decision that way. I assume there must be some advantages they considered?


Check out “Branch Cuts for Complex Elementary Functions, or Much Ado About Nothing’s Sign Bit” by Professor William Kahan.
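A few small cases hint at why the sign of zero is kept around. This is only an illustration chosen for this thread, not a summary of the paper: the sign records which side a value approached zero from, and functions with a discontinuity or a branch cut use it to pick a side.

```julia
# 0.0 == -0.0, yet the two behave differently as inputs:
1 / 0.0                      # Inf
1 / -0.0                     # -Inf

# Two-argument atan: the sign of a zero y selects the side of the
# negative real axis.
atan(0.0, -1.0)              # 3.141592653589793
atan(-0.0, -1.0)             # -3.141592653589793

# Complex sqrt has a branch cut along the negative real axis.
sqrt(complex(-1.0, 0.0))     # 0.0 + 1.0im
sqrt(complex(-1.0, -0.0))    # 0.0 - 1.0im

# NaN comparing unequal to itself gives a test that needs no library call:
x = NaN
x != x                       # true
```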


It sounds like you are looking for sentinel values, as opposed to actual floating-point data. It’s probably better to deal with sentinels directly rather than to replace them with other sentinels such as 0.00001. As for NaN, the direct approach is isnan. You do indeed have a problem not knowing whether an existing package’s someequalitycheck(x,y) gets tripped up on these things, but there are two issues here. First, using any equality check with float data is slightly suspect, and could potentially be re-thought (e.g., isapprox). Second, assuming equality is a valid thing done by the function, don’t assume it can be fed NaN unless designed expressly to do what you want.
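A minimal sketch of handling these values directly rather than replacing them with stand-in numbers. The vector `readings` is made-up data and `normalizezero` is a hypothetical helper defined here, not a Base function.

```julia
readings = [1.5, NaN, -0.0, 0.0, 2.5]    # made-up sample data

# Deal with NaN through isnan rather than through equality:
count(isnan, readings)                   # 1
valid = filter(!isnan, readings)         # [1.5, -0.0, 0.0, 2.5]

# Hypothetical helper: collapse -0.0 to 0.0, leave everything else alone.
normalizezero(x) = iszero(x) ? zero(x) : x

normalized = map(normalizezero, valid)   # [1.5, 0.0, 0.0, 2.5]

# After normalization, isequal-based functions see a single zero:
unique(normalized)                       # [1.5, 0.0, 2.5]
```

Whether dropping NaNs or collapsing the sign of zero is appropriate depends on what those values mean in the data, so this shows the mechanics only.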

Yes, the IEEE standard has advantages, and this report by Bill Kahan provides some context. One can debate the merits of IEEE, but I really think your first issue is dealing with sentinels appropriately.
