Comparing floating point numbers in an intuitive way fails in Julia


#1

When comparing floating point numbers, I run into trouble when “0.0”, “-0.0” and “NaN” are possible values. To my mind there is no reasonable implementation of “isequal(x,y)” and “x==y”.

A reasonable implementation of comparison operators for (floating point) numbers would give the following response:

NaN == NaN --> true (but Julia gives false)
0.0 == -0.0 --> true
isequal(NaN, NaN) --> true
isequal(0.0, -0.0) --> true (but Julia gives false)

So neither “x==y” nor “isequal(x,y)” provides the desired boolean output. And they behave differently; what is the benefit of that? What is the use case for “NaN==NaN --> false” and for “isequal(-0.0,0.0) --> false”?

As an experienced programmer I can hardly accept that I have to spend time on such a trivial issue. You can only solve such a problem by inspecting the documentation. So I have to read documentation to be able to properly use the “==” or “isequal()” operators/functions? Are the developers of Julia aware of that? Are they serious about that? I bet that lots of people trip over this issue and do not understand the way these operators are implemented.

Of course, I can implement my own comparison function like:

my_isequal(x,y) = isequal(x,y) || x==y

but is this really the Julian way of solving it? I ask myself why I have to think about this kind of problem when I use a modern programming language like Julia.

Regards,
Martin


#2

The semantics of floating point comparison are standardized (see e.g. https://en.wikipedia.org/wiki/NaN#Comparison_with_NaN), shared by most programming languages, and implemented in hardware, so == behaving any differently (as you suggest) would be massively disruptive. From the docs, the rationale for isequal seems to be

isequal(x,y) must imply that hash(x) == hash(y).

which is not possible with == from IEEE semantics, nor with yours, so I’d be a bit more careful about calling it “a reasonable implementation”. Maybe it could be documented more clearly that most users should not use isequal directly.
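To make the hashing point concrete, here is a small sketch (the variable names are only for illustration) of how Set and Dict rely on isequal and hash, rather than on ==, to decide whether two keys are the same:

```julia
# Set and Dict use isequal/hash, not ==, to decide whether two keys coincide.
nan_set = Set([NaN, NaN])                       # the two NaNs collapse into one element
zero_dict = Dict(0.0 => "pos", -0.0 => "neg")   # the two zeros stay separate keys

println(length(nan_set))          # 1
println(length(zero_dict))        # 2
println(hash(NaN) == hash(NaN))   # true, as isequal(NaN, NaN) requires
println(isequal(0.0, -0.0))       # false, even though 0.0 == -0.0 is true
```

If == were used for key lookup instead, a NaN key could never be found again, since NaN == NaN is false.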


#3

The ==/< and isequal/isless orderings serve different purposes. The ==/< operators implement standard and standardized comparison behaviors for use in writing numerical code. This is how NaN and -0.0 behave in C, C++, Fortran, Matlab, Python, Perl, R, Ruby, etc.—so many that I have to wonder what languages you’ve been using that you haven’t encountered this before?
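For reference, a short sketch of what the standardized ==/< behavior looks like in practice; nothing here is specific to Julia beyond the syntax:

```julia
# IEEE 754 comparison semantics as exposed by == and <
println(NaN == NaN)      # false: NaN is unordered, so it equals nothing, itself included
println(NaN != NaN)      # true
println(NaN < 1.0)       # false
println(NaN > 1.0)       # false
println(0.0 == -0.0)     # true: the two zeros compare equal ...
println(1 / 0.0)         # Inf
println(1 / -0.0)        # -Inf: ... yet they remain distinguishable values
println(signbit(-0.0))   # true
```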

The isequal/isless operators are designed for hashing and sorting. We have isless(-0.0, 0.0) and therefore !isequal(-0.0, 0.0) for two major reasons:

  1. So that the sort order of ±0.0 is deterministic and value-based, with negative zeros sorting before positive zeros;
  2. Because it’s fairly common for numerical functions (with branch cuts, for example) to treat negative and positive zeros differently, so if you’re memoizing a function with a hash, you want 0.0 and -0.0 to hash differently so that you can save different answers for them.
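Both reasons can be sketched in a few lines. The cache and the function name cached_angle below are hypothetical, chosen only to illustrate a memoized function with a branch cut along the negative real axis:

```julia
# 1. sort uses isless by default, so negative zeros land before positive zeros.
zeros_sorted = sort([0.0, -0.0, 0.0, -0.0])
println(zeros_sorted)            # [-0.0, -0.0, 0.0, 0.0]

# 2. A hypothetical memoized function whose result depends on the sign of zero.
angle_cache = Dict{Float64,Float64}()
cached_angle(y) = get!(() -> atan(y, -1.0), angle_cache, y)

println(cached_angle(0.0))       # 3.141592653589793
println(cached_angle(-0.0))      # -3.141592653589793
println(length(angle_cache))     # 2: each zero got its own cache entry
```

If 0.0 and -0.0 were the same key, the second call would return the cached answer for the wrong side of the branch cut.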

Of course, there are situations where this is inconvenient, annoying or surprising. There was a long discussion of this point in this issue in response to an earlier discourse thread. You may want to read that discussion and this post from the original thread in which I outlined much of the reasoning behind the current behaviors.

Questions and feedback are welcomed, but you may want to keep a few things in mind:

  1. It’s safe to assume that every numerical behavior in Julia has been extensively discussed and debated by experts over the 8 years of development leading to Julia 1.0.
  2. Behaviors in 1.0 are not eligible to change until Julia 2.0, so while discussion of improvements is encouraged, the timeline for changing them is not immediate.
  3. The rhetorical style of your first post makes it a bit difficult to respond neutrally; a more genuinely inquisitive rather than dramatically outraged approach is likely to come off better.

If you find the built-in == unsuitable, you can define your own == operator and use the standard == syntax. Readers of your code may find this confusing, however, so while it’s possible, it may not be advisable.
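A minimal sketch of what that could look like, assuming you keep the redefinition inside a module of your own (LooseEq and demo are hypothetical names) so that it does not leak into other code:

```julia
module LooseEq  # hypothetical module name

# Defining `==` before it is first used here creates a module-local function
# that shadows Base's `==` inside this module only.
==(x, y) = Base.:(==)(x, y) || isequal(x, y)

# Within the module, the usual infix syntax now calls the local definition.
demo() = (NaN == NaN, 0.0 == -0.0)

end

println(LooseEq.demo())   # (true, true)
println(NaN == NaN)       # false: code outside the module is unaffected
```

Note that functions such as != and in still call Base's ==, so this only changes code that literally writes == inside the module.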

Thanks for the inquiry!


#4

I was also perplexed by the need for two different zeros, 0.0 and -0.0, until I came across a paper from a collection by Prof. Kahan (http://people.eecs.berkeley.edu/~wkahan/). I cannot find that specific paper now but I know it is from this site.


#5

It’s also worth noting that it’s rare to compare floating point values with ==. You generally want to check whether two numbers are within some tolerance of each other, because two values that are mathematically equivalent on paper can end up with slightly different floating point results depending on how they were computed.


#6

(See the isapprox function and the ≈ operator.)
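A quick sketch of tolerance-based comparison; the tolerances passed explicitly below are arbitrary values picked for illustration:

```julia
a = 0.1 + 0.2
println(a == 0.3)                            # false: a is 0.30000000000000004
println(a ≈ 0.3)                             # true: ≈ is infix isapprox with a default relative tolerance
println(isapprox(a, 0.3; rtol = 1e-12))      # true
println(isapprox(1e-10, 0.0))                # false: the default absolute tolerance is zero
println(isapprox(1e-10, 0.0; atol = 1e-8))   # true
```

The last two lines show the usual catch: when comparing against zero, a relative tolerance alone never succeeds, so an explicit atol is needed.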


#7

An example of signed zeros that I like is branch cuts. For example:

julia> sqrt(-1+0.0im)
0.0 + 1.0im

julia> sqrt(-1-0.0im)
0.0 - 1.0im

I found Kahan’s paper here: https://people.freebsd.org/~das/kahan86branch.pdf