When "if (a<b) x=1/(a-b)" divides by zero


#1

This came up in https://github.com/JuliaLang/julia/issues/24568, specifically this comment https://github.com/JuliaLang/julia/issues/24568#issuecomment-345004303. The question was, just looking at a code

float X, XMin, Y;
if (X < XMin)
{
    Y= 1 / (XMin - X);
    XMin= X;
}

can you see how this code might possibly raise an error under -ffast-math?

Here is a simple example to test this behavior:

program divison_test
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: x, y
x = 5e-324_dp
y = 1 / x
print *, x, y, x > 0
end

This gives:

$ gfortran b.f90 && ./a.out
    4.9406564584124654E-324                  Infinity T
$ gfortran -ffast-math b.f90 && ./a.out
   0.0000000000000000                       Infinity F
$ gfortran -O3 -ffast-math b.f90 && ./a.out
   0.0000000000000000                       Infinity T

In the first case, the condition x>0 evaluated as True, so x is a positive number, but 1/x still gave infinity. Btw, this is already a problem — if my computational code gets an infinity like this, and I need to use y later, it’s pretty much all over, the code will crash. In other words, I want that if statement to catch this, but it would fail to catch it above.

In the second case, the condition x > 0 evaluated as false, so that’s correct.

However, in the third case, the condition evaluated as true, yet x is zero, and I got a division by zero basically.

A robust solution around these things is to write the code like this:

program divison_test
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: x, y
x = 5e-324_dp
y = 1 / x
print *, x, y, abs(x) > 1e-12_dp
end

Then no matter how it is compiled, the x>0 condition always evaluates as false, and thus I never get any division by zero or infinity if I put this into an if statement:

$ gfortran -O3 -ffast-math d.f90  && ./a.out
   0.0000000000000000                       Infinity F
$ gfortran -ffast-math d.f90  && ./a.out
   0.0000000000000000                       Infinity F
$ gfortran  d.f90  && ./a.out
   4.9406564584124654E-324                  Infinity F

#2

I’m not quite sure what the question is, but the following Julia code shows which is the largest floating-point number x_max such that 1/x_max gives Inf:


julia> z = realmax()
1.7976931348623157e308

julia> z = prevfloat(Inf)   # equivalent
1.7976931348623157e308

julia> max_x = 1 / z
5.562684646268003e-309

julia> 1 / max_x
Inf

julia> 1 / (nextfloat(max_x))
1.7976931348623143e308

EDIT: Actually I was surprised that 1 / max_x itself gave Inf. This is to do with rounding:

julia> setrounding(Float64, RoundDown) do
           1 / max_x
       end
1.7976931348623157e308

julia> setrounding(Float64, RoundUp) do
           1 / max_x
       end
Inf

#3

Thanks for the answer. I modified my comment to put question in.

Essentially the task was whether it is possible to spot what the problem can be with such a code when -ffast-math is used, and also how to rewrite it, so that it does not give errors even if -ffast-math is used.

The Fortran code above reproduces the problem that the condition evaluates as True, yet because denormals are flushed to 0 with -ffast-math, it will divide by zero. But the compiler was clever that if I put the if statement with the x>0 condition in, it will just work without error. But perhaps with some older compilers it will give an error.

And then your code in Julia shows what the condition must be on x, so that one does not get an infinity.


#4

@mbauman I figured out an exact test problem for this behavior:

program divison_test
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: x, y, z
x = 2.2251000000000007e-308_dp
y = 2.2251e-308_dp
print *, x, y, x > y
if (x > y) then
    z = 1 / (x-y)
else
    z = -1
end if
print *, z
end

And it behaves like this:

$ gfortran -ffpe-trap=zero b.f90 && ./a.out
   2.2251000000000007E-308   2.2251000000000002E-308 T
                  Infinity
$ gfortran -ffast-math -ffpe-trap=zero b.f90 && ./a.out
   2.2251000000000007E-308   2.2251000000000002E-308 T

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7fda0c698ef7 in ???
#1  0x7fda0c69812d in ???
#2  0x7fda0c2e84af in ???
#3  0x400964 in ???
#4  0x400a1e in ???
#5  0x7fda0c2d382f in ???
#6  0x400798 in ???
#7  0xffffffffffffffff in ???
Floating point exception (core dumped)

In the first case the compiler is using an IEEE semantics and x-y becomes a denormal number, and so 1/(x-y) is well defined under IEEE, even if it is infinity.

In the second case, the compiler is not using an IEEE semantics, and x > y is still true, but x-y gets flushed to zero, and then you get a division by zero, as evidenced by the division zero exception that I trapped.


#5

I really appreciate your diligence here, but my question was far more rhetorical than I think you interpreted it.

Many of the other examples of -ffast-math's changes to program behavior in that thread previously to my post had been of the form where -ffast-math would wrongly assume that floating point numbers satisfied “basic math” rules like associativity of + and similar re-arrangements. My read of your stance at that point in the conversation had been that you should not rely upon such “implementation details” of the IEEE behavior — you simply code against basic math rules and assume there’s some precision loss.

This example is slightly different. IEEE floating point numbers actually do obey the “basic arithmetic” rule that a > b implies that a - b > 0. I’m really glad you found that example, because it shows that -ffast-math breaks it in two ways. As you’ve demonstrated, flushing subnormals to zero is one way this gets broken during the subtraction. But this can also be broken if the comparison is done with the results of a CPU operation that’s performed at 80-bits precision (since it may not rounding back down to the IEEE format first). Example in C:

#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    double a = atof(argv[1]);
    if (0.5 + a > 1.0)
        printf("%.20f > 1.0: true\n", 0.5 + a);
    else
        printf("%.20f > 1.0: false\n", 0.5 + a);
    return 0;
}
$ gcc test.c -o test && ./test 0.5000000000000001
1.00000000000000000000 > 1.0: false

$ gcc -ffast-math test.c -o test && ./test 0.5000000000000001
1.00000000000000000000 > 1.0: true

So: what does a+b > c actually mean? Does it perform both the addition and comparison at higher precision than you might expect? Or does it perform the addition at a lower precision by flushing a subnormal to zero first and then doing the comparison on that? This is a perfect example of how there’s not a consistent rule for -ffast-math — it’s just breaking what it can in order to do whatever happens to be fast on your current architecture.

Note that b might indeed be a threshold value that you’re using as a “robust” solution in anticipating some “loss” of precision. If you happen to test against that threshold differently in two different places in your program and expect them to agree, you’re in for trouble.

I hold that in general it’s simply intractable to “defensively” code against the transformations that -ffast-math may or may not perform. If a sufficiently advanced compiler is indistinguishable from an adversary, then giving the compiler access to -ffast-math is gifting that enemy nukes. That doesn’t mean you can’t use it! You just have to test enough to gain confidence that no bombs go off with your compiler on your system.