I found much of the discussion fascinating.

While there have already been umpteen examples, StefanKarpinski said:

> In the particular case of openlibm, writing correct algorithms without the guarantees of IEEE will end up being less efficient than just turning IEEE mode arithmetic on. If you are in the business of writing mathematical code that gives consistent, predictable, correct answers, IEEE is not the enemy of performance; it is precisely the set of guarantees you need in order to know that you are doing exactly the minimal amount of work needed to get the right answer.

Thought I’d add that I recently saw a post somewhere using the trapezoid rule to approximate the integral of sin(x) from 0 to pi. The correct answer is of course -cos(pi) - (-cos(0)) = 2.
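For context, the composite trapezoid rule just sums the function at evenly spaced points, with half weight on the two endpoints. A minimal Julia sketch of the naive version, before any summation tricks (my own illustration, not the code from that post):

```julia
# Composite trapezoid rule: dx * (f(a)/2 + f(a+dx) + ... + f(b-dx) + f(b)/2).
function trapezoid(f, a, b, n)
    dx = (b - a) / n
    s = (f(a) + f(b)) / 2      # endpoints get half weight
    for i in 1:n-1
        s += f(a + i * dx)     # interior points get full weight
    end
    s * dx
end

trapezoid(sin, 0.0, Float64(π), 1000)  # ≈ 2, with O(dx^2) truncation error
```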

Using 10,000,000 function evaluations so that it takes a measurable amount of time, here is a Fortran implementation using Kahan summation:

```fortran
program integral
  implicit none
  integer, parameter :: n_lim = 10000000
  real, parameter :: a = acos( - 1.0 ), &  ! pi
                     dx = (a - 0.0) / n_lim
  real :: sinx, r, cr, yr, tr, cs, ys, ts
  integer :: n
  ! Trapezoid rule: endpoints get half weight.
  sinx = 0.5 * ( sin(0.0) + sin(a) )
  r = dx; cs = 0.0; cr = 0.0
  do n = 2, n_lim
    ! Kahan-compensated accumulation of the sum ...
    ys = sin(r) - cs
    ts = sinx + ys
    cs = (ts - sinx) - ys
    sinx = ts
    ! ... and of the abscissa r itself.
    yr = dx - cr
    tr = r + yr
    cr = (tr - r) - yr
    r = tr
  end do
  sinx = sinx * dx
  write(*,*) sinx
end program integral
```
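For anyone who hasn’t seen it, the four-line pattern in the loop is Kahan (compensated) summation: `c` captures the low-order bits that are rounded away when a small addend meets a large running sum, and feeds them back into the next addition. A toy Julia demonstration of the effect (my own example, separate from the Fortran above):

```julia
# Kahan summation: the compensation c recovers the rounding error each step.
function kahan_sum(xs)
    s = zero(eltype(xs))
    c = zero(eltype(xs))
    for x in xs
        y = x - c           # apply the stored correction to the addend
        t = s + y           # big + small: low-order bits of y are lost
        c = (t - s) - y     # algebraically zero; in floats, the lost bits
        s = t
    end
    s
end

xs = fill(0.1f0, 10_000_000)   # sum of these Float32 values is ≈ 1.0f6
naive = foldl(+, xs)           # strict left-to-right sum drifts badly
kahan = kahan_sum(xs)          # stays within an ulp or so of 1.0f6
```

(Plain `sum(xs)` in Julia is already pairwise, so `foldl` is used here to show the truly naive left-to-right behavior.)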

The results, compiling first with IEEE semantics, then with -ffast-math, and finally with -ffast-math and double precision:

```
$ gfortran-7.2 -O3 -funroll-loops sin_integral.f08 -o int
$ time ./int
2.00000000
real 0m0.073s
user 0m0.069s
sys 0m0.004s
$ gfortran-7.2 -Ofast sin_integral.f08 -o int_fastmath
$ time ./int_fastmath
2.10740471
real 0m0.056s
user 0m0.056s
sys 0m0.000s
$ gfortran-7.2 -Ofast -fdefault-real-8 sin_integral.f08 -o int_fastmath_double
$ time ./int_fastmath_double
2.0000000004135496
real 0m0.147s
user 0m0.147s
sys 0m0.000s
```

The third answer is close enough, but it still has more error than the single-precision IEEE result, and took over twice as long!

Seems like a clear example of “IEEE is…precisely the set of guarantees you need in order to know that you are doing exactly the minimal amount of work needed to get the right answer.”

Integrating sin(x) from 0 to pi isn’t nearly as badly behaved as many of the example summations, but it still involves a lot of numbers close to zero, and numerical integration is a fairly common application.

Also, for fun, running on Julia 0.7:

```
julia> function integral(f, l, u, n = 10_000_000)
           dx = (u - l) / n
           T = eltype(dx)
           dx * (sum(f, linspace(dx, u - dx, n - 1)) + (f(T(l)) + f(T(u))) / 2)
       end
julia> integral(sin, 0f0, π)
1.9999998f0
julia> integral(sin, 0, π)
1.9999997999999437
julia> @benchmark integral(sin, 0f0, π)
BenchmarkTools.Trial:
memory estimate: 112 bytes
allocs estimate: 3
--------------
minimum time: 51.116 ms (0.00% GC)
median time: 51.818 ms (0.00% GC)
mean time: 52.330 ms (0.00% GC)
maximum time: 58.548 ms (0.00% GC)
--------------
samples: 96
evals/sample: 1
julia> @benchmark integral(sin, 0, π)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 80.174 ms (0.00% GC)
median time: 80.943 ms (0.00% GC)
mean time: 81.045 ms (0.00% GC)
maximum time: 83.025 ms (0.00% GC)
--------------
samples: 62
evals/sample: 1
```

I’m not exactly sure why it allocates when calling with a Float32 argument. Unfortunately, sum_kbn doesn’t have a method that takes a function and an iterable as arguments.
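Rolling a compensated sum that takes a function is easy enough by hand, though. Here’s a sketch (`kahan_sum_f` is my own name, not an existing `Base` or package method, and it assumes the iterable is non-empty and can be iterated from the start):

```julia
# A Kahan-compensated analogue of sum(f, itr).
function kahan_sum_f(f, itr)
    T = typeof(f(first(itr)))   # pick the accumulator type from one sample
    s = zero(T)
    c = zero(T)
    for x in itr
        y = f(x) - c            # correct the addend with the stored error
        t = s + y
        c = (t - s) - y         # recover the bits lost in s + y
        s = t
    end
    s
end
```

Something like `kahan_sum_f(sin, linspace(dx, u - dx, n - 1))` would then slot straight into the `integral` function above.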

But simply using the Base sum and linspace functions gives great performance and fairly solid accuracy.

To be a little more fair (to be totally fair, I would probably have to time both the same way; maybe there is a ~20 ms overhead from time and launching the executable?), here is the exact same algorithm:

```
julia> function integral_kbn2(f, l, u, n = 10_000_000)
           dx = (u - l) / n
           T = eltype(dx)
           # Trapezoid endpoints, half weight each.
           sinx = (f(T(l)) + f(T(u))) / 2
           r = dx; cs = zero(T); cr = zero(T)
           for i ∈ 2:n
               # Kahan-compensated accumulation of the sum ...
               ys = f(r) - cs
               ts = sinx + ys
               cs = (ts - sinx) - ys
               sinx = ts
               # ... and of the abscissa r.
               yr = dx - cr
               tr = r + yr
               cr = (tr - r) - yr
               r = tr
           end
           sinx * dx
       end
integral_kbn2 (generic function with 2 methods)
julia> integral_kbn2(sin, 0f0, π)
2.0f0
julia> integral_kbn2(sin, 0, π)
1.9999999999999836
julia> @benchmark integral_kbn2(sin, 0f0, π)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 54.491 ms (0.00% GC)
median time: 55.370 ms (0.00% GC)
mean time: 55.433 ms (0.00% GC)
maximum time: 59.978 ms (0.00% GC)
--------------
samples: 91
evals/sample: 1
julia> @benchmark integral_kbn2(sin, 0, π)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 128.128 ms (0.00% GC)
median time: 129.914 ms (0.00% GC)
mean time: 131.293 ms (0.00% GC)
maximum time: 138.969 ms (0.00% GC)
--------------
samples: 39
evals/sample: 1
```

As a disclaimer, I know Julia much better than Fortran, so my Fortran implementation and/or execution may be relatively lacking.