Consider things like the Kahan summation algorithm (Kahan summation algorithm - Wikipedia).
Translating their algorithm from pseudo-code into Julia:
julia> using AccurateArithmetic, StableRNGs
julia> function KahanSum(input)
           sum = zero(eltype(input))   # Prepare the accumulator.
           c = zero(eltype(input))     # A running compensation for lost low-order bits.
           for i = eachindex(input)
               y = input[i] - c        # c is zero the first time around.
               t = sum + y             # Alas, sum is big, y small, so low-order digits of y are lost.
               c = (t - sum) - y       # (t - sum) cancels the high-order part of y; subtracting y recovers -(low part of y).
               sum = t                 # Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers!
           end                         # Next time around, the lost low part will be added to y in a fresh attempt.
           return sum
       end
KahanSum (generic function with 1 method)
julia> function KahanSumFast(input)
           sum = zero(eltype(input))   # Prepare the accumulator.
           c = zero(eltype(input))     # A running compensation for lost low-order bits.
           @fastmath for i = eachindex(input)
               y = input[i] - c        # c is zero the first time around.
               t = sum + y             # Alas, sum is big, y small, so low-order digits of y are lost.
               c = (t - sum) - y       # (t - sum) cancels the high-order part of y; subtracting y recovers -(low part of y).
               sum = t                 # Algebraically, c should always be zero. Beware overly-aggressive optimizing compilers!
           end                         # Next time around, the lost low part will be added to y in a fresh attempt.
           return sum
       end
KahanSumFast (generic function with 1 method)
julia> rng = StableRNG(1); x = 5000 |> N->randn(rng, N) .* exp.(10 .* randn(rng, N)) |> x->[x;-x;1.0] |> x->x[sortperm(rand(rng,length(x)))];
julia> foldl(+, x)
4.554186716906959
julia> sum(x)
1.875
julia> sum(big, x)
1.0
julia> KahanSum(x)
1.0893429687695757
julia> KahanSumFast(x)
0.985827341906959
julia> sum_kbn(x)
1.0
julia> sum_oro(x)
1.0
julia> function sum_fast(input)
           sum = zero(eltype(input))
           @fastmath for i = eachindex(input)
               @inbounds sum += input[i]
           end
           return sum
       end
sum_fast (generic function with 1 method)
julia> sum_fast(x)
0.985827341906959
If floating point arithmetic were associative, we could transform
y = input[i] - c
t = sum + y
c = (t - sum) - y
sum = t
into
y = input[i] - c
t = sum + y
c = (sum + y - sum) - y = 0
sum = t
into
sum += input[i]
That is, fast math essentially lets the compiler delete the code that tries to track and compensate for accumulating floating point error.
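To see concretely what that compensation buys, here is a minimal REPL sketch of the rounding error it tracks (the values 1.0e16 and 1.0 are chosen purely for illustration):
julia> a = 1.0e16; b = 1.0;
julia> t = a + b     # the low-order bits of b don't fit in the result and are rounded away
1.0e16
julia> (t - a) - b   # Kahan's c term: exactly -(the lost low part of b)
-1.0
Deleting the compensation, as the rewrite above does, throws that -1.0 away on every iteration.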
However, because fast math allows the compiler to change the result, it also applies a transform (multiple separate accumulators) that massively decreases the floating point error. Hence KahanSumFast, while equivalent to the naive @fastmath loop, is way more accurate (and faster) than foldl(+, x), and not actually much worse than Kahan summation for the example above.
AccurateArithmetic.jl does much better than the naive KahanSum above, combining error compensation with separate parallel accumulators (which of course also makes it much faster than the naive Kahan summation).
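To give a rough picture of what "separate accumulators" means, here is a hand-written sketch with four independent running sums (sum_4acc is a hypothetical name; this is purely illustrative and is neither what the compiler emits under @fastmath/@simd nor AccurateArithmetic's implementation):
julia> function sum_4acc(input)          # assumes plain 1-based indexing; a sketch, not a general implementation
           s1 = s2 = s3 = s4 = zero(eltype(input))
           n = length(input)
           i = 1
           while i + 3 <= n              # four independent partial sums, four elements per step
               @inbounds s1 += input[i]
               @inbounds s2 += input[i+1]
               @inbounds s3 += input[i+2]
               @inbounds s4 += input[i+3]
               i += 4
           end
           while i <= n                  # leftover elements
               @inbounds s1 += input[i]
               i += 1
           end
           return (s1 + s2) + (s3 + s4)
       end
sum_4acc (generic function with 1 method)
Each partial sum only sees a quarter of the data, so each accumulates less rounding error, and the four chains map naturally onto SIMD lanes.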
julia> @btime KahanSum($x)
45.747 μs (0 allocations: 0 bytes)
1.0893429687695757
julia> @btime KahanSumFast($x) # not actually a Kahan sum at all!
511.698 ns (0 allocations: 0 bytes)
0.985827341906959
julia> @btime sum_kbn($x) # Kahan summation with parallel accumulators
1.722 μs (0 allocations: 0 bytes)
1.0
julia> @btime sum_oro($x) # Ogita–Rump–Oishi with parallel accumulators
1.505 μs (0 allocations: 0 bytes)
1.0
Thanks to the parallel accumulators, AccurateArithmetic is only around 3x slower than the straight @fastmath sum, instead of roughly 90x slower like the naive Kahan summation described in the Wikipedia article.
Anyway, to answer your question:
There are error-free transforms like the ones in this article, and @fastmath can undo them, as I showed above. In this particular example that isn't too bad, because @fastmath also allows applying another transform that dramatically improves both accuracy and speed, which isn't allowed without it (@simd also allows this).
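For reference, the basic error-free transform for addition that algorithms like Ogita–Rump–Oishi build on is Knuth's TwoSum. Here is a minimal sketch written from the textbook formulation (not copied from AccurateArithmetic.jl):
julia> function two_sum(a, b)
           s  = a + b
           a′ = s - b                  # the portions of a and b that actually made it into s
           b′ = s - a′
           e  = (a - a′) + (b - b′)    # rounding error of s, so that s + e == a + b exactly
           return s, e
       end
two_sum (generic function with 1 method)
julia> two_sum(1.0e16, 1.0)
(1.0e16, 1.0)
Under @fastmath the compiler is free to simplify e to zero, which is exactly the kind of undoing described above.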
But in other cases, such as implementing functions like exp, we lose a lot of accuracy with @fastmath, because it only undoes the compensation!
In languages like C, C++, Fortran, etc., a global fast-math flag isn't as bad, because functions like exp are replaced with only slightly less accurate implementations that were already compiled with their error compensations intact (they just have less of it than the more accurate variants). That is, the "fast" exp you're calling from C with fast math was itself not compiled under fast math.
Similarly, in Julia, @fastmath exp(x) will call a faster and less accurate exp implementation, but that implementation wasn't itself compiled under fast math.
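You can see that substitution directly in the REPL (the exact printed form may vary a bit across Julia versions):
julia> @macroexpand @fastmath exp(x)
:(Base.FastMath.exp_fast(x))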
Implementing functions like these is a very delicate process. The authors of these libraries (like @Oscar_Smith) are very deliberate about floating-point rounding and error, and thus need strict IEEE adherence for their implementations.