Accurate summation algorithm


Stefan Karpinski posted an interesting challenge in

Try to find a summation algorithm whose results are always between (1-1e-14)*s and (1+1e-14)*s where s is the true sum, regardless of how a -ffast-math compiler decides to implement it. To keep it simple, let’s say it only needs to handle data where the true sum of the absolute values is less than the max finite double (≈ 1.8e308).

It is aimed to be implemented in C or Fortran, and then compiled with -ffast-math.

I am not an expert in summations, so that’s why I am posting it here, perhaps somebody can help. The first task is to figure out an algorithm that can do that, say in Julia. The relevant starting points might be:

It seems that the Kahan algorithm is the most accurate, but even that one can return large errors if the input is ill conditioned. So as a start, one should assume the input let’s say random numbers between 0 and 1, to simplify things.

The second problem is that the error probably depends on the length of the array to be summed.

Maybe to specify the domain a bit more, one should specify what the input is. Perhaps one could use as an input some mathematical sum, say for a pi, e.g., from the Julia benchmarks. And then see how the accuracy of the final answer depends on the way things are summed.

That is relevant to the thread in, as -ffast-math will make it unpredictable which order the array was summed. So an interesting project would be to see how the final accuracy changes for the different summing algorithms on the pi sum benchmark.

@fastmath switched off in Julia 0.7-Dev, deliberately?

I think I might have found a nice problem for this summation challenge.

Page 7 of this document: lists an interesting example. I have implemented it, and indeed it behaves as Kahan describes, which is very interesting (the only difference is that the -ffast-math code actually runs faster, but the accuracy is still bad):

program kahan
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: s, os, comp
integer :: k
s = 0; os = -1; comp = 0; k = 0;
do while (s > os)
    k = k + 1; os = s; comp = comp + term(k)
    s = comp + os
    comp = (os-s)+comp
end do
print *, k, s, term(k)
s = s + (tail(k) + comp)
print *, s


    real(dp) function term(k) result(r)
    integer, intent(in) :: k
    r = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
    end function

    real(dp) function tail(k) result(r)
    integer, intent(in) :: k
    r = 3465/(k+0.5_dp) + 3465/(k+1._dp)
    end function


And run it like this:

$ gfortran -O3 -march=native -funroll-loops kahan.f90 && time ./a.out
    61728404   9239.9998877340167        1.8187086585703921E-012

real    0m0.263s
user    0m0.260s
sys     0m0.000s
$ gfortran -O3 -march=native -ffast-math -funroll-loops kahan.f90 && time ./a.out
    87290410   9239.9999320850657        9.0949468492785197E-013

real    0m0.204s
user    0m0.200s
sys     0m0.000s

Perhaps something like this can be used as a benchmark. And you have to require some reasonable accuracy, say, 12 significant digits. Then a naive algorithm with -ffast-math will simply fail.

I need to think more about this particular example.


I analyzed the example to see what is going on. The issue is that we are summing an array of terms, the first one being ~5000, and the last one being ~1e-12. The total sum is converging towards 9240. As such, the array is badly conditioned, and as we keep summing, eventually the small terms do not contribute to the sum, unless we use a trick like the Kahan summation, but -ffast-math optimized that away. Now when we understand what is going on, the solution is simple — improve the conditioning of the array, for example by first summing the first few large terms (I ended up summing the first 600), and then sum the other 80 million up to a certain tolerance (I used the same tolerance as was implicitly done in the original code). Now everything works and the results match, with or without -ffast-math. The code also got faster.

program kahan
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: sum_, sum1, sum2, t
integer :: k, k_split
sum1 = 0
k_split = 600
do k = 1, k_split
    sum1 = sum1 + term(k)
end do
sum2 = 0; k = k_split + 1; t = term(k)
do while (t > 1.8187e-12_dp)
    sum2 = sum2 + t
    k = k + 1; t = term(k)
end do
print *, k, sum1+sum2, t
sum_ = sum1 + sum2 + tail(k)
print *, sum_


    real(dp) function term(k) result(r)
    integer, intent(in) :: k
    r = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
    end function

    real(dp) function tail(k) result(r)
    integer, intent(in) :: k
    r = 3465/(k+0.5_dp) + 3465/(k+1._dp)
    end function


And when you run it:

$ gfortran -O3 -march=native -funroll-loops kahan.f90 && time ./a.out
    61728551   9239.9998877342841        1.8186999964570246E-012

real    0m0.126s
user    0m0.124s
sys     0m0.000s
$ gfortran -O3 -march=native -ffast-math -funroll-loops kahan.f90 && time ./a.out
    61728551   9239.9998877342823        1.8186999964570246E-012

real    0m0.126s
user    0m0.124s
sys     0m0.000s

Everything seems to work now.


THere is a whole zoo of generalisations of and alternatives to Kahans algorithm.


Oh hi, I’m your friendly compiler update. I can now happily unroll loops and inline functions.

# ...
sum1 = sum1 + 3465/(real(1,dp)**2-1/16._dp) + 3465/((1+0.5_dp)**2-1/16._dp)
sum1 = sum1 + 3465/(real(2,dp)**2-1/16._dp) + 3465/((2+0.5_dp)**2-1/16._dp)
sum1 = sum1 + 3465/(real(3,dp)**2-1/16._dp) + 3465/((3+0.5_dp)**2-1/16._dp)
# ...
sum1 = sum1 + 3465/(real(600,dp)**2-1/16._dp) + 3465/((600+0.5_dp)**2-1/16._dp)
do while (t > 1.8187e-12_dp)
    sum2 = sum2 + t
    k = 600 + 1; t = 3465/(real(k,dp)**2-1/16._dp) + 3465/((k+0.5_dp)**2-1/16._dp)
end do
print *, sum1 + sum2 

Oh hey, you know, I can figure out the bounds on that do while guy, too. Let’s unroll that one, too, for good measure. And, hey, did you say I can perform any kind of mathematics I want with -ffast-math? I know all these answers! You’re just adding everything together in any case… I can do that for you! Let’s just re-associate things to make it a bit easier on myself. I’ll just add things up sequentially instead. That’s faster!


You do know that both of these algorithms are already in the Julia standard library, right?

sum_kbn is Kahan summation, and sum uses pairwise summation for any sufficiently large array.

And there is a well known paper by Higham analyzing these and other summation variants (e.g. sorting before summing). (But the analysis depends on the order of the operations, which a “fast-math” compiler might alter, of course.)


Also, if you need the exact answer, then what you’re looking for is called a superaccumulator.


Thanks for the reference. I didn’t know about these methods


A point I’ve made several times previously is the following:

  • Mathematically, since + is associative and commutative, all correct summation algorithms are algebraically equivalent to each other, in particular they are all equivalent to left-to-right summation.

  • The -ffast-math option gives the compiler license to transform any algorithm into any algebraically equivalent one in the mathematical sense.

  • Therefore -ffast-math gives the compiler license to transform any summation algorithm into any other summation algorithm, including left-to-right summation.

That’s why there’s no fundamentally useful answer to this question: the only thing you can find out is how a particular compiler happens to optimize your code. A different compiler in -ffast-math mode or a different version of your own compiler – or the exact same compiler targeting a different architecture – can always make the situation different and potentially much worse. The example above happens to outsmart one particular compiler, but as @mbauman’s post shows, a slightly smarter compiler could straightforwardly overcome that obstacle and make this exact same code fail. Or put another way, there are no reliable summation algorithms in -ffast-math mode.

The same argument applies to superaccumulators in principle, but they typically operate on floating-point values reinterpreted as integers which -ffast-math gives no additional license to optimize, so they should be safe enough. They’re much slower than approximate summation algorithms, however.


The Higham summation paper:


I found much of the discussion fascinating.

While there have already been umpteen examples, StefanKarpinski said:

In the particular case of openlibm, writing correct algorithms without the guarantees of IEEE will end up being be less efficient than just turning IEEE mode arithmetic on. If you are in the business of writing mathematical code that gives consistent, predictable, correct answers, IEEE is not the enemy of performance, it is precisely the set of guarantees you need in order to know that you are doing exactly the minimal amount of work needed to get the right answer.

Thought I’d add I recently saw a post somewhere using the trapezoid rule for the integral of sin(x) from 0 to pi. The correct answer is of course -cos(pi) - (-cos(0) ) = 2.
Using 10,000,000 evaluations so it takes a while, a Fortran implementation using Kahan summation:

program integral
implicit none

integer, parameter                  :: n_lim = 10000000
real, parameter                     ::  a = acos( - 1.0 ), &
                                        dx = (a - 0.0) / n_lim
real                                :: t, sinx, r, cr, yr, tr, cs, ys, ts
integer                             :: n

sinx = 0.5 * ( sin(0.0) + sin(a) )
r = dx; cs = 0.0; cr = 0.0
do n = 2, n_lim
    ys = sin(r) - cs
    ts = sinx + ys
    cs = (ts - sinx) - ys
    sinx = ts

    yr = dx - cr
    tr = r + yr
    cr = (tr - r) - yr
    r = tr
end do

sinx = sinx * dx

write(*,*) sinx

end program integral

The results, first compiling with IEEE, then -ffast-math, and finally -ffast-math and double precision:

$ gfortran-7.2 -O3 -funroll-loops sin_integral.f08 -o int
$ time ./int   2.00000000    

real	0m0.073s
user	0m0.069s
sys	0m0.004s
$ gfortran-7.2 -Ofast sin_integral.f08 -o int_fastmath
$ time ./int_fastmath

real	0m0.056s
user	0m0.056s
sys	0m0.000s
$ gfortran-7.2 -Ofast -fdefault-real-8 sin_integral.f08 -o int_fastmath_double
$ time ./int_fastmath_double

real	0m0.147s
user	0m0.147s
sys	0m0.000s

The third answer is close enough – but still more error than IEEE, and took over twice as long!
Seems like a clear example of “IEEE is…precisely the set of guarantees you need in order to know that you are doing exactly the minimal amount of work needed to get the right answer.”

The sin(x) from 0 to pi isn’t nearly as badly behaved as many of the example summations, but still features a lot of numbers closer to zero, and numerical integration is a fairly common application.

Also, for fun, running on Julia 0.7:

julia> function integral(f, l, u, n = 10_000_000)
           dx = (u - l) / n
           T = eltype(dx)
           dx * (sum(f, linspace(dx, u - dx, n-1)) + (f(T(l)) + f(T(u)))/2)

julia> integral(sin, 0f0, π)

julia> integral(sin, 0, π)

julia> @benchmark integral(sin, 0f0, π)
  memory estimate:  112 bytes
  allocs estimate:  3
  minimum time:     51.116 ms (0.00% GC)
  median time:      51.818 ms (0.00% GC)
  mean time:        52.330 ms (0.00% GC)
  maximum time:     58.548 ms (0.00% GC)
  samples:          96
  evals/sample:     1

julia> @benchmark integral(sin, 0, π)
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     80.174 ms (0.00% GC)
  median time:      80.943 ms (0.00% GC)
  mean time:        81.045 ms (0.00% GC)
  maximum time:     83.025 ms (0.00% GC)
  samples:          62
  evals/sample:     1

I’m not exactly sure why it allocates when calling with a Float32 argument. Unfortunately, sum_kbn doesn’t have a method that takes a function and an iterable as arguments.
But simply using the Base sum and linspace functions give great performance and fairly solid accuracy.

To be a little more fair (to be totally fair, would probably have to time them in the same way – maybe there is a 20 ms overhead on time and launching the executable?), implementing the exact same algorithm:

julia> function integral_kbn2(f, l, u, n = 10_000_000)
           dx = (u - l) / n
           T = eltype(dx)
           sinx = (f(T(l)) + f(T(u))) / 2
           r = dx; cs = zero(T); cr = zero(T)
           for n ∈ 2:n_lim
               ys = f(r) - cs
               ts = sinx + ys
               cs = (ts - sinx) - ys
               sinx = ts
               yr = dx - cr
               tr = r + yr
               cr = (tr - r) - yr
               r = tr
           sinx * dx
integral_kbn2 (generic function with 2 methods)

julia> integral_kbn2(sin, 0f0, π)

julia> integral_kbn2(sin, 0, π)

julia> @benchmark integral_kbn2(sin, 0f0, π)
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     54.491 ms (0.00% GC)
  median time:      55.370 ms (0.00% GC)
  mean time:        55.433 ms (0.00% GC)
  maximum time:     59.978 ms (0.00% GC)
  samples:          91
  evals/sample:     1

julia> @benchmark integral_kbn2(sin, 0, π)
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     128.128 ms (0.00% GC)
  median time:      129.914 ms (0.00% GC)
  mean time:        131.293 ms (0.00% GC)
  maximum time:     138.969 ms (0.00% GC)
  samples:          39
  evals/sample:     1

As a disclaimer, I know Julia much better than Fortran, so my Fortran implementation and/or execution may be relatively lacking.


I wrote down the exact (consistent and coherent) rules for the Fortran school:, and I have taken every single example that anybody ever posted on either this thread or the github thread, and I analyzed it from both the IEEE and Fortran perspective, and shown which of the rules it breaks and then how to rewrite it so that it follows all the Fortran school rules and then it recovers the IEEE accuracy as expected.

My document has all the details for each example, and also discusses the rules and philosophy at length, so I don’t need to copy it here. I will just point out a few things:

  • @mbauman and @StefanKarpinski 's argument is that the compiler can fuse the loops and thus break the conditioning and the result will lose accuracy. That is a correct argument assuming the compiler can fuse the loops, but that is against the rule 2. The Fortran rules only allow the compiler to rewrite things locally, not globally.

  • @Elrod see the notes for the analysis of your problem. I provided a solution that delivers an equivalent accuracy with -ffast-math.

  • The overall idea is that when you write an algorithm, the compiler can assume that things are locally well conditioned, and thus can rewrite things at will. The compiler can’t assume globally well conditioned code, since that would essentially mean that the code is not imperative, but functional like SymPy or Mathematica, but that only works for infinite precision. For finite precision we need to tell the compiler what the global order of operations is, since global reordering can change conditioning. Local reordering however does not change the conditioning, assuming the expressions are already well conditioned. The document has numerous examples of this particular point.
    To put it another way: if a code is well conditioned, as written, then local transformations can’t break the conditioning, but global can. The compiler is not allowed to do transformations that break the conditioning, and thus only local transformations are allowed (see the document for a precise formulation, e.g., the compiler can’t introduce variables that it doesn’t know if they are well conditioned…).

  • @StefanKarpinski’s point was also that there cannot be any set of consistent and coherent rules for the Fortran school. Stefan, please show how these rules are not consistent by finding an example which follows the rules as I wrote them, but still breaks with -ffast-math.

  • The final point is what about real compilers, do they actually follow these rules? Note that this is a separate question to whether the rules are consistent. The answer is that today’s compilers indeed seem to follow these rules, and thus it is safe and accurate to use -ffast-math, as long as the user’s code follows the rules. The main rule not to be broken is the global transformation/reordering — if any compiler decided to break that in the future, then hopefully they would provide an option to turn the global transformations off.

  • I think the main difference between let’s say the Kahan summation and the Fortran school of doing things is that the Kahan summation works for a wide range of pretty badly conditioned input (although it still breaks for very badly conditioned input, as shown in the github thread). It might be that the Kahan summation is the best that you can do, without knowing anything about the input. The Fortran school works in a way that you have to know what problem you are solving and what the input is and then arrange the algorithm so that it works for the given application.

The ball is now in your court to find an example that follows the Fortran rules and breaks (=loses accuracy). Or an example which you think cannot be rewritten to follow the Fortran rules and still solve the same problem. Or any other example which you think will show that the Fortran school rules break down on.

P.S. I wrote down these rules before @mbauman, @StefanKarpinski and @Elrod commented in this thread, but I have been consulting the rules with multiple people (including @mbauman) before posting here, so that I do not waste your time if something is trivially wrong with them.


I don’t think it’s particularly relevant to make up your own rules about Fortran compilers being allowed to do “well conditioned” transformations that are “local” but “don’t introduce new variables”. Your rules aren’t part of the specification of any language or compiler, so they are irrelevant when discussing what something like -ffast-math may or may not do (including in future compiler versions). Feel free to implement a new -fcertik-math pass for LLVM, though, with a formal specification of what it can do, and then we can examine whether it is something Julia should enable!

For amusement, though, you might consider whether your “Fortran rules” allow the compiler to transform c/(b+sqrt(b^2-c)) into b-sqrt(b^2-c) to save a division? The two expressions are algebraically equivalent in exact arithmetic, and don’t involve “introducing other variables” (whatever that means). But one of these two expressions is vastly more accurate than the other when c << b^2.

By the way, conditioning (condition numbers) in numerical analysis is a property of mathematical functions for given inputs, not of algorithms/code — the whole point of condition numbers is to separate analysis of the function being computed from analysis of particular floating-point algorithms. So, the condition numbers of c/(b+sqrt(b^2-c)) and b-sqrt(b^2-c) are identical, for example. Note also that an algorithm can be numerically unstable even if the function being computed is well-conditioned, and an unstable algorithm is often algebraically equivalent to a stable algorithm (e.g. classical vs. modified Gram–Schmidt).


@stevengj good point about the terminology, I shouldn’t be so loose with it.

Conditioning ( is a property of the problem, not the numerical method. I think what I meant by “conditioning” in the notes is actually called numerical stability, perhaps backward stability. It seems you need both well conditioning and stability to get correct answers.

For c << b^2, the expression b-sqrt(b^2-c) breaks the rule number 1, which explicitly prohibits subtracting two close numbers if the loss of accuracy is more than what is needed for the given application. So I assume using the correct terminology, it means the expression is not numerically stable. Please correct me if I am wrong.

On the other hand, the expression c/(b+sqrt(b^2-c)) for c << b^2 does not seem to break any rule. So it is numerically stable.

You are correct that for c << b^2 they are both mathematically equal to c/(2*b), and all three expressions have the same condition number, and it seems it is well conditioned say for c=1e-10 and b = 1e3.

Note that generally for such c and b, an iterative method like the Newton method is generally more robust than using these formulas in my experience. But we are talking about the principle here, not a particular application.

Another common problem with quadratic equations is if b^2 ~ c, see e.g. Table 2 in for an example. There the system itself seems badly conditioned, so no matter what numerical method I use, I’ll have problems with getting the correct answer. I think an iterative method will not help either.

The Gram-Schmidt example is similar I think.

Essentially your main objection is how can the compiler know not to change x to (x+a)-a. This is answered in the notes, it can’t introduce a new variable “a”, that can make the expression “ill conditioned”, and I should have used the term “unstable” I think, since the conditioning of x and a+x-a is the same.

You are right this is not about introducing new variables. Take x and (x+0x1.8p52) - 0x1.8p52. We didn’t introduce any new variables, but the second expression rounds x to the nearest integer in IEEE double precision arithmetics. Both expressions have the same conditioning, but the first expression is stable, while the second expression is unstable (since in this case the correct answer we want is x, but we got round(x) instead).

Finally, to transform the stable expression c/(b+sqrt(b^2-c)) into the unstable expression b-sqrt(b^2-c), one must multiply both nominator and denominator by b-sqrt(b^2-c), but that factor suffers from numerical cancellation, so this is against the rule number 1. Essentially, when the compiler does the transformation, it can’t introduce something that screws up stability. Here is a list of transformations that gcc does:, and none of them, as far as I know, converts a stable expression into an unstable one.

Here is another good page discussing these issues with several “counter examples”: There the example 1 breaks the rule number 4, and the fix is to test using eps for x close to 0, and then use rational function approximation to get accurate answers on the interval (-eps, +eps). The example 2 breaks the rule number 2 (numerical cancellation), but I would need to see a particular application to see how to fix it. Example 3 breaks the rule number 4 (if the given application requires such values of ‘x’ and ‘y’ so that the argument balances on the edge of the domain/singularity of arccos, then one must use eps to test for that). The example 5 initially breaks the rule number 4, but then it is fixed according to the rule.

Anyway, thanks for the feedback @stevengj, I will incorporate it into the notes, to make it clearer. If you have another example, please let me know, or if I made a mistake in the analysis, please let me know too.

It seems the rule is: as a developer, I solve a well conditioned problem. The particular algorithm that I implemented is stable, and the compiler cannot take a stable expression and turn it into an unstable expression.


Your “rule 1” is very general and not well-defined. Whenever an algorithm misbehaves because of floating point issues, one can always find a violation of your “rule 1”. Because of this, I am not sure it is very useful in practice; one can translate it as “use appropriate algorithms”, but no one ever uses inappropriate ones on purpose.

Unfortunately, I don’t think that the numerical issues of floating point in practice are simple enough to be captured in a few simple “do this, don’t do that” style rules.


@Tamas_Papp I think it can be well defined. Using J. Demmel’s notes that I posted above, the error of the algorithm obeys:

    error =  alg(x) - f(x) ~ f(x+e) - f(x) ~ f(x) + e*f'(x) - f(x) = e*f'(x)

Where f'(x) is conditioning, which is assumed to be well conditioned, and e is the backward error, which is the thing that can misbehave, but as a developer you write the code in away so that e is small.

The question in this discussion is can the compiler optimization take e and make it big? And it is easy to analyze a particular transformation. The rule is that it must be stable. So take

f(x) = x
alg(x) = (x+0x1.8p52) - 0x1.8p52 = round(x)

as a particular example. Is this stable, i.e., is it true that alg(x) ~ f(x+e) for small e? It’s not, for example for x = 0.6, we get f(0.6) = 0.6, but alg(0.6) = 1.0 = f(0.6+0.4), so e = 0.4, which is not small. So the transformation x -> (x+0x1.8p52) - 0x1.8p52 is not allowed per rule 1.

So it seems things can get well defined if needed. In a similar way, we can analyze all transformations from the page, but I don’t see any, that would break the now well defined rule of assuming alg(x) = f(x+e) for small “e”, does the transformation alg(x) -> alg2(x) break the rule alg2(x) = f(x+e) for small e?

If you find a transformation that breaks this, let me know. I would like to know if I am making a mistake in the reasoning.

So the point of the rules I wrote down is to check both user’s code, but also compilers to ensure they actually obey the rules.


I am afraid I lost track of the objective of this discussion. Are you trying to formalize a set of rules different from IEEE that are somehow compiler- and architecture-independent? This might take more work than a ~10 bullet points. Or proposing that Julia implements an alternative set of rules which are similar to --ffast-math in spirit, but formalized by a standard?


Considering that you are just now learning about conditioning and stability, and are still struggling with definitions, perhaps you shouldn’t be so glib about whether the analysis is easy to automate.



Are you trying to formalize a set of rules different from IEEE that are somehow compiler- and architecture-independent? This might take more work than a ~10 bullet points.


Or proposing that Julia implements an alternative set of rules which are similar to --ffast-math in spirit, but formalized by a standard?

No. You do what you think is best for Julia. However, I do ask the Julia community to either:

  • agree that the rules are consistent and coherent


  • find a counter example that cannot be refuted

In the first case I would then like to propose to put the -ffast-math Fortran benchmark as a separate example on the Julia benchmark’s page. In the second case I will apologize for being wrong, and post a blog post showing why such rules are fundamentally inconsistent and incoherent, which would be a valuable lesson.

@stevengj I did learn about this at school, but that was over 10 years now and I haven’t used it since in practice. I apologize for being sloppy about it. If you have an actual argument where my rules or reasoning is wrong, please let me know. When I said “easy to analyze”, I didn’t mean automatic analysis, but a paper + pencil analysis as I tried to show on a particular example.


How can you possible say that statements like

the compiler can rearrange the order of operations arbitrarily, as long as the changes are done locally (on the order of 10 elements), as that does not change the accuracy significantly. It cannot rearrange instructions globally (more than ~1000 operations apart), as that could potentially change the conditioning significantly.

are “consistent and coherent”.

These “guidelines” are so vague and open to interpretation that you can just bend them to fit the result of the examples you try them on. How should a compiler writer implement rules that are written this way? Does the compilation step involving calling you and asking you to judge whether an optimization is valid?