I was under the impression that sum(itr) uses Kahan summation, but after inspecting the Julia repo it appears that it callsmapreduce(identity, add_sum, itr). And add_sum is basically just + with some promotions for small integer types. Does sum(itr) not use Kahan summation, or is there a code path that I’m missing?
Looking at the docs for numpy.sum, it appears that they use pairwise summation (not Kahan summation) to improve the output precision.
Pairwise summation recursively divides the array in half, sums the halves recursively, and then adds the two sums. As long as the base case is large enough (here, n=128 seemed to suffice), the overhead of the recursion is negligible compared to naive summation (a simple loop). The advantage of this algorithm is that it achieves O(sqrt(log n)) mean error growth, versus O(sqrt(n)) for naive summation, which is almost indistinguishable from the O(1) error growth of Kahan compensated summation.
Compensated algorithms have a catch that’s not mentioned in that repo’s README, though: they usually assume that the exponent range of the relevant FP format is large enough, which may not be the case. This means that the compensated algorithms may produce NaN when a naive algorithm would, more correctly, produce an infinity. For example, Ogita, Rump & Oishi say (in the compensated dot product paper):
We assume that no overflow occurs, but allow underflow.
So I’m pretty sure this catch also applies to simple compensated summation.
Yeah, you can only do the pairwise recursion if you can index into the object at arbitrary indices (which generators can’t do in general). But if you have an array, then pairwise summation is the obvious answer that balances performance and accuracy — it’s why others do the same thing.
Yes, but not just linear indexing and not just sum — sufficiently large AbstractArrays (and lazy broadcasts) use a recursive divide and conquer strategy for most reductions. The size cutoff varies by operator.
This is why we have both reduce (whose order of traversal is unspecified) and foldl/foldr.
Should this be mentioned in the docs for sum? That is, that the implementation can often give better accuracy than naively summing the elements one by one?
This isn’t true! My mind couldn’t imagine the possibility, but a recursive approach with iteration alone is totally doable… and there’s a WIP for it, too!