Kahan summation in `sum`?

CameronBieganek · August 11, 2023, 4:18pm

I was under the impression that `sum(itr)` uses Kahan summation, but after inspecting the Julia repo it appears that it calls `mapreduce(identity, add_sum, itr)`, and `add_sum` is basically just `+` with some promotions for small integer types. Does `sum(itr)` not use Kahan summation, or is there a code path that I’m missing?
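To make the "promotions for small integer types" concrete: `add_sum` is internal to Base, so this sketch only exercises the public behavior of `sum` compared with plain `+` on `Int8` (assuming a current Julia release):

```julia
# `sum` reduces with Base's internal `add_sum`, which widens signed integers
# narrower than `Int` (e.g. Int8) to `Int` before adding.
v = Int8[100, 100]

s = sum(v)
println(s, " ", typeof(s))   # 200, accumulated as Int (Int64 on 64-bit systems)

# Plain `+` on Int8 stays in Int8 and wraps around modulo 2^8.
w = v[1] + v[2]
println(w, " ", typeof(w))   # -56 Int8
```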

Looking at the docs for numpy.sum, it appears that they use pairwise summation (not Kahan summation) to improve the output precision.

nsajko · August 11, 2023, 4:21pm

Long ago, code for Kahan summation lived in Base, but it has since moved to a separate Git repo on GitHub.

So Base doesn’t use compensated summation.
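For reference, compensated summation carries a second accumulator that holds the rounding error lost at each step. The sketch below is a textbook Kahan-Babuška-Neumaier loop; `kbn_sum` is a hypothetical name, and this is not the code from the KahanSummation package or the old Base `sum_kbn`:

```julia
"""
    kbn_sum(xs)

Hypothetical helper (not the KahanSummation.jl API): Kahan-Babuška-Neumaier
compensated summation. `c` collects the rounding error of every addition
and is added back once at the end.
"""
function kbn_sum(xs::AbstractVector{T}) where {T<:AbstractFloat}
    s = zero(T)   # running sum
    c = zero(T)   # running compensation
    for x in xs
        t = s + x
        # The larger-magnitude operand is the reference for recovering the error.
        if abs(s) >= abs(x)
            c += (s - t) + x
        else
            c += (x - t) + s
        end
        s = t
    end
    return s + c
end

xs = [1.0, 1e100, 1.0, -1e100]
println(kbn_sum(xs))    # 2.0: both 1.0 terms are recovered
println(foldl(+, xs))   # 0.0: a naive left-to-right loop loses them
```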

LaurentPlagne · August 11, 2023, 4:22pm

You may be interested in

mbauman · August 11, 2023, 4:26pm

Here’s the history, which includes a good high-level comparison between the naive, pairwise, and Kahan summation strategies: JuliaLang/julia Pull Request #4039, "RFC: use pairwise summation for sum, cumsum, and cumprod" by stevengj.

Pairwise summation recursively divides the array in half, sums the halves recursively, and then adds the two sums. As long as the base case is large enough (here, n=128 seemed to suffice), the overhead of the recursion is negligible compared to naive summation (a simple loop). The advantage of this algorithm is that it achieves O(sqrt(log n)) mean error growth, versus O(sqrt(n)) for naive summation, which is almost indistinguishable from the O(1) error growth of Kahan compensated summation.
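A minimal recursive sketch of the pairwise scheme described in the quote, assuming an `AbstractVector` with fast indexing. `pairwise_sum` is a hypothetical helper; its default `blocksize = 128` mirrors the n=128 base case mentioned in the PR, while Base's actual implementation differs in details such as the block size:

```julia
"""
    pairwise_sum(x, lo=firstindex(x), hi=lastindex(x); blocksize=128)

Hypothetical helper: sum `x[lo:hi]` by splitting the range in half,
summing each half recursively, and adding the two results. Ranges of at
most `blocksize` elements are summed with a plain loop (the base case).
"""
function pairwise_sum(x::AbstractVector{T}, lo::Int=firstindex(x), hi::Int=lastindex(x);
                      blocksize::Int=128) where {T<:Number}
    n = hi - lo + 1
    if n <= blocksize
        s = zero(T)
        for i in lo:hi
            s += x[i]
        end
        return s
    end
    mid = lo + n ÷ 2 - 1
    return pairwise_sum(x, lo, mid; blocksize=blocksize) +
           pairwise_sum(x, mid + 1, hi; blocksize=blocksize)
end

x = fill(0.1f0, 10^6)   # the exact sum of these Float32 values is close to 1e5
err_pairwise = abs(Float64(pairwise_sum(x)) - 1e5)
err_naive    = abs(Float64(foldl(+, x)) - 1e5)
println(err_pairwise < err_naive)   # true
```

The recursion only adds O(log n) call overhead on top of the simple loops, which is why a reasonably large base case makes it about as fast as naive summation.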

nsajko · August 11, 2023, 4:30pm

Compensated algorithms have a catch that’s not mentioned in that repo’s README, though: they usually assume that the exponent range of the relevant floating-point format is large enough that no intermediate result overflows, which may not be the case. This means that compensated algorithms may produce NaN where a naive algorithm would, more correctly, produce an infinity. For example, Ogita, Rump & Oishi say (in their compensated dot product paper, "Accurate Sum and Dot Product"):

We assume that no overflow occurs, but allow underflow.

So I’m pretty sure this catch also applies to simple compensated summation.
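The NaN-instead-of-Inf failure mode is easy to reproduce with a textbook Kahan loop (`kahan_sum` is a hypothetical helper written only for this illustration): once the running sum overflows to `Inf`, the compensation term becomes `Inf` too, and the next correction computes `Inf - Inf`.

```julia
# Hypothetical helper: textbook Kahan summation, written only to illustrate
# the overflow caveat (not taken from any package).
function kahan_sum(xs::AbstractVector{T}) where {T<:AbstractFloat}
    s = zero(T)   # running sum
    c = zero(T)   # compensation for lost low-order bits
    for x in xs
        y = x - c
        t = s + y
        c = (t - s) - y   # becomes Inf once t has overflowed to Inf
        s = t
    end
    return s
end

m = floatmax(Float64)
xs = [m, m, 1.0]
println(foldl(+, xs))     # Inf: the naive sum overflows, as it should
println(kahan_sum(xs))    # NaN: the next correction computes Inf - Inf
```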

Oscar_Smith · August 11, 2023, 4:33pm

This doesn’t apply to compensated summation (unless you disable subnormals): addition of floating-point numbers never underflows.
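The property referred to here is what makes error-free transformations work: with gradual underflow (subnormals), the rounding error of a floating-point addition is itself exactly representable, and the difference of two distinct floats is never flushed to zero. A minimal sketch of Knuth's TwoSum, the building block of compensated summation (`two_sum` is a hypothetical helper name), assuming IEEE 754 `Float64` with round-to-nearest and no overflow:

```julia
# Knuth's TwoSum: returns s = fl(a + b) and the rounding error e such that
# a + b == s + e holds exactly, provided no overflow occurs.
function two_sum(a::T, b::T) where {T<:AbstractFloat}
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e
end

s, e = two_sum(0.1, 0.2)
# Check exactness in extended precision: the pair (s, e) loses nothing.
println(big(s) + big(e) == big(0.1) + big(0.2))   # true

# Gradual underflow: the difference of two distinct tiny normal numbers
# lands in the subnormal range and is computed exactly, never as 0.
x = 1.5 * floatmin(Float64)
y = floatmin(Float64)
println(issubnormal(x - y))   # true
```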

cstjean · August 11, 2023, 4:34pm

github.com/JuliaLang/julia issue #30421: "Imprecision of sum(::Generator)", opened by cstjean on 17 Dec 2018, 07:30 PM UTC (label: maths)

It seems that `sum` uses the naive sequential sum algorithm for generators. With large vectors, it eventually saturates, and yields an incorrect answer:

```julia
julia> N = 100000000; aa = rand(Float32, N);

julia> mean((x for x in aa))
0.16777216f0

julia> mean(aa)
0.500059f0
```

I have a real-world case where it causes an alarmingly large difference:

```julia
julia> mean(skipmissing(Umat))
1.0638367f0

julia> mean(collect(skipmissing(Umat)))
3.1320891f0
```

As @simonbyrne pointed out [on discourse](https://discourse.julialang.org/t/imprecision-of-mean-over-iterators-of-large-vectors/18759), `sum(::Array)` already uses a smarter algorithm. It could presumably be used for generators, too.

Issue #30421 ("Imprecision of sum(::Generator)") has a Kahan summation implementation.
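The saturation reported in the issue has a simple arithmetic cause: `Float32` carries a 24-bit significand, so at 2^24 = 16777216 adjacent `Float32` values are 2 apart, and any addend below 1 is rounded away. This matches the reported mean exactly: 0.16777216f0 = 2^24 / 10^8. A small check (the generator avoids allocating a large array and mimics a strict left-to-right loop):

```julia
# Float32 has a 24-bit significand: at 2^24 the gap between neighbours is 2.
s = 16777216f0                  # 2^24
println(eps(s) == 2f0)          # true: spacing of Float32 values at s
println(s + 0.9f0 == s)         # true: an addend below 1 is rounded away

# A strict left-to-right loop therefore stalls at 2^24. Summing 2^25 ones
# should give 2^25, but once the total reaches 2^24, 2^24 + 1 is a tie and
# round-to-nearest-even keeps 2^24.
naive = foldl(+, (1f0 for _ in 1:2^25))
println(naive == 2f0^24)        # true
```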

mbauman · August 11, 2023, 4:40pm

Yeah, you can only do the pairwise recursion if you can index into the object at arbitrary indices (which generators can’t do in general). But if you have an array, then pairwise summation is the obvious answer that balances performance and accuracy — it’s why others do the same thing.

CameronBieganek · August 11, 2023, 5:14pm

Just to clarify, it sounds like sum performs pairwise summation for objects that support linear indexing?

mbauman · August 11, 2023, 5:30pm

Yes, but not just linear indexing and not just `sum`: sufficiently large `AbstractArray`s (and lazy broadcasts) use a recursive divide-and-conquer strategy for most reductions. The size cutoff varies by operator.

This is why we have both reduce (whose order of traversal is unspecified) and foldl/foldr.
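Because floating-point `+` is not associative, the traversal order is observable. A three-element example (the `reduce` result is deliberately not pinned down, since its order is unspecified):

```julia
xs = [0.1, 0.2, 0.3]

# foldl is guaranteed left-associative: (0.1 + 0.2) + 0.3
println(foldl(+, xs))   # 0.6000000000000001

# foldr is guaranteed right-associative: 0.1 + (0.2 + 0.3)
println(foldr(+, xs))   # 0.6

# reduce only promises *some* association order, so for non-associative
# operations such as floating-point + its result may differ across
# implementations, versions, or array types.
r = reduce(+, xs)
```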

e3c6 · December 28, 2024, 2:39pm

Should this be mentioned in the docs for sum? That is, that the implementation can often give better accuracy than naively summing the elements one by one?

e3c6 · December 28, 2024, 2:47pm

Also a related question: is LinearAlgebra.dot using similar compensated algorithms for improved accuracy (like sum)?

giordano · December 28, 2024, 6:15pm

As I also said in your cross-post on Slack, `LinearAlgebra.dot(::AbstractArray{T}, ::AbstractArray{T}) where {T<:Union{Float32,Float64}}` calls the corresponding routine in the BLAS library currently in use, so that’s a question for your BLAS library; there’s no promise from the Julia side to do that.
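Since `dot` on `Float32`/`Float64` arrays defers to BLAS, a compensated dot product would have to come from elsewhere. As a hedged illustration of the idea only, here is a sketch in the style of Ogita, Rump & Oishi's Dot2: each product is split exactly with `fma`, each addition with TwoSum, and the error terms are accumulated separately. The names `two_sum`, `two_prod`, and `dot2` are hypothetical, and, like the paper, the sketch assumes no overflow:

```julia
# Exact error of a + b (Knuth's TwoSum).
function two_sum(a::T, b::T) where {T<:AbstractFloat}
    s = a + b
    bb = s - a
    return s, (a - (s - bb)) + (b - bb)
end

# Exact error of a * b, using a fused multiply-add.
function two_prod(a::T, b::T) where {T<:AbstractFloat}
    p = a * b
    return p, fma(a, b, -p)
end

function dot2(x::AbstractVector{T}, y::AbstractVector{T}) where {T<:AbstractFloat}
    length(x) == length(y) || throw(DimensionMismatch("x and y must have the same length"))
    s = zero(T)   # running (rounded) sum of products
    c = zero(T)   # accumulated error terms from products and additions
    for i in eachindex(x, y)
        p, ep = two_prod(x[i], y[i])
        s, es = two_sum(s, p)
        c += ep + es
    end
    return s + c
end

x = [1e20, 1.0, -1e20]
y = [1.0, 1.0, 1.0]
println(dot2(x, y))             # 1.0
println(foldl(+, x .* y))       # 0.0: the naive loop loses the 1.0
```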

mbauman · December 29, 2024, 1:04am

My earlier claim isn’t true! I couldn’t imagine the possibility, but a recursive (pairwise) approach using iteration alone, without indexing at arbitrary positions, is totally doable… and there’s a WIP for it, too!

github.com/JuliaLang/julia pull request: "use pairwise order for mapreduce on arbitrary iterators" (JuliaLang:master ← JuliaLang:sgj/mapreduce_pairwise), opened by stevengj on 5 Dec 2023, 04:01 AM UTC, +84 -11

Currently, `mapreduce` uses a pairwise reduction order for arrays, but switches to `mapfoldl` order for other iterators. This has the unfortunate effect that [pairwise summation](https://en.wikipedia.org/wiki/Pairwise_summation) is only used for arrays, and other iterators get much less accurate floating-point sums (and related reductions). For example, passing an array through a generator or an `Iterators.filter` would suddenly make sums less accurate.

I had long been under the mistaken impression that pairwise reduction required random access to an iterator, but @mikmoore pointed out in #52365 that this is not the case. This WIP PR changes `mapreduce` to use a pairwise order by default for arbitrary iterators. I've only done light testing so far, but it should make summation about equally accurate for arrays and other iterators, and the performance seems about the same as the old `mapfoldl` fallback. More testing and benchmarking required, but I wanted to post some code to get the ball rolling. (Should not be a breaking change, in theory, since we explicitly document that the associativity of `mapreduce` is implementation-dependent.)

Note that if you want to try out this code on an older version of Julia, just define the following `foldl`-like functions

```jl
import Base: _InitialValue, mapreduce_empty_iter, reduce_empty_iter, _xfadjoint, pairwise_blocksize, mapreduce_first, Generator, MappingRF, mapfoldl_impl
mapreduce_empty_iter(f::F, op::OP, itr) where {F,OP} = reduce_empty_iter(_xfadjoint(op, Generator(f, itr))...)
mapfoldp(f, op, itr; init=_InitialValue()) = mapreduce_impl(f, op, init, itr)
foldp(op, itr; init=_InitialValue()) = mapfoldp(identity, op, itr; init=init)
```

and copy the new `mapreduce_impl` and `_mapreduce_impl` methods from the PR (the chunk of the diff starting at `macro _repeat`). Closes #30421.
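The key observation in the PR, that pairwise order needs only sequential iteration, can be sketched as follows. This is an independent illustration, not the PR's `mapreduce_impl`; `chunk_sum` and `pairwise_iter_sum` are hypothetical names, and empty iterators simply throw. The running total always covers `blocksize * 2^level` elements, and each step recursively sums an equally sized chunk and merges it, so the addition tree stays balanced (depth O(log n)) without knowing the length in advance:

```julia
# Sum at most `blocksize * 2^level` elements. `nxt` is a pending result of
# `iterate` (an element and the iterator state); it is never `nothing` here.
# Returns the partial sum and the next pending result (`nothing` at the end).
function chunk_sum(itr, nxt, level::Int, blocksize::Int)
    if level == 0
        # Base case: plain left-to-right loop over up to `blocksize` elements.
        x, state = nxt
        s = x
        n = 1
        nxt = iterate(itr, state)
        while nxt !== nothing && n < blocksize
            x, state = nxt
            s += x
            n += 1
            nxt = iterate(itr, state)
        end
        return (s, nxt)
    end
    # Recursive case: two half-size chunks, added together.
    left, nxt = chunk_sum(itr, nxt, level - 1, blocksize)
    nxt === nothing && return (left, nxt)
    right, nxt = chunk_sum(itr, nxt, level - 1, blocksize)
    return (left + right, nxt)
end

function pairwise_iter_sum(itr; blocksize::Int=128)
    nxt = iterate(itr)
    nxt === nothing && throw(ArgumentError("empty iterator (not handled in this sketch)"))
    total, nxt = chunk_sum(itr, nxt, 0, blocksize)
    level = 0
    while nxt !== nothing
        # `total` covers blocksize * 2^level elements; merge an equal-size chunk.
        right, nxt = chunk_sum(itr, nxt, level, blocksize)
        total += right
        level += 1
    end
    return total
end

println(pairwise_iter_sum(Iterators.filter(iseven, 1:10)))   # 30
g = (0.1f0 for _ in 1:10^6)              # a generator: no random access
err_pairwise = abs(Float64(pairwise_iter_sum(g)) - 1e5)
err_naive    = abs(Float64(foldl(+, g)) - 1e5)
println(err_pairwise < err_naive)        # true
```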

Related topics
Topic | Category (tags) | Replies | Views | Activity
Julia equivalent of Python's "fsum" for floating point summation | General Usage (python) | 54 | 7262 | August 16, 2019
Accurate summation algorithm | Numerics | 34 | 10066 | December 1, 2017
Speed issue with KahanSummation | Performance (question, package, parallel) | 12 | 1235 | February 5, 2022
Imprecision of mean over iterators of large vectors | General Usage | 7 | 992 | December 18, 2018
Why sum_kbn can't sum generators? | Internals & Design (proposal) | 5 | 1190 | February 16, 2017
