Speed issue with KahanSummation

@goerch PM’d me another solution for multi-threaded Kahan summation, based on KahanSummation.jl. It looks much more elegant, and I don’t think @goerch would mind my sharing it here.

(Note that AccurateArithmetic.jl and KahanSummation.jl both export sum_kbn(). This post is about the function from KahanSummation.jl.)

It is based on a patch of KahanSummation here. The patch uses an outdated function, mapreduce_single, which has to be replaced by mapreduce_first on recent versions of Julia. After that update, the file can be loaded with include(".."). I have made the patched file available on this page; search for the file name KahanSummation_patch.jl.

After loading it, @goerch showed that it can be extended into a multi-threaded (parallel) function as follows.

using InitialValues, Folds

include("KahanSummation_patch.jl")
psum_kbn(f, X) = singleprec(Folds.mapreduce(f, InitialValues.asmonoid(plus_kbn), X))
psum_kbn(X) = psum_kbn(identity, X)

psum_kbn() is called in exactly the same way as sum_kbn(), and it is much faster, provided Julia is started with more than one thread.
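For anyone who wants to see the idea without the patch file, here is a minimal, self-contained sketch of chunked multi-threaded Kahan-Babuska-Neumaier summation using only Base. It is not the code from the patch or from KahanSummation.jl: the names kbn_add, kbn_pair, my_sum_kbn and my_psum_kbn are made up for illustration, and I assume Float64 input in a vector.

```julia
# One Kahan-Babuska-Neumaier step: add x to the running pair (s, c),
# where s is the running sum and c collects the rounding error.
function kbn_add(s::Float64, c::Float64, x::Float64)
    t = s + x
    if abs(s) >= abs(x)
        c += (s - t) + x   # low-order bits of x were lost
    else
        c += (x - t) + s   # low-order bits of s were lost
    end
    return t, c
end

# Serial compensated sum; returns the (sum, compensation) pair.
function kbn_pair(xs)
    s, c = 0.0, 0.0
    for x in xs
        s, c = kbn_add(s, c, Float64(x))
    end
    return s, c
end

# Hypothetical serial entry point (illustration only).
my_sum_kbn(xs) = (p = kbn_pair(xs); p[1] + p[2])

# Hypothetical threaded entry point: sum each chunk on its own task,
# then combine the partial pairs with the same compensated step.
function my_psum_kbn(xs::AbstractVector; ntasks::Int = Threads.nthreads())
    isempty(xs) && return 0.0
    chunklen = cld(length(xs), ntasks)
    tasks = [Threads.@spawn kbn_pair(chunk)
             for chunk in Iterators.partition(xs, chunklen)]
    s, c = 0.0, 0.0
    for t in tasks
        ps, pc = fetch(t)
        s, c = kbn_add(s, c, ps)  # compensated add of the partial sum
        c += pc                   # carry over the chunk's own compensation
    end
    return s + c
end

xs = [1.0, 1e100, 1.0, -1e100]
foldl(+, xs)      # 0.0, plain left-to-right summation loses both ones
my_sum_kbn(xs)    # 2.0
my_psum_kbn(xs)   # 2.0
```

The point of keeping the (sum, compensation) pair per chunk is that the combining step is itself compensated, so splitting the work across tasks does not throw away the error terms.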

So there are currently several high-precision summation functions in Julia. Ranked by speed, from slowest to fastest:

KahanSummation.sum_kbn() < psum_kbn() < AccurateArithmetic.sum_kbn() = AccurateArithmetic.sum_oro()

I believe they are very useful for many applications. However, only KahanSummation.sum_kbn(), the slowest variant, is compatible with the automatic differentiation (AD) package ForwardDiff.jl. I am starting another thread to discuss that issue.