I put together a Julia wrapper for his C code at:
I was too lazy to re-implement it from scratch, but I think this should be sufficient for most needs: it provides an xsum
function that does exact double-precision (Float64
and ComplexF64
) accumulation over arbitrary iterators (with an optimized method for contiguous arrays).
It seems to be quite fast — for an array of 10⁴ elements, it is actually about the same speed as a naive sum without SIMD. With @simd
(which is enabled for the built-in sum
) it is about 10x slower, since Neal’s code does not seem to exploit SIMD.
julia> function naivesum(a)
s = zero(eltype(a))
for x in a
s += x
end
return s
end
mysum (generic function with 1 method)
julia> a = randn(10^4);
julia> using Xsum, BenchmarkTools
julia> @btime sum($a);
1.107 μs (0 allocations: 0 bytes)
julia> @btime xsum($a);
12.272 μs (0 allocations: 0 bytes)
julia> @btime naivesum($a);
12.175 μs (0 allocations: 0 bytes)
julia> err(s, exact) = abs(s - exact) / abs(exact) # relative error
err (generic function with 1 method)
julia> exact = Float64(sum(big.(a))); err(sum(a), exact), err(xsum(a), exact), err(naivesum(a), exact)
(1.5635496798094023e-16, 0.0, 6.097843751256669e-15)
On my machine, Neal’s “large” accumulator starts being faster than the “small” accumulator around n=256.