Different Float64 sum on different architectures

greg_plowman · September 20, 2020, 9:16am

I sometimes get different results when summing a vector of Float64 numbers on different Windows machines. The vectors are identical, so it seems sum can produce different results across hardware architectures (even when the OS is always Windows).

Is this expected?

Is there a way to guarantee the same results?

Would sum_kbn from KahanSummation, or xsum from Xsum guarantee the same result?
Both of these alternative sums produce identical results across machines on my particular data.

https://github.com/JuliaMath/KahanSummation.jl
https://github.com/JuliaMath/Xsum.jl
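
For readers unfamiliar with compensated summation, here is a minimal sketch of the Kahan-Babuska-Neumaier idea written as a plain serial loop. The function name neumaier_sum is made up for this illustration, and this is not the code from KahanSummation.jl; it only shows the technique under the assumption that the additions run strictly in order.

```julia
# Sketch of Kahan-Babuska-Neumaier compensated summation (illustrative only).
# `neumaier_sum` is a hypothetical name, not an API of KahanSummation.jl.
function neumaier_sum(x::AbstractVector{Float64})
    s = 0.0   # running sum
    c = 0.0   # running compensation for low-order bits lost so far
    for v in x
        t = s + v
        if abs(s) >= abs(v)
            c += (s - t) + v   # low-order bits of v were lost
        else
            c += (v - t) + s   # low-order bits of s were lost
        end
        s = t
    end
    return s + c
end

x = [1.0, 1e100, 1.0, -1e100]
naive = foldl(+, x)            # plain left-to-right sum loses both 1.0 terms
compensated = neumaier_sum(x)  # recovers them through the compensation term
```

Because the loop is serial and has a fixed order, a sketch like this should give the same result on any IEEE 754 machine, as long as the compiler does not reorder the floating-point operations.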

Elrod · September 20, 2020, 9:37am

Why do you need the exact same result?

Yes, the results will be hardware dependent. Based on the width of the CPU’s vector registers, a different order of adding the numbers will be optimal. These different orders can slightly change rounding.

You can use foldl(+, x) instead of sum(x) to guarantee a particular order.
Most Kahan summation implementations aren’t SIMD (the one in AccurateArithmetic is), so they will give exactly the same result across architectures. Xsum promises an exactly rounded result, so you shouldn’t see any rounding differences there either.
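
As a small illustration of what "a particular order" means, the sketch below compares foldl with a hand-written left-to-right loop and with foldr, which associates from the right. The helper name left_to_right_sum is hypothetical and only serves this example.

```julia
# foldl(+, x) evaluates ((x[1] + x[2]) + x[3]) + ... in a fixed order.
# `left_to_right_sum` is a hypothetical helper that spells that order out.
function left_to_right_sum(x)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    return s
end

x = [1.0, 1e100, -1e100]

from_left  = foldl(+, x)   # (1.0 + 1e100) + -1e100: the 1.0 is absorbed first
from_right = foldr(+, x)   # 1.0 + (1e100 + -1e100): the large terms cancel first
loop_sum   = left_to_right_sum(x)
```

Here from_left and loop_sum agree with each other but differ from from_right, which is the kind of discrepancy a hardware-dependent summation order can introduce.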

danielw2904 · September 20, 2020, 10:44am

See here for a simple example:

Basically, floating-point addition is not associative, and sum makes no guarantee about the order in which the additions are executed.
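
The smallest demonstration of this needs only three numbers. The variable names below are chosen for the example.

```julia
# Floating-point addition is not associative: the grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left  = (a + b) + c   # 0.6000000000000001
right = a + (b + c)   # 0.6

same = left == right  # false
```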

Elrod · September 20, 2020, 10:55am

Here is a fun example of the lack of associativity (thanks to Stefan Karpinski):

function sumsto(x::Float64)
    0 <= x < exp2(970) || throw(ArgumentError("sum must be in [0,2^970)"))
    n, p₀ = Base.decompose(x) # integers such that `n*exp2(p₀) == x`
    [floatmax(); [exp2(p) for p in -1074:969 if iseven(n >> (p-p₀))]
    -floatmax(); [exp2(p) for p in -1074:969 if isodd(n >> (p-p₀))]]
end

Example:

julia> x = sumsto(2.3);

julia> y = sumsto(1e18);

julia> sort(x) == sort(y)
true

julia> foldl(+,x)
2.3

julia> foldl(+,y)
1.0e18

greg_plowman · September 20, 2020, 11:48am

Thanks for the responses.
Very interesting and enlightening!

It seems the same sum result can be guaranteed with:
foldl(+, x) # fixed left-to-right order; accumulates rounding error
xsum(x) # exactly rounded; rounds only once, at the end

I’ll just have to add a method to allow dims argument for arrays.
I’m surprised a dims kwarg isn’t defined for foldl (or reduce)

greg_plowman · September 20, 2020, 12:24pm

Is there a way to disable SIMD and/or whatever else is reordering the additions, so that the standard sum executes deterministically, independent of the processor?
A @nosimd macro, perhaps?

danielw2904 · September 20, 2020, 1:10pm

You can get generators over rows and columns with eachrow and eachcol respectively.
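
A sketch of how that could stand in for a dims argument, assuming a plain matrix. The name foldl_sum is hypothetical. mapslices keeps the reduced dimension with length 1, like sum(A, dims=d) does, while the comprehensions return plain vectors.

```julia
# Hypothetical helper: ordered (left-to-right) sum along one dimension.
foldl_sum(A::AbstractMatrix; dims::Integer) =
    mapslices(v -> foldl(+, v), A; dims=dims)

A = [1.0 2.0;
     3.0 4.0]

# One ordered sum per column / per row, as plain vectors.
col_sums = [foldl(+, c) for c in eachcol(A)]
row_sums = [foldl(+, r) for r in eachrow(A)]

# Same values, but shaped like sum(A, dims=1) and sum(A, dims=2).
down_columns = foldl_sum(A; dims=1)   # 1×2 matrix
across_rows  = foldl_sum(A; dims=2)   # 2×1 matrix
```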

StefanKarpinski · September 20, 2020, 1:58pm

While that sounds appealing, it actually makes the summation both slower and less accurate. If you really want that just use foldl(+, a).
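
To illustrate the accuracy half of that claim, the sketch below compares a strict left-to-right sum with a simple hand-written pairwise sum, using a BigFloat sum as the reference. The function pairwise_sum and the block size of 128 are assumptions made for this example. This is not Base's actual implementation of sum, and it says nothing about speed.

```julia
# Illustrative pairwise summation: split the range in half, sum each half,
# then add the two partial sums. Small ranges fall back to a plain loop.
# `pairwise_sum` and the cutoff of 128 are choices made for this sketch.
function pairwise_sum(x, lo=firstindex(x), hi=lastindex(x))
    if hi - lo < 128
        s = zero(eltype(x))
        for i in lo:hi
            s += x[i]
        end
        return s
    end
    mid = (lo + hi) >>> 1
    return pairwise_sum(x, lo, mid) + pairwise_sum(x, mid + 1, hi)
end

x = fill(0.1, 10^6)
ref = sum(big.(x))                        # high-precision reference value

err_naive    = abs(foldl(+, x) - ref)     # error of strict left-to-right order
err_pairwise = abs(pairwise_sum(x) - ref) # error of the pairwise order
```

On this input the pairwise order ends up much closer to the reference, because its rounding error grows roughly with the logarithm of the length rather than with the length itself.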

Tamas_Papp · September 20, 2020, 2:37pm

I think this is the key question for all these discussions (wrt float arithmetic, reproducibility of random sequences, etc).

If the results are close, this should not matter and things should be compared with some variant of isapprox.
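
A minimal sketch of such a comparison. The tolerance 1e-12 is an arbitrary choice for this example, and the right value depends on the data.

```julia
x = fill(0.1, 1000)

s_sum  = sum(x)        # order chosen by Base, may vary across machines
s_fold = foldl(+, x)   # strict left-to-right order

# Compare with a tolerance instead of ==.
close_default = s_sum ≈ s_fold                        # default relative tolerance
close_tight   = isapprox(s_sum, s_fold; rtol=1e-12)   # explicit tolerance
```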

If they are not, and appear to be very sensitive to random seeds or low-level questions of floating point arithmetic, they are not to be trusted anyway.

greg_plowman · September 20, 2020, 9:31pm

Yes, of course.

I think my use of standard sum did not convey my intention.
I did not mean for Base.sum to be re-implemented for everyone.

What I meant was whether I could use something like @nosimd sum(a) to get a deterministic sum. The advantage over foldl(+, a) is the extra methods available to sum, in particular sum(a, dims=x).

This is no biggy though, I definitely can work with foldl(+, a).

Thanks again for all the responses.
