Huge computation error in Float32?

The following shows that s1 is hugely mis-computed. It happens when doing the calculation in Float32; everything is fine in Float64.

using Random  # needed for Random.seed!
Random.seed!(1);
n = 1000000;
a = rand(Float32, n);
b = rand(Float32, 100, n);
s1 = 0f0;                  # one big running accumulator
s2 = 0f0;                  # sum of per-column partial sums
si = zeros(Float32, n);    # per-column partial sums
for i in 1:n
    for j in 1:100
        s1 += a[i] * b[j, i]
        si[i] += a[i] * b[j, i]
    end
    s2 += si[i]
end

julia> s1
1.6777216f7
julia> s2
2.4987638f7


aF64 = Float64.(a);
bF64 = Float64.(b);
s1F64 = 0.0;
s2F64 = 0.0;
siF64 = zeros(n);
for i in 1:n
    for j in 1:100
        s1F64 += aF64[i] * bF64[j, i]
        siF64[i] += aF64[i] * bF64[j, i]
    end
    s2F64 += siF64[i]
end

julia> s1F64
2.4989047580531508e7
julia> s2F64
2.4989047580524024e7

Is this a big problem? I’m using v1.9.1. Thanks for your attention.

It’s definitely a problem if you were expecting an accurate answer. But it’s also not unexpected: everything you’re seeing here is documented, expected rounding behaviour of floating-point arithmetic.

3 Likes

It’s a big problem if you look at the difference between s1 and s2.

Even given all the rounding errors in floating-point calculations, s1 and s2 should be identical, or at least very close. But they aren’t.

The results in Float64 are much more reasonable. So why does this huge discrepancy happen only in Float32?

Or, in other words, how can the computation of s2 be so much more accurate than that of s1?

You’re accumulating error at every step of the calculation, and your calculation has on the order of 1000000 * 100 steps. That’s a lot of accumulated error, and Float32 has very low precision.
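
For a rough sense of the precision gap, compare the machine epsilons (the spacing between representable numbers near 1.0) of the two types:

julia> eps(Float32)
1.1920929f-7

julia> eps(Float64)
2.220446049250313e-16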

3 Likes

But s2 goes through the same number of error-accumulating steps! Why the difference?

It’s because s2 is accumulated with a more numerically stable pattern. s1 just grows and grows, and the larger it becomes, the larger the rounding error of each addition becomes.

Once s1 is of order 1f7, the epsilon between floats is of order 1f0:

julia> eps(1f7)
1.0f0
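
In fact, your s1 stalled at exactly 2^24 = 16777216, where the spacing between Float32 values grows to 2:

julia> 1.6777216f7 == 2f0^24
true

julia> eps(1.6777216f7)
2.0f0

From that point on, every single product a[i] * b[j, i] is less than 1 and gets rounded away entirely, so s1 can never grow past 2^24.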

I strongly recommend reading about floating point error. It’s a subtle and complicated topic.

6 Likes

I assume Mason is right: this is an example of accumulated rounding error. If so, it has nothing to do with Julia or with v1.9.1 in particular: you would get similar errors in MATLAB, Python, or anything else, as long as you use Float32 and a comparable algorithm.

2 Likes

Thanks!

Here’s a perhaps simpler example that might help you understand:

julia> let v = rand(Float32, 1000000 * 100)
           s1 = 0f0
           s2 = 0f0
           # naive running sum: one big accumulator
           for vi ∈ v
               s1 += vi
           end
           # chunked sum: sum each block of 100, then sum the block sums
           chunks = Iterators.partition(v, 100)
           for chunk ∈ chunks
               s3 = 0f0
               for vi ∈ chunk
                   s3 += vi
               end
               s2 += s3
           end
           @info "" s1 s2 sum(v)
       end
┌ Info: 
│   s1 = 1.6777216f7
│   s2 = 5.0003064f7
└   sum(v) = 5.0002704f7

Splitting the sum into chunks that we sum over individually is a good way to reduce errors.
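
This chunked pattern is essentially why the built-in sum(v) is so accurate here as well: Base’s sum uses pairwise summation for arrays, recursively splitting the input in half. Here is a minimal sketch of the idea (the name pairwise_sum and the cutoff of 128 are illustrative choices, not Base’s exact internals):

function pairwise_sum(v::AbstractVector{T}) where {T}
    n = length(v)
    if n <= 128
        # small base case: a plain loop accumulates little error
        s = zero(T)
        for x in v
            s += x
        end
        return s
    else
        # recurse on two halves, so each addition combines
        # partial sums of similar magnitude
        m = n ÷ 2
        return pairwise_sum(view(v, 1:m)) + pairwise_sum(view(v, m+1:n))
    end
end

Because the two operands of every addition stay comparable in size, the worst-case error grows like O(log n) rather than O(n).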

3 Likes

Maybe something I should spell out even more clearly, @tomtom, is that once your accumulator hits 1f7, adding a number of 0.5f0 or less won’t do anything to it:

julia> v1 = 1f7
1.0f7

julia> (v1 + 0.5f0) === 1f7
true

This is a catastrophic loss of precision. The benefit of summing up the numbers in chunks is that you’re more likely to add up numbers that are similar in magnitude, and thus avoid situations where you round away the entire answer.
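
If you need still better accuracy without leaving Float32, one classic trick (not used in the examples above, mentioned only for illustration) is Kahan, or compensated, summation, which carries the rounded-away low-order bits along in a correction term:

function kahan_sum(v::AbstractVector{Float32})
    s = 0f0   # running sum
    c = 0f0   # compensation: low-order bits lost so far
    for x in v
        y = x - c         # re-inject the previously lost bits
        t = s + y         # big + small: low bits of y get rounded away...
        c = (t - s) - y   # ...and are recovered here (algebraically zero, numerically not)
        s = t
    end
    return s
end

Note that aggressive re-association (e.g. @fastmath) can optimize the compensation step away, so don’t combine the two.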

4 Likes

Just for reference, adding the obligatory paper here: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Mathematically speaking, floating-point arithmetic is “broken” in that it is neither associative nor commutative.
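
The failure of associativity is easy to see at the REPL:

julia> (1f7 + 0.5f0) + 0.5f0
1.0f7

julia> 1f7 + (0.5f0 + 0.5f0)
1.0000001f7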

7 Likes

Here the topic is explained in a very accessible and gentle manner, using Julia: Fundamentals of Numerical Computation, “Floating-point numbers”.

6 Likes

Clarification: Except when both operands are NaN, IEEE floating-point addition and multiplication are commutative. The commutativity is required because the IEEE standard requires that the result of addition or multiplication be the representable value closest to the exact result. In other words, the result might not be exact, but the order of the two operands does not matter.
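
You can check this at the REPL: even when the result is inexact, swapping the operands gives a bit-identical result:

julia> (1f7 + 0.1f0) === (0.1f0 + 1f7)
true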

9 Likes