Huge computation error in Float32?

The following shows that s1 is hugely mis-computed. It happens when doing the calculation in Float32; everything is fine in Float64.

using Random  # needed for Random.seed!
Random.seed!(1);
n = 1000000;
a = rand(Float32, n);
b = rand(Float32, 100, n);
s1 = 0f0;                  # one big running accumulator
s2 = 0f0;                  # sum of per-column partial sums
si = zeros(Float32, n);    # per-column partial sums
for i in 1:n
    for j in 1:100
        s1 += a[i] * b[j, i]
        si[i] += a[i] * b[j, i]
    end
    s2 += si[i]
end

julia> s1
1.6777216f7
julia> s2
2.4987638f7


aF64 = Float64.(a);
bF64 = Float64.(b);
s1F64 = 0.0;
s2F64 = 0.0;
siF64 = zeros(n);
for i in 1:n
    for j in 1:100
        s1F64 += aF64[i] * bF64[j, i]
        siF64[i] += aF64[i] * bF64[j, i]
    end
    s2F64 += siF64[i]
end

julia> s1F64
2.4989047580531508e7
julia> s2F64
2.4989047580524024e7

Is this a big problem? I’m using v1.9.1. Thanks for your attention.

It’s definitely a problem if you were expecting an accurate answer. But it’s also not unexpected: everything you’re seeing here is documented, expected rounding behaviour of floating-point arithmetic.

3 Likes

It’s a big problem if you look at the difference between s1 and s2.

Even given all the rounding errors in floating-point calculations, s1 and s2 should be identical, or at least very close. But they aren’t.

The results in Float64 are much more reasonable. So why does this huge discrepancy happen only in Float32?

Or, in other words, how can the computation of s2 be so much more accurate than that of s1?

You’re accumulating error at every step of the calculation, and your calculation has on the order of 1000000 * 100 steps. That’s a lot of accumulated error, and Float32 has very low precision.
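
For a rough sense of the precision gap, compare the machine epsilons (the spacing between representable numbers near 1.0) of the two types:

julia> eps(Float32)
1.1920929f-7

julia> eps(Float64)
2.220446049250313e-16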

3 Likes

But s2 goes through the same number of error-accumulating steps! Why the difference?

It’s because s2 is accumulated with a more numerically stable pattern. s1 just grows and grows, and the larger it becomes, the larger the rounding error of each addition becomes.

Once s1 is of order 1f7, the epsilon between floats is of order 1f0:

julia> eps(1f7)
1.0f0
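
In fact, your s1 stalled at exactly 2^24 = 16777216, where the spacing between Float32 values grows to 2:

julia> 1.6777216f7 == 2f0^24
true

julia> eps(1.6777216f7)
2.0f0

From that point on, every single product a[i] * b[j, i] is less than 1 and gets rounded away entirely, so s1 can never grow past 2^24.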

I strongly recommend reading about floating point error. It’s a subtle and complicated topic.

6 Likes

I assume Mason is right: this is an example of accumulated rounding error. If so, it has nothing to do with Julia or with v1.9.1 in particular: you would get similar errors in MATLAB, Python, or anything else, as long as you use Float32 and a comparable algorithm.

2 Likes

Thanks!

Here’s a perhaps simpler example that might help you understand:

julia> let v = rand(Float32, 1000000 * 100)
           s1 = 0f0
           s2 = 0f0
           # naive running sum: one big accumulator
           for vi ∈ v
               s1 += vi
           end
           # chunked sum: sum each block of 100, then sum the block sums
           chunks = Iterators.partition(v, 100)
           for chunk ∈ chunks
               s3 = 0f0
               for vi ∈ chunk
                   s3 += vi
               end
               s2 += s3
           end
           @info "" s1 s2 sum(v)
       end
┌ Info: 
│   s1 = 1.6777216f7
│   s2 = 5.0003064f7
└   sum(v) = 5.0002704f7

Splitting the sum into chunks that we sum over individually is a good way to reduce errors.
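
This chunked pattern is essentially why the built-in sum(v) is so accurate here as well: Base’s sum uses pairwise summation for arrays, recursively splitting the input in half. Here is a minimal sketch of the idea (the name pairwise_sum and the cutoff of 128 are illustrative choices, not Base’s exact internals):

function pairwise_sum(v::AbstractVector{T}) where {T}
    n = length(v)
    if n <= 128
        # small base case: a plain loop accumulates little error
        s = zero(T)
        for x in v
            s += x
        end
        return s
    else
        # recurse on two halves, so each addition combines
        # partial sums of similar magnitude
        m = n ÷ 2
        return pairwise_sum(view(v, 1:m)) + pairwise_sum(view(v, m+1:n))
    end
end

Because the two operands of every addition stay comparable in size, the worst-case error grows like O(log n) rather than O(n).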

3 Likes

Maybe something I should spell out even more clearly, @tomtom, is that once your accumulator hits 1f7, adding a number of 0.5f0 or less won’t do anything to it:

julia> v1 = 1f7
1.0f7

julia> (v1 + 0.5f0) === 1f7
true

This is a catastrophic loss of precision. The benefit of summing up the numbers in chunks is that you’re more likely to add up numbers that are similar in magnitude, and thus avoid situations where you round away the entire answer.
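
If you need still better accuracy without leaving Float32, one classic trick (not used in the examples above, mentioned only for illustration) is Kahan, or compensated, summation, which carries the rounded-away low-order bits along in a correction term:

function kahan_sum(v::AbstractVector{Float32})
    s = 0f0   # running sum
    c = 0f0   # compensation: low-order bits lost so far
    for x in v
        y = x - c         # re-inject the previously lost bits
        t = s + y         # big + small: low bits of y get rounded away...
        c = (t - s) - y   # ...and are recovered here (algebraically zero, numerically not)
        s = t
    end
    return s
end

Note that aggressive re-association (e.g. @fastmath) can optimize the compensation step away, so don’t combine the two.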

4 Likes

Just for reference, adding the obligatory paper here: What Every Computer Scientist Should Know About Floating-Point Arithmetic
Mathematically speaking, floating-point arithmetic is “broken” in that it is neither associative nor commutative.
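
The failure of associativity is easy to see at the REPL:

julia> (1f7 + 0.5f0) + 0.5f0
1.0f7

julia> 1f7 + (0.5f0 + 0.5f0)
1.0000001f7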

7 Likes

Here the topic is explained in a very accessible and gentle manner, using Julia: Fundamentals of Numerical Computation, “Floating-point numbers”.

6 Likes

Clarification: Except when both operands are NaN, IEEE floating-point addition and multiplication are commutative. The commutativity is required because the IEEE standard requires that the result of addition or multiplication be the representable value closest to the exact result. In other words, the result might not be exact, but the order of the two operands does not matter.
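
You can check this at the REPL: even when the result is inexact, swapping the operands gives a bit-identical result:

julia> (1f7 + 0.1f0) === (0.1f0 + 1f7)
true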

9 Likes