the following shows that s1 was hugely mis-computed. It happens when doing calculations in Float32. Everything is fine in Float64.

Random.seed!(1);
n = 1000000;
a = rand(Float32, n);
b = rand(Float32, 100, n);
s1 = 0f0;
s2 = 0f0;
si = zeros(Float32, n);
for i in 1:n
for j in 1:100
s1 += a[i] * b[j, i]
si[i] += a[i] * b[j, i]
end
s2 += si[i]
end
julia> s1
1.6777216f7
julia> s2
2.4987638f7
aF64 = Float64.(a);
bF64 = Float64.(b);
s1F64 = 0.0;
s2F64 = 0.0;
siF64 = zeros(n);
for i in 1:n
for j in 1:100
s1F64 += aF64[i] * bF64[j, i]
siF64[i] += aF64[i] * bF64[j, i]
end
s2F64 += siF64[i]
end
julia> s1F64
2.4989047580531508e7
julia> s2F64
2.4989047580524024e7

Is it a big problem??? using v1.9.1. Thanks for your attention.

It’s definitely a problem if you were expecting an accurate answer. But it’s also not unexpected. Everything you’re seeing here is documented, expected rounding behaviour of floating point arithmetic.

You’re accumulating error at every step of the calculation, and your calculation has on the other of 1000000 * 100 steps. That’s a lot of accumulated error, and Float32 has very low precision.

it’s because s2 is accumulated with a more numerically stable pattern. s1 just grows and grows, and the larger it becomes, the larger the rounding errors will be.

Once s1 is of order 1f7, the epsilon between floats is of order 1f0:

julia> eps(1f7)
1.0f0

I strongly recommend reading about floating point error. It’s a subtle and complicated topic.

I assume Mason is right: this is an example of accumulated rounding error. If Mason’s suggestion is correct, this has nothing to do with Julia and whether it is v1.9.1 or not: you would then get similar error in MATLAB or Python or whatever, as long as you use Float32 and use comparable algorithms.

Here’s a perhaps simpler example that might help you understand:

julia> let v = rand(Float32, 1000000 * 100)
s1 = 0f0
s2 = 0f0
for vi ∈ v
s1 += vi
end
chunks = Iterators.partition(v, 100)
for chunk ∈ chunks
s3 = 0f0
for vi ∈ chunk
s3 += vi
end
s2 += s3
end
@info "" s1 s2 sum(v)
end
┌ Info:
│ s1 = 1.6777216f7
│ s2 = 5.0003064f7
└ sum(v) = 5.0002704f7

Splitting the sum into chunks that we sum over individually is a good way to reduce errors.

Maybe something I should spell out even more clearly @tomtom, is that once your accumulator hits 1f7, adding a number less than 1 won’t do anything to it:

This is a catastrophic loss of precision. The benefit of summing up the numbers in chunks is that you’re more likely to add up numbers that are similar in magnitude, and thus avoid situations where you round away the entire answer.

Clarification: Except when both operands are NaN, IEEE floating-point addition and multiplication are commutative. The commutativity is required because the IEEE standard requires that the result of addition or multiplication be the representable value closest to the exact result. In other words, the result might not be exact, but the order of the two operands does not matter.