# Random variations between results of CPU and GPU computation

Hi all,

I’ve noticed that I’m getting some variations between the results computed in the GPU VS the ones obtained in the CPU. Here is a simple example:

``````a = CUDA.rand(Float64, 10000);
@show CUDA.sum(a) - sum(Array(a));

####

b = rand(Float64, 10000);
@show CUDA.sum(CuArray(b)) - sum(b);
``````

If I run the code several times I end up getting 3 different responses, 0.0 (the expected result) or ±9.094947017729282e-13.

Does anyone know the reason for this behavior?

Not an expert on GPU but I think it might be because floating point addition is not associative in general. So maybe the CPU and GPU arrays are summed in a different order, that’s all it would take to generate slightly different answers

2 Likes

Are you sure that `CuArray(b)` is still Float64?

Only `cu` will change datatypes.

gotcha

This is an easy experiment to convince yourself that the summation order may effect the result, without involving any GPU:

``````julia> for i = 1:100
x = rand(10000)
println(sum(x) - sum(reverse(x)))
end
``````

As it happens the `sum` function for floating point numbers is much more intricate than you would naively think and it’s not at all surprising that implementations on different hardware end up with (slightly) different results.

2 Likes

I see…

So this makes things trickier when comparing computation using floating point numbers between CPU and GPU versions of the same code (specially when using CUDA kernel implementations)…

Guess that the only way to prevent this is to avoid Julia’s built in functions and implement everything yourself (thus having more control over the code execution order)?

It is also interesting that in the example provided by @GunnarFarneback that it shows the same variations as the example “a” and “b” above, including a new result of ±1.8189894035458565e-12 on the CPU/GPU code an ±2.7284841053187847e-12 only on the @GunnarFarneback example (CPU only).

So I’ve decided to count the variations of my examples and the one provide by @GunnarFarneback, and here are the results of an average of 3 runs of 10000 testes:

``````a = CUDA.rand(Float64, 10000);
@show CUDA.sum(a) - sum(Array(a));

# 0.0                     -> 6295
# ±9.094947017729282e-13  -> 3515.7 (1777.3 / 1738.3 )
# ±1.8189894035458565e-12 -> 189.3 (93.3 / 96)
# ±2.7284841053187847e-12 -> 0
# Other                   -> 0

###

b = rand(Float64, 10000);
@show CUDA.sum(CuArray(b)) - sum(b);

# 0.0                     -> 6335.3
# ±9.094947017729282e-13  -> 3455.7 (1703.3 / 1752.3)
# ±1.8189894035458565e-12 -> 209 (116 / 93)
# ±2.7284841053187847e-12 -> 0
# Other                   -> 0

###

x = rand(10000)
println(sum(x) - sum(reverse(x)))

# 0.0                     -> 6080.3
# ±9.094947017729282e-13  -> 3430 (1689 / 1741)
# ±1.8189894035458565e-12 -> 489 (248.3 / 240.6)
# ±2.7284841053187847e-12 -> 0,7
# Other                   -> 0
``````

If the variations were due differences in the summed order, shouldn’t the values of the variations be more arbitrary, instead of what appears to be a pattern?

It’s not a coincidence that you see the same deviations but a consequence of how floating point numbers work or more exactly which numbers can be represented. Assume we have a sum that with some summation order yields exactly 5000. If we are slightly off we will hit these numbers:

``````julia> nextfloat(5000.0) - 5000.0
9.094947017729282e-13

julia> nextfloat(nextfloat(5000.0)) - 5000.0
1.8189894035458565e-12

julia> nextfloat(nextfloat(nextfloat(5000.0))) - 5000.0
2.7284841053187847e-12
``````

If you want to see more exactly what those numbers are you can convert them to `BigFloat`:

``````julia> BigFloat(nextfloat(5000.0))
5000.0000000000009094947017729282379150390625
``````

Well, yes, but you still need to make tradeoffs between what is fast on different hardware and what is numerically good enough (depending on your needs one or more of those might be non-issues).

A better plan than to require exact equality is usually to accept small numeric deviations but design your algorithms so the deviations don’t grow out of control.

2 Likes