Not an expert on GPUs, but I think it might be because floating-point addition is not associative in general. So maybe the CPU and GPU sum the arrays in a different order; that's all it would take to produce slightly different answers.
This is an easy experiment to convince yourself that the summation order may affect the result, without involving any GPU:
julia> for i = 1:100
           x = rand(10000)
           println(sum(x) - sum(reverse(x)))
       end
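The same experiment can be reproduced outside Julia as well; here is a rough Python equivalent (the seed is arbitrary, chosen only for reproducibility), summing each array forward and backward:

```python
import random

random.seed(0)  # arbitrary seed, just for reproducibility
diffs = []
for _ in range(100):
    x = [random.random() for _ in range(10000)]
    # left-to-right vs right-to-left accumulation rounds differently
    diffs.append(sum(x) - sum(reversed(x)))

# most trials give a tiny nonzero difference
print(max(abs(d) for d in diffs))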
As it happens, the sum function for floating-point numbers is much more intricate than you would naively think, and it's not at all surprising that implementations on different hardware end up with (slightly) different results.
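For instance, Julia's `sum` over float arrays uses pairwise (divide-and-conquer) summation rather than a plain left-to-right loop, which both changes and usually reduces the rounding error. A rough Python sketch of the two strategies (the base-case block size here is arbitrary; Julia uses a larger one):

```python
import random

def naive_sum(xs):
    # plain left-to-right accumulation
    s = 0.0
    for v in xs:
        s += v
    return s

def pairwise_sum(xs):
    # recursively split the array and add the two halves' sums
    if len(xs) <= 8:  # small base case for illustration
        return naive_sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

random.seed(1)  # arbitrary seed
x = [random.random() for _ in range(100000)]
print(naive_sum(x) - pairwise_sum(x))  # generally a tiny nonzero value
```

Both loops add up exactly the same numbers; only the grouping differs, and that is enough to change the result in the last bits.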
So this makes things trickier when comparing floating-point computations between CPU and GPU versions of the same code (especially when using CUDA kernel implementations)…
I guess the only way to prevent this is to avoid Julia's built-in functions and implement everything yourself (thus having more control over the execution order)?
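Fixing the accumulation order yourself is one option; another is compensated (Kahan) summation, which makes the result much less sensitive to summation order at the cost of a few extra operations per element. A minimal Python sketch:

```python
def kahan_sum(xs):
    # compensated summation: c carries the low-order bits lost in s + y
    s = 0.0
    c = 0.0
    for v in xs:
        y = v - c          # apply the stored correction
        t = s + y
        c = (t - s) - y    # recover what was rounded away
        s = t
    return s

# classic demo: ten 0.1s don't sum to 1.0 with naive accumulation
print(sum([0.1] * 10))        # 0.9999999999999999
print(kahan_sum([0.1] * 10))  # 1.0
```

This won't make CPU and GPU results bit-identical by itself, but it shrinks the order-dependent error dramatically.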
It is also interesting that the example provided by @GunnarFarneback shows the same variations as examples “a” and “b” above, including a new result of ±1.8189894035458565e-12 in the CPU/GPU code and ±2.7284841053187847e-12 only in @GunnarFarneback's example (CPU only).
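Those particular values are no accident: they are small multiples of the spacing between consecutive Float64 numbers near 5000 (`eps(5000.0)` in Julia). A quick check in Python:

```python
import math

ulp = math.ulp(5000.0)  # spacing of consecutive doubles near 5000: 2**-40
print(ulp)      # 9.094947017729282e-13
print(2 * ulp)  # 1.8189894035458565e-12
print(3 * ulp)  # 2.7284841053187847e-12
```

So the observed deviations are sums that landed two or three representable numbers away from 5000.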
So I decided to count the variations in my examples and in the one provided by @GunnarFarneback; here are the results, averaged over 3 runs of 10000 tests:
It's not a coincidence that you see the same deviations, but a consequence of how floating-point numbers work, or more precisely which numbers can be represented. Assume we have a sum that, with some summation order, yields exactly 5000. If we are slightly off, we will hit these numbers: