Precision issue: (A.+R.-A.-R) does not vanish?!

N = 1023
A = rand(1000:20000,(N,N));
R = rand(ComplexF64,N,N);

On my machine this gives a small nonzero result (on the order of 4.5e-7) instead of the 0.0 I expected.

What is happening?

Machine ε due to order of operations?


sum(abs.(R .- R .+ A .- A))

I indeed get 0.0.


Yes, R .- R and A .- A actually yield zero matrices.
Python 3 with the NumPy library also has this issue.

I came across this issue while testing my code, and I got desperate! I don’t know how many such precision issues are lurking due to my undisciplined coding habits!

Floating-point arithmetic itself has this issue. Unless the two quantities happen to have exactly the same exponents, there’s going to be some precision loss when adding.
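A minimal illustration in plain Julia (nothing specific to this example): when the exponents differ enough, the smaller addend is partly or entirely rounded away.

```julia
# The spacing between adjacent Float64 values near 1e20 is about 1.6e4,
# so adding 1.0 to 1e20 changes nothing at all:
@show (1.0e20 + 1.0) - 1.0e20   # 0.0, not 1.0

# The classic case: none of 0.1, 0.2, 0.3 is exactly representable in binary
@show 0.1 + 0.2 == 0.3          # false

# eps(x) gives the absolute spacing of Float64 near x
@show eps(20000.0)              # the ulp size near the magnitudes in A
```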


But are there generally some principles / common practices that minimize this uncertainty?

See here for an introduction to the concept.

Also see the first section of Numerical Recipes.

Oh! That is quite an education…


It’s a whole field of computational study. In general, don’t expect floating-point arithmetic to be precisely associative. If you expect something to be zero, don’t be too surprised when it’s not quite zero (avoid comparisons to zero whenever possible). It’s better to check approximate equality of two non-zero quantities than to check that their difference is exactly zero. The non-zero quantities give a decent sense of scale and thus when an approximate equality is ok; by the time you’ve subtracted them, you’ve lost all sense of scale so it’s hard to tell if a non-zero residual is acceptable. At the end of a long computation, if half of your bits are correct, you’re doing pretty well. This is why having 64 bits of precision is so important—so that you have a decent chance that the first dozen of those bits are actually right.
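In Julia this principle maps directly onto `isapprox` (the `≈` operator), which compares two quantities with a relative tolerance instead of demanding an exactly zero difference; a small sketch:

```julia
a = 500.0
b = a + 1e-13                       # the "same" quantity, off by roundoff-sized noise

@show a == b                        # false: the values differ by a couple of ulps
@show a ≈ b                         # true: isapprox uses a relative tolerance (~sqrt(eps) by default)
@show isapprox(a, b; rtol=1e-10)    # pass an explicit tolerance when you know the scale
```

Comparing the non-zero quantities directly, as the text suggests, is exactly what the relative tolerance in `isapprox` does for you.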


But beware of copying any of the recipes in Numerical Recipes—they are often quite flawed (there may be legal and copyright issues as well). The general principles it explains are good to understand though.


Also, your Complex64 is made of Float32 types. That is the primary reason you get a result bigger than eps(Float64).

Edit: My apologies, I didn’t see the 0.7.0 change. Then indeed this may be a roundoff issue, because A is an Int64 array populated with values between 1000 and 20000, while every element of R has modulus of order one (at most √2).

Edit2: If you do println(sum(abs.(A.-A.+R.-R))), you’d always get 0.0; the contrast with the original ordering confirms that floating-point math is not associative.

Edit3: On the other hand, you were summing up 1023 * 1023 elements, so the total roundoff error of 4.5e-7 is not surprising.
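A rough sanity check of that claim, assuming each element of A .+ R .- A .- R carries an absolute rounding error of up to one ulp at magnitude ~2e4:

```julia
n   = 1023^2          # number of elements being summed
ulp = eps(20000.0)    # Float64 spacing near the largest entries of A, ≈ 3.6e-12
worst_case = n * ulp  # ≈ 3.8e-6; an observed total of a few e-7 is well within this bound
```

The observed error is smaller than the worst case because the individual rounding errors have random signs and partially cancel.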

I never copy recipes. I write my own code and test, and … fail :smiley:


The ComplexF64 type is actually Complex{Float64}; we changed it from the old Complex64 which was a 64-bit complex type with Float32 parts because this kept tripping people up.


I see. My apologies for not catching up with the 0.7.0 changes.


I was referring to the explanations in the text, not the code. There are dubious copyright issues with the code itself. I knew one of the authors, so fortunately I got the catharsis of complaining about this in person. :smile:

He was using ComplexF64 (note the F) which was introduced in 0.7 as an alias for Complex{Float64}.

Indeed you should never do equality comparisons with floating-point values. This principle was deeply ingrained in me from an early age, but lately, as I’ve come to understand a bit better how code works, especially in Julia, I’ve started doing it occasionally when I know it should be OK (such as sum(R .- R) in the example above). I should probably abandon that habit though, it’s bad practice. Never do equality checks with floats.

Stefan, may I ask how to defend my work if a referee says “Your work is ultimately meaningless because the uncontrolled / unknown precision issues in JuliaLang could easily ruin the calculation”? Are there any published papers to cite if I use Julia for scientific computing? Do I have to cite anything?

Julia is just using floating point arithmetic. Fortran, C, C++, Python, MATLAB, R, etc. all have exactly the same issue with exactly the same precision. This is built into your computer. Everyone knows about it. If a reviewer says you have precision issues, it would be due to using a numerically unstable algorithm and not anything to do with a programming language.
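The same behaviour is reproducible in any of those languages, because it comes from IEEE 754 arithmetic rather than from Julia; for instance:

```julia
# Floating-point addition is not associative: regrouping changes the result
@show (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # false
@show (0.1 + 0.2) + 0.3                         # 0.6000000000000001
@show 0.1 + (0.2 + 0.3)                         # 0.6
```

The identical expressions in C, Python, or Fortran produce the identical pair of results.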


I do think the same. Gonna rethink all the calculations I’ve done tonight …

By the way, the problem actually came from my effort to test a column-reduction function I wrote with CUDAnative.jl. The function sums over the first axis of array2d_in and writes the results to array1d_out. The code is ugly because I explicitly use UInt32 in order to make the code portable; many GPUs prefer 32-bit integers.

using CUDAdrv
using CUDAnative
Curry = CuArray

#NOTE [:,b] ==> [b]
function ReduceC!(dims, array1d_out, array2d_in)
    # -------------------------------------------------
    (dim_to_reduce::UInt32, dim1::UInt32) = dims
    tid::UInt32 = UInt32(threadIdx().x)
    bid::UInt32 = UInt32(blockIdx().x)
    sdata = @cuStaticSharedMem(ComplexF64,1024) #NOTE doubled shared memory
    sdata[tid] = ComplexF64(0)
    if bid <= dim1
        # ------------------------------ PART 1 ------------------------------
        # accumulate on the shared memory in each block
        tl::UInt32 = tid
        while tl <= dim_to_reduce
            @inbounds sdata[tid] += array2d_in[tl,bid]
            tl += 0x00000400   # stride by the 1024 threads of the block
        end
        sync_threads()
        # ------------------------------ PART 2 ------------------------------
        # folding the shared memory
        if tid <= 0x00000200
            @inbounds sdata[tid] += sdata[tid + 0x00000200]
        end
        sync_threads()
        if tid <= 0x00000100
            @inbounds sdata[tid] += sdata[tid + 0x00000100]
        end
        sync_threads()
        if tid <= 0x00000080
            @inbounds sdata[tid] += sdata[tid + 0x00000080]
        end
        sync_threads()
        if tid <= 0x00000040
            @inbounds sdata[tid] += sdata[tid + 0x00000040]
        end
        sync_threads()
        if tid <= 0x00000020   # last 32 threads are a single warp
            @inbounds sdata[tid] += sdata[tid + 0x00000020]; sync_warp()
            @inbounds sdata[tid] += sdata[tid + 0x00000010]; sync_warp()
            @inbounds sdata[tid] += sdata[tid + 0x00000008]; sync_warp()
            @inbounds sdata[tid] += sdata[tid + 0x00000004]; sync_warp()
            @inbounds sdata[tid] += sdata[tid + 0x00000002]; sync_warp()
            @inbounds sdata[tid] += sdata[tid + 0x00000001]; sync_warp()
        end
        if tid == 0x00000001
            @inbounds array1d_out[bid] = sdata[0x00000001]
        end
    end
    return nothing
end
To test the code, I wrote

N = 1023
A = rand(1000:20000,(N,N))
R = rand(ComplexF64,N,N)
cuAF = Curry( ComplexF64.(A) .+ R )
cuBF = Curry( fill(ComplexF64(0),N) )

@cuda threads=1024 blocks=N ReduceC!(UInt32.((N,N)),cuBF,cuAF)

REDUCTION_GPU = Array(cuBF)
REDUCTION_CPU = sum(Array(cuAF),dims=1)[1,:]
println( sum(abs.(REDUCTION_CPU .- REDUCTION_GPU)) )

The result (different from 0.0) confused me, and I suspected that the GPU handles Float64 differently… I am still sorting this out from all the online materials.

I am not very familiar with GPU programming and am unable to trace the source of the numeric errors…

GPUs compute in parallel. That, plus the lack of associativity, means it can be far harder to control such floating-point errors, and they can change from run to run depending on how the work gets parallelized. In this case the reduction order is effectively nondeterministic, and, exactly as in the test case above, changing the order of summation changes the floating-point error (the next value summed cannot be assumed to be the next value in the array).
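You can see the effect of summation order without a GPU: Julia’s `sum` uses pairwise (tree-ordered) summation, while `foldl(+, x)` sums strictly left to right, and the two generally disagree in the last few bits.

```julia
x = rand(10^6)
pairwise   = sum(x)          # pairwise (tree-ordered) summation
sequential = foldl(+, x)     # strict left-to-right summation

# Same data, different ordering: typically a tiny nonzero difference
@show pairwise - sequential
```

A GPU reduction is yet another ordering (per-block trees whose shape depends on how work is scheduled), so a small CPU-vs-GPU discrepancy is expected, not a sign of the GPU "handling Float64 differently".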


Julia is generally more careful with numerical stability, gives the user more control over it, and offers many more ways to validate the numerical accuracy of computations than ANY other programming language. If you want to investigate the numerical stability of your algorithm, there are many high-quality options in Julia, including running the very same generic code at a different precision (Float32, Float64, BigFloat) to see how the answer changes, or using interval arithmetic packages that bound the error of every operation.

None of these are an option in Fortran, C, C++, Python, Matlab or R. The real question is: if you’re not using Julia, then how do you know that your algorithm is accurate and numerically stable?
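For instance, because Julia code is generic over number types, the same algorithm can be re-run in BigFloat to measure how much accuracy the Float64 run actually lost (`naive_sum` here is just an illustrative name):

```julia
naive_sum(x) = foldl(+, x)            # strict left-to-right summation, works for any number type

x = rand(10^5)
approx = naive_sum(x)                 # the Float64 result
exact  = Float64(naive_sum(big.(x)))  # the same algorithm at 256-bit precision, rounded back

@show abs(approx - exact)             # the roundoff actually incurred by the Float64 run
```

No source change is needed to switch precision: the `big.(x)` conversion is the whole experiment.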


Woah, this is just an uncertainty much like the others. In publications it should be quoted along with your other uncertainties. Like other uncertainties, it is bounded. I don’t know what journal this is, but any referee with any experience whatsoever in working on computational problems should understand that this uncertainty exists. In practice, usually uncertainty due to floating point arithmetic with Float64 is negligible relative to statistical and systematic uncertainties, even in most fields of physics.
