Why assuming floating-point rounding errors are random is bad

This topic continues a discussion started in another topic here on Discourse.

The goal here is to assemble a list of potential issues which may arise if you assume floating-point rounding errors are random rather than deterministic. Additions and corrections are welcome!

  • It may make you overlook bugs in your code, e.g. race conditions or uninitialised memory, which may have an effect similar to rounding errors.
  • It may be confusing when you are investigating rounding errors in a given formula. For example, x - x will always evaluate to exactly 0, and this may be surprising if you assume floating-point arithmetic = exact arithmetic + some random noise (see the sketch after this list).
  • It makes it impossible to do incremental development. It is sometimes possible to rearrange some computations such that they produce exactly the same answer but could potentially run faster. In such cases, it is convenient to 1) do the changes, 2) check that you still get exactly the same answer to rule out any typos and 3) benchmark. You deprive yourself of this development style if you assume that rounding errors are random.
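
A minimal sketch of both points above (plain Julia, nothing package-specific): rounding errors are real, but identical expressions on identical data give bit-identical results.

x = 0.1 + 0.2             # carries a rounding error relative to 3/10 ...
@show x - x == 0.0        # ... yet x - x is still exactly zero

a = rand(1000)
@show sum(a) === sum(a)   # same data, same order: bit-identical result, every time
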
2 Likes

If you are trying to do low-level stuff, knowing exactly when operations are exact can be really powerful. In general, for things like trig or other transcendental functions, assuming random error is pretty close to correct, since you are truncating a transcendental number. However, for arithmetic (+, -, *, fma) the result has an exact value with a finite number of bits, and there are specific cases where you can know that the answer will be exact (or exact with twice as many bits, which is useful for double-float implementations).
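
As a concrete illustration of such an exactness guarantee (a standard textbook construction, not tied to any particular package): Knuth's two-sum recovers the rounding error of an addition exactly, which is the basic building block of double-float arithmetic.

# Knuth's two-sum: in exact arithmetic, s + err == a + b (barring overflow).
function two_sum(a::Float64, b::Float64)
    s  = a + b           # rounded sum
    ap = s - b
    bp = s - ap
    err = (a - ap) + (b - bp)
    return s, err
end

s, err = two_sum(0.1, 0.2)
@show s      # 0.30000000000000004
@show err    # the exact rounding error of this particular addition, not random noise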

4 Likes

Thanks for opening this topic, it was a good idea. The issues you raise are very important and go beyond the Julia language. Since we are gathered here as Julia programmers, you might add a bullet or two on how these considerations apply to Julia packages, at least those in the General registry. Could the community reach an agreement on whether packages in the General registry should be deterministic? And if not, or if no consensus can be reached, should package developers at least state which category their package falls into, deterministic or heuristic? As it stands, each Julia developer has to find out on their own whether each package they use is deterministic, which is rather inefficient.

2 Likes

The problem is when doing high-level stuff, specifically for applications such as accounting, or finance in general, where the norm is to work with rounded numbers. Improvements to fixed-point numbers would serve these tasks, and fit the style of Julia programming, versus a lower-level language like Rust or C++.

1 Like

Thanks for doing this thread!

That’s a good point, but it is still pretty unlikely (although certainly possible) that a race condition or invalid memory access will cause an error on the order of eps().

The important point is that errors are relative. Even if they are random, x-x is still going to be zero.

Isn’t this style of development super brittle? The class of valid optimizations you can do includes things that give exactly the same answer, as well as things that change the answer slightly. Having to differentiate between the two is hard, hardware/parallelization dependent and takes valuable mental space that is better spent actually thinking about the algorithm. Better assume every change to the code changes the answer slightly.

Thinking of errors as random has a few advantages. It forces you to actually think about error propagation when designing tests (and not run the code once and hardcode in the test that the result has to be exact, which is going to break in two weeks when you add parallelization, make an optimization or run the test on another CI machine). It lessens your expectations of guarantees libraries should provide (which allows them to do dynamic multithreading behind your back if they want to). It helps explain (via a random walk model) why error usually doesn’t propagate as fast as it could. Of course it’s wrong, but it’s wrong in the right way that serves as a useful abstraction for what’s actually going on and that doesn’t mislead you too much. It’s useful as a first order model for teaching (the most important message to convey to students first learning about it is that errors are relative, without necessarily having to enter into the details of IEEE guarantees and non associativity).
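
In code, that testing style is just a relative tolerance instead of a hardcoded exact value (a minimal sketch; the summation is only a stand-in for the code under test):

using Test

result = sum(fill(0.1, 10))        # stand-in for "the code under test"
@test result ≈ 1.0 rtol=1e-12      # survives reorderings, parallelization, other CI machines
# @test result == 1.0              # the brittle alternative: hardcoding bit-exactness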

I do agree it’s not a substitute for real understanding, but I don’t think it’s such a bad model.

1 Like

Said differently: would life be very different if, instead of rounding to nearest, arithmetic rounded randomly up or down?

Machine Hardware (CPU and GPU)

The IEEE-754 standard [this is the 2008 version; see changes for the 2019 version; also a short comment here] defines five rounding rules: two are “round to nearest” rules, and three are “directed rounding” rules:

Roundings to Nearest

  1. Round to nearest, ties to even – rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even least significant digit; this is the default for binary floating point and the recommended default for decimal.

  2. Round to nearest, ties away from zero – rounds to the nearest value; if the number falls midway, it is rounded to the nearest value above (for positive numbers) or below (for negative numbers); this is intended as an option for decimal floating point.

Directed Roundings

  1. Round toward 0 (zero) – directed rounding towards zero (also known as truncation).

  2. Round toward +∞ – directed rounding towards positive infinity (also known as rounding up or ceiling).

  3. Round toward −∞ – directed rounding towards negative infinity (also known as rounding down or floor).
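
The directed modes can be observed from Julia. Changing the rounding mode of Float64 at run time is not supported in Base (as far as I know), but BigFloat (via MPFR) honours it; a small sketch:

lo = setrounding(BigFloat, RoundDown) do
    BigFloat(1) / BigFloat(3)    # 1/3 rounded toward −∞
end
hi = setrounding(BigFloat, RoundUp) do
    BigFloat(1) / BigFloat(3)    # 1/3 rounded toward +∞
end
@show lo < hi                    # 1/3 is not representable, so the two directed roundings differ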

Number 1 (round to nearest, ties to even) is the default rounding mode, and has the advantage that it is without positive/negative bias and without bias toward/away from zero.

  • However, it will distort a distribution of values by making results with an even least significant digit more likely than those with an odd one (illustrated below).
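
The tie-to-even rule is easy to see on decimal ties, since round uses the same tie-breaking rule by default:

julia> round(0.5), round(1.5), round(2.5), round(3.5)   # ties always go to the even neighbour
(0.0, 2.0, 2.0, 4.0)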

Others

There are some obscure variations of #1: “alternate tie-breaking” (take turns rounding ties to even/odd) and “random tie-breaking” (randomly round ties to even/odd).

A method with desirable statistical properties is stochastic rounding (also see here), which is unbiased: the expected rounding error is zero. There is no hardware support (yet).
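
It is easy to emulate in software, though. A minimal sketch of my own (rounding Float64 values to Float32; not the scheme of any particular paper):

function stochastic_round(x::Float64)
    f = Float32(x)                        # round-to-nearest candidate
    lo, hi = Float64(f) <= x ? (f, nextfloat(f)) : (prevfloat(f), f)
    Float64(lo) == x && return lo         # exactly representable: nothing to round
    p = (x - Float64(lo)) / (Float64(hi) - Float64(lo))
    return rand() < p ? hi : lo           # round up with probability p
end

# Unbiasedness check: the mean of many stochastic roundings should approach x.
using Statistics
x = 1/3
mean(Float64(stochastic_round(x)) for _ in 1:10^6)   # ≈ 0.3333333333333333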

The Hidden Elephant In Hardware Floating Point

Rounding might not be your worst nightmare – it might be that floating point numbers are not equally spaced.
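
For example, the gap between adjacent Float64 values grows with the magnitude:

julia> eps(1.0), eps(1.0e10)       # spacing of adjacent Float64 values at 1 and at 1e10
(2.220446049250313e-16, 1.9073486328125e-6)

julia> 1.0e10 + 1.0e-7 == 1.0e10   # increments below the local spacing are absorbed entirely
true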

The Invisible Pink Unicorn In Numerical Accuracy and Precision

How does the computer calculate exp(), log(), sin(), cos(), atan(), Normal()? Often they use Chebyshev polynomials to approximate these functions. These approximations are often excellent, as Chebyshev polynomials are close to the best polynomial approximation and minimize Runge’s phenomenon. But … the approximation errors will oscillate around zero.
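
That sign oscillation is easy to observe by comparing against a higher-precision reference (a quick check using BigFloat as the reference):

for x in range(0.1, 1.5, length = 8)
    err = Float64(big(sin(x)) - sin(big(x)))   # (Float64 result) minus (high-precision reference)
    println(x, " -> ", err)                    # the errors come out with both signs
end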

Yet … Rounding Errors are Not Random

If we look at this page from Higham’s Accuracy and Stability of Numerical Algorithms (Second Edition), p. 25:

Higham’s book has additional references on the topic.

6 Likes

Thanks for the kind words, much appreciated! :blush:


I think the community consensus is pretty clear: if a function requires random number generation, then these random numbers should be generated using an RNG which is passed as an optional argument which defaults to Random.GLOBAL_RNG. That’s what @stevengj’s issue in IterativeSolvers.jl is aiming for, and it’s also the standard API in most programming languages (e.g. Julia, C++ and Numpy).
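
For reference, that API pattern looks roughly like this (the function name here is hypothetical, not taken from IterativeSolvers.jl):

using Random

# All randomness flows through the `rng` argument; callers who want
# reproducibility pass a seeded RNG explicitly.
my_init(rng::AbstractRNG, n::Integer) = randn(rng, n)
my_init(n::Integer) = my_init(Random.GLOBAL_RNG, n)

x1 = my_init(MersenneTwister(1234), 4)
x2 = my_init(MersenneTwister(1234), 4)
@assert x1 == x2    # same seed, same stream, same result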


True, but to use the same argument that you made on a different point: assuming floating-point arithmetic to be deterministic frees you from ever having to worry about this possibility at all, hence it frees up some valuable mental space.

Okay, let’s do the same argument but with x+x == 2*x. :wink:

I vaguely remember some friends of mine doing semester projects with the Swiss weather forecasting agency, and I think their task was to parallelise some given codes under the constraint that the results had to be exactly the same. This is hearsay, though, and I would be curious to hear from people with actual experience of this.

You don’t need randomness to achieve the benefits you mentioned; it’s enough to accept that many numerical codes are considered “correct” if their results are within some tolerance of the exact results. Whether or not these results are reproducible is orthogonal to this issue of accuracy.

1 Like

Oh well, I didn’t have the pleasure of being your friend, but I can fully sympathize with them because I had exactly the same experience while working at AVL in Graz. I was in the AST (Advanced Simulation Technologies) department, and my task was to parallelize their CFD (Computational Fluid Dynamics) solver with MPI, using an SPMD approach (domain decomposition). The residuals from the parallel version were invariably different from the sequential version, simply because the numbering of cells, the order of operations and the convergence history differed between the MPI version and the sequential one. It took me a couple of years (I kid you not) to convince my colleagues at AST that this was normal behavior and that I could not ensure the same residuals from the parallel version as they got from the sequential one. Still, each parallel run, provided it was performed with the same number of processors and the same decomposition, gave the same residuals every time.
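
Here is the effect in miniature (a plain Julia sketch, obviously not the AVL code): changing only the order of accumulation perturbs the result, yet each fixed order is perfectly reproducible.

using Random
Random.seed!(1)
a = randn(10^6)

@show sum(a) == sum(reverse(a))   # typically false: different order, different roundings
@show sum(a) == sum(a)            # always true: the same order gives a bit-identical result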

9 Likes

Hello, and thanks to everyone for this insightful discussion.

From the abrupt end of the thread, I infer that this behaviour is here to stay in Julia.

However, I want to make the argument that this random-looking rounding behaviour causes some very weird computations. Here is an example:

julia> for x in 1:10
       println(x, " -> ",x*0.1*10 - x) 
       end
1 -> 0.0
2 -> 0.0
3 -> 4.440892098500626e-16
4 -> 0.0
5 -> 0.0
6 -> 8.881784197001252e-16
7 -> 8.881784197001252e-16
8 -> 0.0
9 -> 0.0
10 -> 0.0

julia> 76*0.1 == 7.6
false

julia> 76/10 == 7.6
true

I am not an expert in numerical approximation, but I don’t understand how these results can make sense.

Can someone explain to me the deep meaning of it?
Thanks

2 Likes

I think it is important to realise that there is no “random” or incorrect rounding going on here. In each case, each intermediate result is correctly rounded to the nearest Float64. For example: 76/10 is correctly rounded to the Float64 number 7.5999999999999996447… . The next possible Float64 number would be 7.6000000000000005329…, which is further away from the true answer. (The dots do not represent infinite series of decimals here, just fairly long ones. The conversion of small integers, like 10 or 76 into Float64 is exact.)

The constant 7.6 becomes the exact same number, for the exact same reason. It is the closest Float64 that exists to 7.6.

In the case of 76*0.1 there are two events of rounding. First, 0.1 is rounded to the nearest Float64, and then that number is multiplied by 76 and again rounded to the nearest Float64. The rounding happens to go twice in the same direction, which is why this result is different from the above.
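
One way to see both roundings concretely in the REPL:

julia> big(0.1) > 1//10   # the Float64 written as 0.1 is slightly above one tenth (rounded up)
true

julia> big(7.6) < 76//10  # the Float64 written as 7.6 is slightly below 7.6 (rounded down)
true

julia> 76 * 0.1           # the product rounds up again, landing one ulp above Float64(7.6)
7.6000000000000005

julia> 76 / 10            # a single correct rounding of the exact quotient 7.6
7.6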

6 Likes

Are you referring to the fact that all deviations in the loop are positive and don’t seem to have a random sign? This is because 0.1 is not exactly representable as an IEEE 754 double; a slightly larger value is used instead. 10 is represented exactly. Therefore multiplying by 0.1*10 either increases x slightly or rounds back to the exact value, so the deviations never become negative.

With a random n like

for n in rand(Int, 5)
    for x in 1:10
        println(n, ", ", x, " -> ", x*(1/n)*n - x)
    end
end

deviations with both negative and positive signs are observed.

4 Likes

Yes, better (in some cases), though maybe not compared to using Float64 or Float32(?).

No hardware support in ASICs, you mean; but it has been done on FPGAs, which people might be able to use:

exploiting the noise-tolerance of deep neural networks […] For low-precision fixed-point computations, where conventional rounding schemes fail, adopting stochastic rounding during deep neural network training delivers results nearly identical as 32-bit floating-point computations.

I found this very interesting; I’m not sure whether it is related to “stochastic gradient descent”:

  1. Hardware Prototyping
    The execution time of the mini-batch stochastic gradient descent algorithm is dominated by a series of GEMM operations in the feed-forward

At least if you are using stochastic [gradient descent] methods and do not expect deterministic answers anyway, then stochastic rounding on top might not be too bad.

My question: even if you would get worse (and random) results in some other cases, could having this stochastic rounding option help with debugging? Would you expect more spread in your results if your algorithm is bad, e.g. suffers from catastrophic cancellation? It might be worth reimplementing floating point in software just to add this option (posits and decimal floats are already available in software and might do, instead of binary floats).

related packages:


I believe so, yes, which is why we developed the Verrou tool to help analyze (compiled) programs using Stochastic Arithmetic. Here are some references of what stochastic arithmetic can help you with, when analyzing large code bases for FP-related issues:

(I don’t want to imply that Verrou is the only initiative in this spirit; I’m just being lazy here. The papers listed there link to a lot of other references on the same topic)

2 Likes