Why assuming floating-point rounding errors are random is bad

This topic continues a discussion started in another topic here on discourse.

The goal here is to assemble a list of potential issues which may arise if you assume floating-point rounding errors are random rather than deterministic. Additions and corrections are welcome!

  • It may make you overlook bugs in your code, e.g. race conditions or uninitialised memory which may have a similar effect as rounding errors.
  • It may be confusing when you are investigating rounding errors in a given formula. For example, x-x will always evaluate to exactly 0, and this may be surprising if you assume floating-point arithmetic = exact arithmetic + some random noise.
  • It makes it impossible to do a certain style of incremental development. It is sometimes possible to rearrange some computations such that they produce exactly the same answer but could potentially run faster. In such cases, it is convenient to 1) make the changes, 2) check that you still get exactly the same answer to rule out any typos, and 3) benchmark (see the sketch after this list). You deprive yourself of this development style if you assume that rounding errors are random.
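A minimal sketch of the last two bullets (the values are mine, not from the original post): rounding is deterministic, so re-evaluating an expression reproduces the same bits, and provably exact rearrangements can be verified bit-for-bit.

```julia
# Rounding is deterministic: the same expression always produces the same bits.
x = 0.1 + 0.2            # rounded to 0.30000000000000004, but always the same value
@assert x == 0.1 + 0.2   # re-evaluating reproduces the result exactly
@assert x - x == 0.0     # x - x is exactly zero, not "roughly zero"

# Some rearrangements are provably exact, so they can be checked bit-for-bit:
@assert 2.0 * x == x + x # multiplying by a power of two never rounds (absent overflow)
```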
1 Like

If you are trying to do low-level stuff, knowing exactly when operations are exact can be really powerful. In general, for things like trig or other transcendental functions, assuming random error is pretty accurate since you are truncating a transcendental number. However, for arithmetic (+, -, *, fma) the result has an exact value with a finite number of bits, and there are specific cases where you can know that the answer will be exact (or exact with twice as many bits, which is useful for double-float implementations).
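A hedged sketch of the exactness being described (the helper name is mine): the classic TwoSum error-free transformation recovers the exact rounding error of an addition, and fma recovers the exact error of a product; these are the building blocks of double-float arithmetic.

```julia
# TwoSum (Knuth/Møller): s + e == a + b exactly, where s = fl(a + b) and e is
# the rounding error, itself exactly representable as a Float64.
function two_sum(a::Float64, b::Float64)
    s = a + b
    v = s - a
    e = (a - (s - v)) + (b - v)
    return s, e
end

s, e = two_sum(1.0, 1e-20)
@assert s == 1.0        # the rounded sum
@assert e == 1e-20      # the rounding error, recovered exactly

# fma recovers the exact rounding error of a product:
p   = 0.1 * 0.3
err = fma(0.1, 0.3, -p)  # the true product of the two floats equals p + err exactly
```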

3 Likes

Thanks for opening this topic, it was a good idea. The issues you raise are very important and go beyond the Julia language. Since we are gathered here as Julia programmers, you might add a bullet or two on how these considerations apply to Julia packages, at least those in the General registry. Could the community reach an agreement on whether packages in the General registry should be deterministic? And if not, or if no consensus can be reached, should package developers at least state which category their package falls into - deterministic or heuristic? As it stands, each Julia developer has to find out on their own whether each of the packages they are using is deterministic, which is rather inefficient.

1 Like

The problem is when doing high-level stuff, and specifically for applications such as accounting, or finance in general, where the norm is to work with rounded numbers. Improvements to fixed-point numbers would serve these tasks, and fit the style of Julia programming, vs a lower-level language like Rust or C++.
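A small illustration (mine, not from the post) of why binary floats are awkward for money, and the usual fixed-point workaround of carrying amounts as integer minor units:

```julia
# Decimal fractions are generally not exact in binary floating point:
0.10 + 0.20 == 0.30                       # false

# Fixed-point style: keep amounts as integer cents, so arithmetic stays exact.
price_cents = 1999                        # 19.99
vat_cents   = div(price_cents * 20, 100)  # 20% VAT, truncated to whole cents
total_cents = price_cents + vat_cents     # 2398, i.e. 23.98 -- exact
```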

1 Like

Thanks for doing this thread!

That’s a good point, but it is still pretty unlikely (although certainly possible) that a race or invalid memory access will cause an error on the order of eps().

The important point is that errors are relative. Even if they are random, x-x is still going to be zero.

Isn’t this style of development super brittle? The class of valid optimizations you can do includes things that give exactly the same answer, as well as things that change the answer slightly. Having to differentiate between the two is hard, hardware- and parallelization-dependent, and takes valuable mental space that is better spent actually thinking about the algorithm. Better to assume every change to the code changes the answer slightly.

Thinking of errors as random has a few advantages. It forces you to actually think about error propagation when designing tests (and not run the code once and hardcode in the test that the result has to be exact, which is going to break in two weeks when you add parallelization, make an optimization or run the test on another CI machine). It lessens your expectations of guarantees libraries should provide (which allows them to do dynamic multithreading behind your back if they want to). It helps explain (via a random walk model) why error usually doesn’t propagate as fast as it could. Of course it’s wrong, but it’s wrong in the right way that serves as a useful abstraction for what’s actually going on and that doesn’t mislead you too much. It’s useful as a first order model for teaching (the most important message to convey to students first learning about it is that errors are relative, without necessarily having to enter into the details of IEEE guarantees and non associativity).

I do agree it’s not a substitute for real understanding, but I don’t think it’s such a bad model.
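A rough sketch of the random-walk intuition mentioned above (the numbers are illustrative, not from the post): summing n terms left to right typically accumulates a relative error much closer to sqrt(n)·eps than to the worst-case n·eps.

```julia
n     = 10^6
xs    = rand(n)
ref   = sum(big.(xs))        # high-precision reference sum
naive = foldl(+, xs)         # plain left-to-right Float64 summation

println("relative error: ", Float64(abs(naive - ref) / abs(ref)))
println("n * eps:        ", n * eps())
println("sqrt(n) * eps:  ", sqrt(n) * eps())
```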

1 Like

Said differently: would life be very different if, instead of rounding to nearest, arithmetic rounded randomly up or down?

Machine Hardware (CPU and GPU)

The IEEE-754 standard [this is the 2008 version; see changes for the 2019 version; also a short comment here] defines five rounding rules: two are “round to nearest” rules, and three are “directed rounding” rules:

Roundings to Nearest

  1. Round to nearest, ties to even – rounds to the nearest value; if the number falls midway, it is rounded to the nearest value with an even least significant digit; this is the default for binary floating point and the recommended default for decimal.

  2. Round to nearest, ties away from zero – rounds to the nearest value; if the number falls midway, it is rounded to the nearest value above (for positive numbers) or below (for negative numbers); this is intended as an option for decimal floating point.

Directed Roundings

  1. Round toward 0 (zero) – directed rounding towards zero (also known as truncation).

  2. Round toward +∞ – directed rounding towards positive infinity (also known as rounding up or ceiling).

  3. Round toward −∞ – directed rounding towards negative infinity (also known as rounding down or floor).

Number 1 (round to nearest, ties to even) is the default rounding mode, and has the advantage that it is without positive/negative bias and without bias toward/away from zero.

  • However, it will distort a distribution of values by increasing the probability of even values relative to odd values.
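A quick way to see ties-to-even in action (my own examples): halfway cases round to the neighbour whose last significand bit is zero, both in binary arithmetic and in Julia’s default round.

```julia
# Halfway between 1.0 and nextfloat(1.0): the tie goes to 1.0 (even last bit).
@assert 1.0 + eps()/2 == 1.0

# Halfway between 1 + eps() and 1 + 2*eps(): the tie goes to the even neighbour.
@assert 1.0 + 3*eps()/2 == 1.0 + 2*eps()

# The same tie-breaking is used by round() on decimal halfway cases:
round(2.5), round(3.5)   # (2.0, 4.0)
```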

Others

There are some obscure variations of #1 (“alternate tie-breaking” – take turns rounding ties to even/odd, and “random tie-breaking” – randomly round ties to even/odd).

A method with desirable statistical properties is stochastic rounding (also see here), which is unbiased: the expected rounding error is zero. There is no hardware support (yet).
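For concreteness, here is a rough software sketch of stochastic rounding from Float64 to Float32 (the function name and the edge-case handling are mine; real implementations are more careful about overflow, NaN and the source of random bits):

```julia
using Random

function stochastic_round(x::Float64; rng::AbstractRNG = Random.default_rng())
    f = Float32(x)                             # round-to-nearest as a starting point
    Float64(f) == x && return f                # exactly representable: nothing to do
    down = Float64(f) < x ? f : prevfloat(f)   # Float32 neighbour just below x
    up   = nextfloat(down)                     # Float32 neighbour just above x
    p_up = (x - Float64(down)) / (Float64(up) - Float64(down))
    return rand(rng) < p_up ? up : down        # round up with probability p_up
end

# Unbiased in expectation: averaging many rounded copies recovers x.
x   = 1/3
avg = sum(Float64(stochastic_round(x)) for _ in 1:10^6) / 10^6
```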

The Hidden Elephant In Hardware Floating Point

Rounding might not be your worst nightmare – it might be that floating point numbers are not equally spaced.
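For example, the gap between adjacent Float64 values grows with the magnitude of the number:

```julia
eps(1.0)     # 2.220446049250313e-16
eps(1.0e6)   # 1.1641532182693481e-10
eps(1.0e15)  # 0.125
eps(1.0e16)  # 2.0 -- above 2^53 ≈ 9.0e15, not every integer is representable
```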

The Invisible Pink Unicorn In Numerical Accuracy and Precision

How does the computer calculate exp(), log(), sin(), cos(), atan(), Normal()? Often it uses Chebyshev polynomials to approximate these functions. These approximations are often excellent, as Chebyshev polynomials are close to the best polynomial approximation and minimize Runge’s phenomenon. But … the approximation errors will oscillate around zero.
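A rough way to see this (my own sketch, using BigFloat as the reference): the signed error of exp, measured in units of the local spacing, wanders around zero rather than drifting in one direction.

```julia
setprecision(BigFloat, 256)

for x in range(0.0, 1.0; length = 11)
    approx = exp(x)                                     # Float64 implementation
    exact  = exp(big(x))                                # high-precision reference
    ulps   = Float64((big(approx) - exact) / eps(approx))
    println(rpad(x, 4), "  error ≈ ", round(ulps; digits = 3), " ulp")
end
```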

Yet … Rounding Errors are Not Random

Have a look at this page from Higham’s Accuracy and Stability of Numerical Algorithms (Second Edition), p. 25.

Higham’s book has additional references on the topic.

4 Likes

Thanks for the kind words, much appreciated! :blush:


I think the community consensus is pretty clear: if a function requires random number generation, then these random numbers should be generated using an RNG which is passed as an optional argument defaulting to Random.GLOBAL_RNG. That’s what @stevengj’s issue in IterativeSolvers.jl is aiming for, and it’s also the standard API in most programming languages (e.g. Julia, C++ and NumPy).
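A minimal sketch of that convention (the function name is hypothetical): the RNG is an optional keyword argument, so callers who need reproducibility can pass their own seeded generator.

```julia
using Random

# Hypothetical library function following the convention described above.
function randomized_initial_guess(n::Integer; rng::AbstractRNG = Random.GLOBAL_RNG)
    return rand(rng, n)
end

randomized_initial_guess(4)                              # uses the global RNG
randomized_initial_guess(4; rng = MersenneTwister(1234)) # fully reproducible
```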


True, but to use the same argument that you made on a different point: assuming floating-point arithmetic to be deterministic frees you from ever having to worry about this possibility at all, hence it frees up some valuable mental space.

Okay, let’s do the same argument but with x+x == 2*x. :wink:

I vaguely remember some friends of mine doing semester projects with the Swiss weather forecasting agency, and I think their task was to parallelise some given codes under the constraint that the results had to be exactly the same. This is hearsay, though, and I would be curious to hear from people with some actual experience of this.

You don’t need randomness to achieve the benefits you mentioned; it’s enough to accept that many numerical codes are considered “correct” if their results are within some tolerance of the exact results. Whether or not these results are reproducible is orthogonal to this issue of accuracy.
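In testing terms (a sketch of my own, not from the post): compare against a reference with a tolerance instead of demanding bitwise equality, so the test survives reorderings and parallel reductions.

```julia
using Test

xs  = rand(1_000)
ref = Float64(sum(big.(xs)))      # high-precision reference, rounded once

@test sum(xs) ≈ ref rtol = 1e-12  # accuracy check that tolerates reordering
# @test sum(xs) == ref            # brittle: fails if the summation order changes
```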

1 Like

Oh well, I didn’t have the pleasure of being your friend, but I can fully sympathize with them because I had exactly the same experience while working at AVL in Graz. I was in the AST (Advanced Simulation Technologies) department, and my task was to parallelize their CFD (Computational Fluid Dynamics) solver with MPI, using an SPMD approach (domain decomposition). The residuals from the parallel version were invariably different from the sequential version, simply because the numbering of cells, the order of operations and the convergence history were different in the MPI version. It took me a couple of years (I kid you not) to convince my colleagues at AST that this was normal behaviour, and that I could not ensure the same residuals from the parallel version as they got from the sequential one. Still, each parallel run, provided it was performed with the same number of processors and the same decomposition, gave the same residuals every time.

7 Likes