Bicgstabl from IterativeSolvers.jl gives different result every time

Hi all,

I am using Julia as a framework for development and testing of discretization methods in computational fluid dynamics and as a part of it, have to solve linear system of equations for velocities and pressure pretty often. Due to the fact I am eventually aiming at dense grids giving large linear systems, iterative solvers from the package IterativeSolvers.jl seemed like the optimum choice for me.

However, at one point I noticed that the program I developed gives different results every time. No matter if I run it from terminal or from REPL. The differences are small, but still big enough to put that splinter in your mind that you might have failed to initialized all the variables in your code. After a thorough analysis, I realized that the source of randomness in my code is the bicgstabl solver from IterativeSolvers.jl

Please have a look at an MWE:

using IterativeSolvers, SparseArrays, LinearAlgebra

n = 32

A = spdiagm(-1 => fill(-1.0, n - 1),
             0 => fill( 3.0, n),
             1 => fill(-2.0, n - 1))

Id = sparse(1.0I, n, n)
A = kron(A, Id) + kron(Id, A)
A = kron(A, Id) + kron(Id, A)
b = A * ones(n^3)

x = bicgstabl(A, b, 2, reltol=1.0e-5)

println("relative error = ", norm(b - A * x) / norm(b))

The output I get reads:

relative error = 9.393585200552226e-6
relative error = 2.364031042840345e-6
relative error = 9.663172610869453e-6
relative error = 3.716206357164069e-6

The relative error I get is invariantly smaller than the tolerance I set - which is good, but it is also different on every run - which is quite bad because it casts a shadow of a doubt on the code you developed around the solver.

A way forward in my case would be to change bicgstabl from IterativeSolvers.jl with the default \ operator from SparseArrays.jl, which redirects to umfpack solver and gives the same residual on every run. Another path would be to squeeze the bicgstabl tolerance down until is below the number of significant digits in residuals I print, and hope the errors will not accumulate more than that during unsteady simulations of coupled non-linear equations I am solving.

But, really, is this expected behavior of an iterative solver? Living with an approximation - yes, but living with a random outcome on each run - no! It is certainly not something I have seen ever before in a linear solver and I would assume it is an indication that some arrays are not initialized properly in the package.

Any thoughts on that?

Cheers

I’m sure domain experts will weigh in, but from what I can remember about numerical algebra from graduate school twenty years ago - yes, it could certainly make sense to me that an iterative solver would converge to slightly different numbers each run, within tolerance.

Two possible ideas why:

The algorithm might be initializing internal matrixes randomly, and thus you would not expect bit perfect or even machine precision for each run.

Also, discrete linear algebra does not commute,and the LLVM compiler might be reordering instructions differently each run, especially if multiple cores and parallel BLAS is used.

All your tolerances are ~1e-6, so my intuition is that the iterative method is working as intended.

Look for rand here : IterativeSolvers.jl/bicgstabl.jl at 02b577bb049751d3ddbb69fd1ac7626b5d75b696 · JuliaLinearAlgebra/IterativeSolvers.jl · GitHub

You can fix that by calling Random.seed! before bicgstab. However nondeterministic variability of results is pretty much a fact of nature in complicated scientific code because of the non exact arithmetic, and you’re probably better off making peace with it. You just have to check any noise is below your tolerances.

2 Likes
julia> using Random

julia> function run(seed=1234)
           Random.seed!(seed)
           n = 32
           A = spdiagm(-1 => fill(-1.0, n - 1),
                       0 => fill( 3.0, n),
                       1 => fill(-2.0, n - 1))
           Id = sparse(1.0I, n, n)
           A = kron(A, Id) + kron(Id, A)
           A = kron(A, Id) + kron(Id, A)
           b = A * ones(n^3)
           x = bicgstabl(A, b, 2, reltol=1.0e-5)
           println("relative error = ", norm(b - A * x) / norm(b))
       end
run (generic function with 2 methods)

julia> run()
relative error = 2.95148504377994e-6

julia> run()
relative error = 2.95148504377994e-6

julia> run()
relative error = 2.95148504377994e-6

julia> run()
relative error = 2.95148504377994e-6

1 Like

Wow man - thanks - this is an awesome fix!

However, I still believe that we should avoid “nondeterministic variability” at every possible step in our simulation programs and function which returns different results every time is sloppy programming. Yes - I deal with simulations of turbulence for three decades now and - yes - I understand what does chaotic means. But I still don’t fancy randomness during development and unit testing.

Imposing deterministic results in parallel programs is very hard. It is not sloppy to be aware of the error model of your computer and programming to it. Of course it’s nice to avoid variability when you can but it’s a losing battle so you might as well just give up and embrace it. The way I think about it is that every floating point operation on a computer produces a random relative error of size eps(). Similarly I think of solvers as giving a random result whose residual is smaller than the tolerance I give it. It’s not completely accurate, but it’s a good mental model.

1 Like

Parallel programs are a different issue. But picture my situation. I developed a new simulation program in Julia to check if the concept of a new discretization scheme I devised works. It does. I am happy. Next thing, I am profiling and optimizing Julia to turn the concept into productive simulation tool, thus using one of its main strengths - avoidance of the two language approach. As I fine-tune the code, I change it. Each change has a potential to introduce a bug. In order to make sure I do not introduce a bug, I run small tests and unit tests. Some bugs fail miserably and are either noticed by the compiler or one of the assertions I use. But some bugs are subtle, introduce very small differences whose impact you could see only after long simulations. But, if one component (linear solver in this case) always gives me slightly different results, I can not go forward at all as I am completely oblivious if it was me who introduced the small difference, or is it the solver. Hence, on its own, without your fix, bicgstabl from IterativeSolvers.jl is a rather bad choice for tuning a code for performance and I hope that my question, and your answer in particular, will help someone not to fall in the same trap I did. Thanks again and cheers.

one possibility in the case of testing is using a deterministic linear solver. You can perhaps decrease the mesh size to allow the use of \ and then pass the linear solver to your program as an argument: linsolve = (o,A,x) -> o = A \ x (or better with ldiv!)

1 Like

You can absolutely check the correctness of numerical schemes without byte-by-byte reproducibility. Any small modification of code (refactoring, optimizations, etc) can introduce slight differences in behavior, just because it causes the compiler to reorder operations or some such. You just have to check that this does not impact the final results (eg by having very tight tolerances for solvers, and checking the end results up to a precision much larger than the tolerances in the solvers). I do agree that it’s nicer to make these changes if you do have complete reproducibility, but it’s not necessary, and insisting on reproducibility limits the range of things you can do.

1 Like

@stevengj might disagree with this, see https://math.mit.edu/~stevenj/18.335/fp-myths.pdf :wink:

Using randomness in an iterative linear solver does seem a little shady to me, not least because generating randomness is quite expensive. For example:

julia> A = Tridiagonal(rand(n-1),rand(n),rand(n-1))
       b = rand(n)
       @btime $A*$b;
  1.837 μs (1 allocation: 7.94 KiB)

julia> @btime rand(n);
  1.600 μs (1 allocation: 7.94 KiB)
1 Like

This statement confuses me. Most parallel scientific codes are deterministic in my experience (excluding ones that use rand with a variable seed, of course)—you get exactly the same results if you fix the number of processors and re-run the same software on the same inputs. Usually non-deterministic behavior — race conditions — indicates a bug. It’s possible to write correct parallel programs that have intentional race conditions and non-deterministic behavior, but this is often tricky and not that common in scientific computing as far as I know.

(One example that would be non-deterministic is code using a parallel work queue and summing the results in the order that they are completed. But typical MPI-style programs using a static data distribution don’t usually follow this pattern.)

If you change the number of processors, the ordering of your operations may change and you may get different results (e.g. due to lack of associativity in floating-point arithmetic); similarly if you change compilers or other software versions. But I wouldn’t call this “non-deterministic” behavior.

(FFTW is an exception here, because if you run it in MEASURE mode it uses actual timing measurements to choose its algorithm, in which case the choice of algorithm, and hence roundoff errors, can vary from run to run. But you can easily eliminate this variability by using a saved plan or ESTIMATE mode.)

2 Likes

@ettersi of course it’s wrong, but it’s a useful simple mental model to have when looking at results, no?

Regarding the random numbers in bicgstab, I’m not sure where it comes from either. It’s not for the initial guess, since that is zero by default (as it should be). It’s only generated once though so performance impact is not that bad.

@stevengj hm that’s an interesting point. Doesn’t a reduction with a dynamic scheduler give non deterministic results?

I guess I should make the caveat that I’m thinking of MPI-style distributed-memory parallelism. The MPI standard strongly urges implementations to produce deterministic results from reduction operations (sums etcetera), even though it is not strictly required.

Yes, OpenMP multithreaded parallelism makes no guarantees about the operation order in parallel reductions, so it can yield non-deterministic outputs, though there have been some efforts to provide a deterministic option.

I’m not sure about Julia’s multithreaded reductions?

Yes, I was thinking of openmp and also of ReproBLAS - Reproducible Basic Linear Algebra Sub-programs, which seems to suggest standard blas is not reproducible.

It sounds like they are targeting a much stronger sense of reproducibility — they want reproducible results across different hardware and “regardless of the number of processors”. This is much harder than deterministic results on the same hardware with the same number of threads.

For example, my impression is that most BLAS implementations, e.g. OpenBLAS and MKL, use a static schedule for multi-threaded code, so they give the same results on the same hardware with the same number of threads, though they may give different results when you switch hardware (due to different SIMD instructions), change the number of threads, or maybe if you change your array alignment.

(e.g. I’m trying 5000x5000 matrix multiplications now on my machine with 4 threads and OpenBLAS, and it is always producing exactly the same results.)

PS. Apparently there are still some corner cases where OpenBLAS is non-deterministic, but they are viewed by the authors as bugs: Non-deterministic output with multiple OpenMP runtimes in the same process · Issue #2146 · xianyi/OpenBLAS · GitHub

Thinking of floating-point arithmetic as non-deterministic seems sufficiently wrong that it leads people to make weird programming choices.

1 Like
2 Likes

This seems like an interesting topic to delve in deeper! I opened a new topic on this here.