Operation yields different results on another machine

We’re running the same code on machine A and machine B (same Manifest file), and the results are not the same. We’re doing fairly straightforward linear algebra on StaticArrays. Is that a bug, or is it generally accepted that the implementation of these algorithms (LAPACK etc.) can vary from one machine to the next?

What kind of machines are they?

  • OS
  • CPU architecture
  • Julia version

I’m not sure; I’m asking more as a matter of principle, to figure out whether it’s worth spending time looking for the difference. Does Julia generally guarantee that f(x) will return exactly the same value across architectures and operating systems for its floating point operations? Is there any function in Base where that’s not true? There is the Int = Int32/Int64 split, which can cause a difference, and obviously multithreading can vary if we didn’t write our code right, but for single-threaded code is there anything else?

EDIT: to be clear, I’m talking about a 1e-8 difference in the final result of a calculation. It is probably a very small floating point difference that was compounded over the large number of operations we perform.

It won’t generally be true if you use @simd or @fastmath.
Optimal performance for some operations (e.g. reductions) requires different orderings on different machines, and this can change rounding.
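
As a self-contained illustration (the numbers below are chosen purely for this example, not taken from the machines in question): floating-point addition is not associative, so any optimization that is allowed to regroup a reduction, which is exactly what @simd and @fastmath permit, can change the rounded result.

julia> (1.0 + 1e16) - 1e16   # strict left-to-right grouping: the 1.0 is absorbed by 1e16
0.0

julia> 1.0 + (1e16 - 1e16)   # a regrouped evaluation keeps it
1.0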

For example, on a machine with AVX512, starting Julia normally (this will use 256-bit vectors, as preferring them is the default):

julia> using Random; Random.seed!(2);

julia> sum(rand(128))
59.893130742026266

Starting Julia with -C"native,-prefer-256-bit" (this means “do not prefer 256-bit vectors”, which will cause it to use 512-bit vectors):

julia> using Random; Random.seed!(2);

julia> sum(rand(128))
59.89313074202627

Starting with -C"native,prefer-128-bit" (do prefer 128-bit vectors):

julia> using Random; Random.seed!(2);

julia> sum(rand(128))
59.89313074202625

Aside from vector width, the presence or absence of fused multiply-add (FMA) instructions will also make a difference. These instructions compute a*b + c with a single rounding at the end, instead of rounding the intermediate product and then rounding again after adding.
muladd(a,b,c) will normally (but not always) use FMA instructions when they are available, and when they are unavailable it will instead compute the less accurate a * b + c.
Base.exp is much less accurate on systems without FMA instructions. As an aside, VectorizationBase/LoopVectorization’s exp should be similarly accurate whether or not FMA is available, as it switches to double-double arithmetic in that case to compensate; this means that in practice FMA instructions can sometimes provide a far larger performance advantage than the 2x implied by merely combining a multiplication and an addition into one instruction. It would be even more extreme if Base.fma were used instead of double-double arithmetic, as Base.fma imposes an even larger performance penalty on hardware without FMA.
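
A small sketch of what the single rounding means in practice (values picked purely for illustration): with a = 1 + 2^-27 and b = 1 - 2^-27, the exact product is 1 - 2^-54, which rounds to exactly 1.0 in Float64, so rounding the product before adding loses the tiny term that fma keeps.

julia> a = 1.0 + 2.0^-27; b = 1.0 - 2.0^-27;

julia> a * b - 1.0           # the product rounds to 1.0 first, so the -2^-54 term is lost
0.0

julia> fma(a, b, -1.0)       # a single rounding at the end keeps it
-5.551115123125783e-17

julia> muladd(a, b, -1.0)    # result shown for a CPU with FMA; without FMA this gives 0.0
-5.551115123125783e-17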
