# Operation yields different results on other machine

We’re running the same code on machine A and machine B (same Manifest file), and the results are not the same. We’re doing fairly straightforward linear algebra on StaticArrays. Is that a bug, or is it generally accepted that the implementation of these algorithms (LAPACK etc.) can vary from one machine to the next?


What kind of machines are they?

• OS
• CPU architecture
• Julia version

I’m not sure; I’m asking more as a matter of principle, to figure out whether I want to spend time looking for the difference. Does Julia generally guarantee that `f(x)` will return exactly the same value across architectures/OSes for its floating-point operations? Is there any function in Base where that’s not true? There is the `Int = Int32/Int64` split which can cause a difference, and obviously multithreading can vary if we didn’t write our code right, but for single-threaded code is there anything else?

EDIT: to be clear, I’m talking about a 1e-8 difference in the final result of a calculation. It is probably a very small floating-point difference, compounded over the large number of operations we perform.

It won’t generally be true if you have `@simd` or `@fastmath`.
Optimal performance for some operations (e.g. reductions) will require different orderings on different machines, and this can change rounding.
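The root cause is that floating-point addition is not associative, so any transformation that regroups a sum can change the last few bits. A minimal, machine-independent illustration (not from the original thread):

```julia
# Floating-point addition is not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

(a + b) + c  # 0.6000000000000001
a + (b + c)  # 0.6
```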

For example, on a machine with AVX512, starting Julia normally (this will use 256-bit vectors, since preferring them is the default):

```julia
julia> using Random; Random.seed!(2);

julia> sum(rand(128))
59.893130742026266
```

Starting Julia with `-C"native,-prefer-256-bit"` (this means “do not prefer 256-bit vectors”, which will cause it to use 512-bit vectors):

```julia
julia> using Random; Random.seed!(2);

julia> sum(rand(128))
59.89313074202627
```

Starting with `-C"native,prefer-128-bit"` (do prefer 128-bit vectors):

```julia
julia> using Random; Random.seed!(2);

julia> sum(rand(128))
59.89313074202625
```
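You can reproduce the effect by hand, without any SIMD at all, just by summing the same values in a different order (a small constructed example, not taken from the runs above):

```julia
# Terms of ~1e-16 are below half an ulp of 1.0, so they vanish one by one
# when added after the 1.0, but survive when accumulated first.
v = [1.0, 1e-16, 1e-16, 1e-16, 1e-16]

foldl(+, v)           # 1.0                (each small term is absorbed)
foldl(+, reverse(v))  # 1.0000000000000004 (small terms accumulate first)
```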

Aside from vector width, the presence or absence of fused multiply-add (FMA) instructions will also make a difference. These instructions compute `a*b + c` with a single rounding at the end, rather than rounding the intermediate product and then rounding again after the addition.
`muladd(a,b,c)` will normally (but not always) use FMA instructions when available, and when unavailable will instead compute the less accurate `a * b + c`.
`Base.exp` is much less accurate on systems without FMA instructions. As an aside, VectorizationBase/LoopVectorization’s `exp` should be similarly accurate whether or not FMA is available, because it switches to double-double arithmetic in that case to compensate; this means that in practice FMA instructions can sometimes provide a far larger performance benefit than the 2x implied by merely combining a multiplication and an addition. The gap would be even more extreme if `Base.fma` were used instead of double-double arithmetic, as `Base.fma` imposes an even larger performance penalty without hardware support.
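A quick way to see the single rounding (again a small constructed example): the Float64 closest to `0.1` is slightly above one tenth, so the exact product `0.1 * 10.0` is `1 + 2^-54`, which rounds to exactly `1.0`. The naive expression therefore loses the residual, while `fma` keeps the product exact internally:

```julia
# fma rounds only once, after the addition, so the tiny residual survives:
fma(0.1, 10.0, -1.0)  # 5.551115123125783e-17 (== 2.0^-54)

# The naive version rounds the product to 1.0 first, losing the residual:
0.1 * 10.0 - 1.0      # 0.0
```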
