Numerical differences depending on loading order

We are doing large-scale numerical simulations and discovered a reproducibility issue: the output of our code differed depending on whether we were producing plots on the fly or not. After an “exciting” afternoon of tracking down the problem, we boiled it down to the following minimal example.

(run in an environment where only the Colors package v0.12.10 is installed)

> julia --optimize=1 --project=. -e 'c = 0.46309504630950465; display(cos(c)); display(sincos(c)[2])'
0.8946741679895334
0.8946741679895334

So far, so good.
But if we load the Colors package:

> julia --optimize=1 --project=. -e 'c = 0.46309504630950465; using Colors; display(cos(c)); display(sincos(c)[2])'
0.8946741679895334
0.8946741679895333

This doesn’t happen if the sincos function is used before loading Colors:

> julia --optimize=1 --project=. -e 'c = 0.46309504630950465; sincos(0.1); using Colors; display(cos(c)); display(sincos(c)[2])'
0.8946741679895334
0.8946741679895334

With default optimization, we always get

> julia --project=. -e 'c = 0.46309504630950465; display(cos(c)); display(sincos(c)[2])'
0.8946741679895333
0.8946741679895333

which is different from --optimize=1 but at least consistent across the two functions, and does not depend on whether Colors is loaded or on whether sincos is called before using Colors.

I couldn’t find anything obvious in the Colors package that overwrites methods of cos or sincos; it just uses them.

A number of questions arise:

  1. First, the most critical question: Why does the sincos computation depend on whether a package was loaded before its first execution? How can we avoid this for other, potentially affected functions? This is a huge reproducibility issue for us if we cannot guarantee that the same basic math method, called in the same environment, always returns the same values (assuming, of course, there is no type piracy and no methods overwritten by loaded packages).
  2. Is it expected for this code to return different numerical values depending on optimization level? For very aggressive optimization, I could imagine the compiler making wrong assumptions about side effects and thereby changing results, but here, just calling two very basic math functions with standard (or lower) optimization, I would have expected no differences.
  3. Whatever happens in Colors.jl, isn’t it inconsistent that the effect can be avoided by calling the function before using Colors? This makes it seem like quite a low-level issue (compiler, optimization, etc.).
  4. Is it expected for cos and sincos…[2] to return different values at all?

Colors.jl uses some @fastmath, etc., but merely loading the package shouldn’t change the results we get from the standard functions without @fastmath, should it?
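
As a side note on @fastmath semantics: the macro rewrites calls lexically inside the annotated block (e.g. cos becomes Base.FastMath.cos_fast), so it cannot change Base.cos for other callers. A minimal check with our constant from above; whether the last comparison holds may differ by machine:

julia> f(x) = @fastmath cos(x)   # this cos is rewritten to Base.FastMath.cos_fast
f (generic function with 1 method)

julia> c = 0.46309504630950465;

julia> f(c) == cos(c)   # the fast variant may or may not round differently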

In any case, after tracking it down to this basic level, we are a bit out of our depth on how to continue, so we would appreciate any input from the community. If any additional information is needed, please let us know.

Julia Version Info
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 12 × Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 12 virtual cores
9 Likes

It’s hard, I suppose, to make a complex process such as compilation deterministic in a modern setup. Combine this with the knife-edge sensitivity of floating point, and it becomes Herculean to make a decently large program deterministic. Sustainable solutions to this problem can be:

  1. Round any floating-point output to the desired accuracy and make sure the algorithm converges beyond this accuracy despite numerical errors (a minimal sketch follows this list).
  2. Make sure no compilation is done during program execution, i.e. precompile every method signature beforehand in a hopefully more deterministic environment. If a system image can be made, it will really boost reproducibility.
  3. Avoid floating point and accept that all that sweet CPU support will go to waste.
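
For point 1, a minimal sketch of what I mean by rounding to the desired accuracy (canonicalize here is a made-up helper and 12 significant digits is an arbitrary choice, not a recommendation):

julia> canonicalize(x) = round(x; sigdigits=12)   # drop the unstable trailing bits
canonicalize (generic function with 1 method)

julia> canonicalize(0.8946741679895334) == canonicalize(0.8946741679895333)
true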

This comment is not very technical, but my 2¢.

ADDED: reproduced on Julia Version 1.9.4

Just a surface-level check. Looking at trig.jl in base Julia, cos depends on sin_kernel and cos_kernel, while sincos depends on sincos_kernel, which itself just does sin_kernel and cos_kernel. Colors.jl defines its own sin_kernel and cos_kernel in utilities.jl, but those seem isolated to Colors and Colors.sincos360 and shouldn’t affect Base.
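
A quick sanity check that these are separate functions rather than methods pirated onto Base.Math’s kernels (Colors.cos_kernel is internal and its name is assumed from utilities.jl, so this is just illustrative):

julia> using Colors

julia> Colors.cos_kernel === Base.Math.cos_kernel   # distinct functions, no piracy
false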

Exactly, it doesn’t seem like it should affect anything. And even if it did in some hidden way, via method invalidation it should always change the behavior, no matter whether I have called sincos before or not. We are at the top level in a script here, so world-age problems don’t apply (and I have checked that the behavior persists even with invokelatest).

Colors.jl has a precompile file, and the convert part uses sincos (cos is used in the same file, conversions.jl, for one method, but it didn’t seem like the types were involved in the precompile file). Is it possible that using Colors is somehow loading a cached --optimize=2 version of sincos, and that this is only prevented if sincos was already compiled? Seems odd because I would assume that precompile caches wouldn’t mix up optimization levels if they were allowed to differ; are they even allowed to differ, and should this prevfloat difference be considered a bug? The latest release of Colors.jl predates v1.9 and its big changes to precompilation, but I’m not sure whether that matters.

2 Likes

FYI: I can’t reproduce (on Linux):

julia +1.9.4 --optimize=1 --project=. -e 'c = 0.46309504630950465; using Colors; display(cos(c)); display(sincos(c)[2])'
0.8946741679895334
0.8946741679895334

I did get your result with 1.10.0-rc1 but not 1.10.0-rc2.

I do, though, get 0.8946741679895333 with 1.10.0-rc2 and 1.9.4 if I skip --optimize=1. I’m not sure which value is (more) correct. Results are likely allowed to differ depending on optimization level (but not on load order? so that part should be fixed).

Seems odd because I would assume that precompile caches wouldn’t mix up optimization levels if they were allowed to differ;

I remember reading somewhere that it is legal to reuse a cache file that was generated with a higher optimization level than that of the current session, but not the other way around.

2 Likes

Found it mentioned in an issue after I wrote that comment:

https://docs.julialang.org/en/v1/devdocs/pkgimg/#Flags-that-impact-package-image-creation-and-selection

1 Like

Thanks for finding this; the precompile caches are an interesting idea. If it’s that, then I guess the difference should be gone when starting with a fresh depot and precompiling everything with --optimize=1? I will try this. And yes, the basic question remains: are these results even allowed to differ between optimization levels 1 and 2?

EDIT: Thinking about it, allowing reuse of caches from higher optimization levels while at the same time allowing numerical differences between levels seems at least a bit incompatible to me, so I hope (for the sake of reproducibility) that this is a real bug…

Even before verifying that it’s an optimization-level cache discrepancy, you could open a GitHub issue about this; someone more familiar with precompilation caching might run across it and identify the problem more quickly. Link it back here if you do, so we can also read how that discussion goes.

1 Like

Another clue: I’ve managed to reproduce this on 1.9.4 with just cos and no libraries loaded, just two different ways of running it (once in the REPL, and once through the debugger):

function my_cos_kernel(y::Float64)
    y² = y*y
    y⁴ = y²*y²
    r  = y²*Base.Math.@horner(y², Base.Math.DC1, Base.Math.DC2, Base.Math.DC3) + y⁴*y⁴*Base.Math.@horner(y², Base.Math.DC4, Base.Math.DC5, Base.Math.DC6)
    half_y² = 0.5*y²
    w  = 1.0-half_y²
    w + (((1.0-w)-half_y²) + (y²*r))
end

This is a standalone copy of the kernel used by Base’s cos function.

julia> using Debugger

julia> my_cos_kernel(0.46309504630950465)
0.8946741679895334

julia> @enter my_cos_kernel(0.46309504630950465)
In my_cos_kernel(y) at REPL[1]:1
 1  function my_cos_kernel(y::Float64)
>2      y² = y*y 
 3      y⁴ = y²*y²
 4      r  = y²*Base.Math.@horner(y², Base.Math.DC1, Base.Math.DC2, Base.Math.DC3) + y⁴*y⁴*Base.Math.@horner(y², Base.Math.DC4, Base.Math.DC5, Base.Math.DC6)
 5      half_y² = 0.5*y²
 6      w  = 1.0-half_y²

About to run: (*)(0.46309504630950465, 0.46309504630950465)
1|debug> c
0.8946741679895333

As can be seen, a different value is obtained when running through the debugger. Julia was started with julia --optimize=1 --project=. Without --optimize=1, both methods return the same value: 0.8946741679895333

2 Likes

God, that’s weird; debug> c should’ve just gone to the compiled code, and you already had it compiled at -O1 to return 0.89...34.

I believe the difference here is whether a muladd turns into an fma or not.
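
For anyone who hasn’t hit this before: muladd(a, b, c) explicitly permits either lowering, a*b + c with two roundings or a single fused fma with one rounding, and the two can differ in the last bit. A self-contained illustration in plain Julia:

julia> fma(0.1, 10.0, -1.0)      # single rounding exposes the representation error of 0.1
5.551115123125783e-17

julia> 0.1 * 10.0 - 1.0          # two roundings: the product rounds to exactly 1.0
0.0

julia> muladd(0.1, 10.0, -1.0)   # legal to return either result, depending on compilation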

4 Likes

Yep, narrowed it down to the @horner call, which goes to this muladd:

function _evalpoly(x, p)
    Base.require_one_based_indexing(p)
    N = length(p)
    ex = p[end]
    for i in N-1:-1:1
        ex = muladd(x, ex, p[i])
    end
    ex
end
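
To connect this back to the cos kernel, here is a sketch contrasting the two legal lowerings on the first polynomial from above (the foldr one-liners stand in for _evalpoly; whether the results actually differ in the last bit depends on the machine and on how muladd compiled):

julia> y² = 0.46309504630950465^2;

julia> coeffs = (Base.Math.DC1, Base.Math.DC2, Base.Math.DC3);

julia> horner_unfused(x, p) = foldr((pᵢ, acc) -> x*acc + pᵢ, p);    # separate multiply and add

julia> horner_fused(x, p) = foldr((pᵢ, acc) -> fma(x, acc, pᵢ), p); # always fuses

julia> horner_unfused(y², coeffs) == horner_fused(y², coeffs)       # false whenever fusion changes the rounding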
2 Likes

I just tried this; the results are “interesting”. Indeed, if I start with an empty depot and precompile Colors.jl with --optimize=1 from the beginning, I get consistent results:

> JULIA_DEPOT_PATH=/tmp/newdepot julia --optimize=1 --project=. -e 'using Pkg; Pkg.add("Colors"); c = 0.46309504630950465; using Colors; display(cos(c)); display(sincos(c)[2])'
  Installing known registries into `/tmp/newdepot`
    Updating registry at `/tmp/newdepot/registries/General.toml`
   Resolving package versions...
   Installed Reexport ────────── v1.2.2
   Installed FixedPointNumbers ─ v0.8.4
   Installed ColorTypes ──────── v0.11.4
   Installed Colors ──────────── v0.12.10
  No Changes to `~/Project.toml`
  No Changes to `~/Manifest.toml`
Precompiling project...
  10 dependencies successfully precompiled in 7 seconds
0.8946741679895334
0.8946741679895334

Trying only my initial example after that also gives …34 for both (might this be the reason that @Palli couldn’t reproduce it on some versions, because the depot was empty to start with!?). Now doing the same with a higher optimization level causes everything to re-precompile (consistent with the article @Benny found), and we again get consistent (but different) results:

> JULIA_DEPOT_PATH=/tmp/newdepot julia --optimize=2 --project=. -e 'using Pkg; Pkg.add("Colors"); c = 0.46309504630950465; using Colors; display(cos(c)); display(sincos(c)[2])'
    Updating registry at `/tmp/newdepot/registries/General.toml`
   Resolving package versions...
  No Changes to `~/Project.toml`
  No Changes to `~/Manifest.toml`
Precompiling project...
  10 dependencies successfully precompiled in 8 seconds
0.8946741679895333
0.8946741679895333

But if I now repeat the --optimize=1 version, the results become inconsistent again:

> JULIA_DEPOT_PATH=/tmp/newdepot julia --optimize=1 --project=. -e 'using Pkg; Pkg.add("Colors"); c = 0.46309504630950465; using Colors; display(cos(c)); display(sincos(c)[2])'
    Updating registry at `/tmp/newdepot/registries/General.toml`
   Resolving package versions...
  No Changes to `~/Project.toml`
  No Changes to `~/Manifest.toml`
0.8946741679895334
0.8946741679895333

So somehow the --optimize=2 image gets preferred over the --optimize=1 image even though the latter was generated earlier (perhaps the level-2 image replaces the level-1 one!?). In any case, inspired by the article linked by @Benny, I found that I can get consistent results even with an existing depot if I use the --pkgimages=no option for julia. Everything gets re-precompiled (but only once); after that, the results no longer depend on execution order.
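
For concreteness, since the flag placement wasn’t obvious to me at first: assuming the same snippet as above, the invocation that finally gave order-independent results was along these lines:

> JULIA_DEPOT_PATH=/tmp/newdepot julia --optimize=1 --pkgimages=no --project=. -e 'c = 0.46309504630950465; using Colors; display(cos(c)); display(sincos(c)[2])'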

4 Likes

If I recall correctly, if you run with the default or another optimization level globally, then you might force the same for packages (was that only pre-1.9?). There’s an option to set the optimization level for a package locally (and I think it doesn’t apply to its dependencies), but it’s from pre-1.9, and 1.9 added natively precompiled code. What I would prefer is that precompilation is NOT run again. Is that simply what’s happening?

4 Likes