Julia 1.11.1 gives different results from 1.10.5

Still a fair question: is the basic Julia arithmetic on floats known to produce different bit patterns, 1.10 vs. 1.11?

to be honest, I just kind of assume that arithmetic on floats is allowed to produce different bit patterns at any time :joy: depending on the exact commit version, platform, temperature of cpu, whimsy of RNG gods. and that’s not Julia-specific, I mean for anywhere.

I know that’s not accurate but seems like a useful heuristic for a pleb like me… this is why tolerance bounds in tests are useful!
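
For example, with the Test stdlib you can compare against a tolerance instead of exact bits (values here just for illustration):

using Test

@test 0.1 + 0.2 ≈ 0.3                        # isapprox with the default relative tolerance
@test isapprox(0.1 + 0.2, 0.3; atol=1e-12)   # or with an explicit absolute tolerance

Both pass, even though 0.1 + 0.2 == 0.3 is false.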

23 Likes

If I were a betting man, I’d put my money on the difference coming from higher order routines in BLAS or LinearAlgebra or perhaps a reduction. Very few computations of that ilk give you perfectly rounded results in the first place, and they are liable to change.

I imagine you’ll figure it out now in relatively short order just by stepping back through the computations that generated that matrix. Will be interesting to hear what it was!

1 Like

Thanks. Not an answer to my question, I’m afraid.
Forget arrays, vectors, packages. Is 1.11.2 expected to produce identical bits for things like 0.1 + 0.3 * 0.4 as 1.10.5?

Yes. It only might change if you use @simd or @fastmath or muladd or the like… which is exactly what BLAS and LinearAlgebra and reductions do. And then if they don’t guarantee perfect rounding in the first place, the implementations of these more complicated computations themselves might also change.
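
If you want to convince yourself for a given scalar expression, you can print its exact bits on each Julia version and diff them:

x = 0.1 + 0.3 * 0.4
println(bitstring(x))             # the full 64-bit pattern, easy to compare between versions
println(reinterpret(UInt64, x))   # the same bits, as an unsigned integer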

10 Likes

A.
I would start with running your code with -O0 on 1.11.1 and on the version where it was working (was it 1.11.0 or 1.10.x?). And if you’re using threading, disable that next, also for BLAS.

If you still have a problem after disabling the optimizer, then you’ve ruled it out as the problem, and then the (updated) LLVM is a likely issue.

Are you sure the 10.0 is the correct one, and not the 9.9999… right after it? Both are mathematically the same, assuming the latter is infinite, which it isn’t; rather it’s a good approximation of 10.0.

muladd means fma on most (i.e. recent) platforms, and fma is actually deterministic too. Assuming the fmas were done in the same order, and I don’t think the optimizer is allowed to change that by default, you can rule out fma/muladd as the problem?!

This is most likely related to @fastmath or equivalent in BLAS (aren’t you safe with single-threaded BLAS?). Unlike with C++ [compilers], fma is not opted into for you behind your back, and I believe that’s still the case with Julia with whatever LLVM, but I might be wrong on that for the recent version used?

SIMD processing per se isn’t the problem, except that @simd implies [EDIT: no, but something close; see also the comments from experts on my answers] @fastmath, I believe.

I think, but am not sure, that -O0 (and likely -O1 too) disables the likely problem, @fastmath (which is a local optimization). If not, is there some other global switch to do it, analogous to --check-bounds (which you could also set to yes, to make sure you have a correct program)?

B.

Not so.

You only need a working solution (in 1.10.5 here, seemingly), then non-working code (it need not be minimal) with a (deterministically) wrong/other solution, and then you can bisect to the problem automatically.

I’ve never used git bisect personally, but it’s a rather neat feature. Though OK, it might be construed as debugging, just without a debugger [program/activity].

That would be helpful for us humans here to spot the error, but it’s not needed with git bisect.
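
The only piece you need to supply is a reproducer whose exit status says good/bad, e.g. something like this (hypothetical file names and value), which git bisect run can then drive automatically (rebuilding Julia at each step if you’re bisecting Julia itself rather than a package):

# bisect_repro.jl (hypothetical): exit 0 on the known-good result, nonzero otherwise
expected = 10.0                          # the value observed on the known-good 1.10.5
result = include("my_computation.jl")    # assumed to return the scalar in question
exit(result == expected ? 0 : 1)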

No, assume deterministic answers always, if you do not have threads. The heat of the CPU shouldn’t matter, unless it’s broken; or, as a loophole, nondeterminism can happen with threads.

All the floating-point math is deterministic in the CPU, all assembly instructions are. Versions of software can’t change that, except if the optimizer/compiler opts into a different sequence of those (floating-point) operations.

All operations in IEEE float math should be correctly rounded, at least the basic ones. If I recall correctly, some, like log(?) and sin, will not be, but they will be deterministically wrong; and I believe Julia and compilers avoid broken instructions by using software implementations (also because it’s sometimes faster). Across CPUs they could get better (more correct) answers (or, with exceptions, wrong ones, as with Intel’s famous Pentium FDIV bug - Wikipedia, which is not your problem, since it’s quite old).
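
You can check correct rounding of a basic operation yourself against a wider-precision reference, e.g.:

x, y = 0.1, 0.3
x + y == Float64(big(x) + big(y))    # true: + is required to be correctly rounded
sin(1.0) == Float64(sin(big(1.0)))   # not guaranteed: library functions are typically
                                     # only accurate to within ~1 ulp, not correctly rounded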

3 Likes

the “except if” here is doing a LOT of heavy lifting. that’s pretty much always how it happens, and it happens all the time.

1 Like

No, you can’t reorder floating point, it’s just not allowed (in general; OK to reorder in some cases though, I believe), so the optimizer doesn’t do it when it’s not allowed, unless you (or your package [dependency]) instruct it to, as I explained. And even if you reorder, you often get away with it, just not with bit-identical answers. So was the answer like that, still OK as an approximation?

@Palli, that post is a wild mishmash of misinformation.

This is not correct. muladd allows for other contractions with nearby muladd’ed computations beyond the straightforward fma.

No, BLAS performs many re-associations in single-threaded mode based upon cache line length and the like. And its implementations might change between versions.

No, it does not. SIMD does allow for arbitrary reassociations, however, and those might change between versions.
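
For instance, the textbook three-number example of re-association changing bits:

(0.1 + 0.2) + 0.3   # 0.6000000000000001
0.1 + (0.2 + 0.3)   # 0.6
# SIMD and BLAS kernels choose their own grouping (and may change it between
# versions), and that alone is enough to change the final bits.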

I don’t think optimization modes “turn off” @fastmath or fp-contract, but they might change some of the re-arrangements in yet more exciting ways. I wouldn’t speculate on this one; and testing it is likely only going to confound things.

We’re talking about changing Julia versions here. The answers within each Julia version should indeed be deterministic if they’re written to be deterministic (and it seems they are), but this is about what might change between versions. That’s why I’m focused on higher-level computations that themselves don’t promise perfectly rounded answers and thus might change.

10 Likes

Thanks, you’re referring to “or with surrounding operations for performance” in muladd’s docs. @mbauman it IS what the docs say, yes; I however disagree that those are good semantics, and I’m also unsure this ever happens in practice. Rust’s MulAdd (num_traits::ops::mul_add) has good semantics, i.e. without such contraction, at least as documented, and I think we should too (it wouldn’t be a breaking change).

fma is completely deterministic(?!) if I’m reading the docs right; if not, then the help for it should be improved. It’s slower on hardware without the instruction, and I assumed muladd was the same, except it allows skipping the software implementation when the instruction is absent. That would at least be the semantics I expected, and would want. I’ll try to fix my post; I want to learn from this and know for sure, so as not to misinform.

I was trying to warn against @simd too; both do “fast” math in a colloquial sense, with different semantics. I wasn’t trying to say either is stable across versions, rather the opposite: the absence of them gives deterministic code (for pure Julia code excluding BLAS…, meaning really only guaranteed for scalars).

Also, threading does make BLAS less deterministic, I understand, so it’s good to know, to test that theory; and also good to know that you can’t get fully deterministic results even with the single-threaded BLAS option (except within a 1.x.y version for the same x). Then it might be one more reason to drop BLAS as a default dependency of Julia…
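
For reference, forcing single-threaded BLAS to test that theory is just:

using LinearAlgebra
BLAS.set_num_threads(1)   # removes the threading-related source of run-to-run variation
BLAS.get_num_threads()    # confirm it took effect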

fma is deterministic, it’s muladd that isn’t. muladd lowers to a multiply and add that are allowed to (but not required to) fuse with each other.
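
A concrete pair showing the difference (well-known values, chosen so the contraction matters):

a, b, c = 0.1, 10.0, -1.0
a * b + c      # 0.0: the product rounds to exactly 1.0 before the add
fma(a, b, c)   # 5.551115123125783e-17: the exact product 1 + 2^-54 survives the add
# muladd(a, b, c) may legally return either of the two values above, depending on
# whether the compiler emits a fused instruction at that call site.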

1 Like

Do you mean muladd is deterministic on all fma-supporting platforms AND also deterministic on all non-fma-supporting platforms (what I was trying to say), just not deterministic across those two categories? That’s what I was trying to explain. Then muladd is ruled out as the cause (assuming he used the same CPU).

[The fma docs are OK (and muladd’s; I just disagree on the best semantics there). I dropped my post above about it; I had misread a bit.]

Yes, this recalls the Java wars of the 1990s. The great Bill Kahan had argued that Java should NOT guarantee bit-identical floating-point operations. He lost that argument, because it was thought that write-once-run-anywhere needed to be identical. As he predicted, this eventually blew up in their faces because it was just too slow. So since Java 1.2 the default (EDIT: was made to be) NOT bit-identical, and they introduced strictfp (1998). EDIT: Since reverted to strict! Thanks @Palli for the correction! Hardware and software evolve, new devices like phones appear, etc.

My impression is that Julia tries to guarantee approximate equality only. The existence of @simd and @fastmath isn’t meant to imply exact equality otherwise.

The question for @PetrKryslUCSD is whether the “broken correctness” is only a matter of results not being bit-identical, or something deeper and worse. Perhaps the release notes for 1.11 (or maybe BLAS?) could mention that the math is slightly different from before. I agree with the sentiment that strict is helpful for testing, but even though strictfp lets you keep running all your old strict tests, you’d still need to write separate approximate ones.

I would appreciate corrections to any of this, I am only an amateur.

1 Like

No, I meant what I said. muladd has the flexibility to fold or not fold on all hardware, whether or not the hardware supports fma.

Sorry, but why is everybody embarking on the assumption that a 1-bit difference in the 1.0 float is what is causing the difference between 1.10 and 1.11?

1 Like

Note:

strictfp is an obsolete and redundant reserved word in the Java programming language. […] As of Java 17, IEEE 754 semantics is required, thus using this keyword has no effect.

C++ is also adopting BLAS from C++26: Basic linear algebra algorithms (since C++26) - cppreference.com

It’s rather hard (but not impossible; then you use internal non-exported functions…) to avoid BLAS in Julia. I believe it can and should be a supported option, well, even the default (BLISBLAS.jl is better anyway). LinearAlgebra is, as of a few days ago, in a separate repo. It’s still a stdlib. I.e. going forward you can have the same version of that stdlib (and [Open]BLAS, I assume), helping with bit-identical results across Julia versions.
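
Checking which BLAS you actually ended up with is one line:

using LinearAlgebra
BLAS.get_config()   # lists the loaded BLAS/LAPACK libraries (OpenBLAS, BLIS, MKL, …)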

Thanks, corrected! Haven’t used Java in a while…

This is going sideways. The class of problem that @PetrKryslUCSD is seeking empathy for and understanding of is a change in numeric result between Julia versions. Importantly, we don’t know why it happened. Fastmath debates are always fun, but even without knowing why the numeric values changed, we can look at this from a slightly higher level: a floating point computation is either perfectly rounded for all inputs or it is not. My very strong suspicion is that Petr’s original problem stems from a computation that wasn’t and likely still isn’t perfectly rounded for all its inputs. We will see if that bears out. It might have changed results because the Julia compiler re-associated something differently, or it might have changed because the implementation itself changed.

Let’s use the concrete example Petr gave above — yes, 0.1 + 0.3 * 0.4 will always give you the same result, but what if you’re actually interested in adding and multiplying tenths? You can make it into a first-class function:

julia> mul_and_add_tenths(x,y,z) = x/10*y/10 + z/10
mul_and_add_tenths (generic function with 1 method)

julia> 0.3*0.4 + 0.1
0.22

julia> mul_and_add_tenths(3,4,1)
0.22

Great! But… there are other values for which this seems… wrong? But it is what we get with the floating point literals, too:

julia> mul_and_add_tenths(1,2,1)
0.12000000000000001

julia> 0.1*0.2 + 0.1
0.12000000000000001

We’re humans though, and we see tenths, and we want to work with exact tenths, and we’re passing exact integers into our function. We want these answers to round how we expect! In fact, there are many such inputs for which we get not-perfectly-rounded answers:

julia> errors(f) = count(f(x,y,z) != Float64(x//10*y//10+z//10) for x in 1:10, y in 1:10, z in 1:10)
errors (generic function with 1 method)

julia> errors(mul_and_add_tenths)
257

OK, so let’s make it better. We can simplify the computation a bit with some algebra — improving its speed and improving its rounding at the same time!

julia> mul_and_add_tenths2(x,y,z) = (x*y/10 + z)/10
mul_and_add_tenths2 (generic function with 1 method)

julia> errors(mul_and_add_tenths2)
246

julia> mul_and_add_tenths2(1,2,1)
0.12

That worked! We reduced our number of inputs for which we get not-perfectly-rounded answers and we fixed that test-case! But wait…

julia> mul_and_add_tenths2(3,4,1)
0.22000000000000003

Augh, we regressed the case we first cared about!

So: if we had such a function in Julia, should it be allowed to change its results between Julia versions like the above? Surely that’s breaking its “correctness”! A correct answer became wrong! What if we changed it again to be:

julia> mul_and_add_tenths3(x,y,z) = (x*y + z*10)/100
mul_and_add_tenths3 (generic function with 1 method)

julia> errors(mul_and_add_tenths3)
0

None of the above implementations touched fastmath or simd or fma or muladd, but it’d sure seem reasonable to iterate on its definition between versions like the above because they were net improvements.

23 Likes

Excuse me for not adding much technical content, but any reasonable optimization problem and solution should be robust enough to manage slight floating-point errors.
Furthermore, using floating-point numbers essentially adds nondeterminism to any reasonably big program, so aiming for bit-equivalent runs is outside the “spec”.
So, to address the OP, the actual correctness problem still needs to be found, in Julia, the libs, or the problem code.

I appear to be on the track of a major incompatibility: the partitioning is very different in the two Julia versions. I am guessing that dictionaries (and/or sets) use different hashing functions?

The different partitioning of the domain has a big effect on the ill conditioned matrix that is assembled. Apparently enough to cause stagnation in one case and convergence in the other.

Edit: Dictionaries are NOT at fault. It turns out I RANDOMIZE the partitioning myself for nicer graphical output… I just have to make randomization optional, that’s all… Still, something changed between 1.10 and 1.11. Perhaps different random seeds? Perhaps a different default random number generator?

Edit 2: Perhaps the problem is that I did not set the random number generator seed. Therefore the two Julia versions might have diverged by the time they got to executing my code.
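
One way to take the default RNG out of the equation is to pass an explicitly seeded generator everywhere the randomization happens; StableRNGs.jl additionally guarantees the same stream across Julia versions (seed values here are arbitrary):

using Random, StableRNGs
rng = StableRNG(1234)          # stream is reproducible across Julia versions
shuffle(rng, collect(1:10))    # pass rng explicitly wherever randomization is used
Random.seed!(42)               # seeding the default RNG helps within a version, but its
                               # stream is not guaranteed to be identical across versions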

Edit 3: It is not the randomization.

7 Likes