Why doesn’t Julia fma automatically?

That’s not how fadd and fmul work, but higher-level languages have more options than treating * and + exactly the same as those instructions.

Changing a*b + c to not give the exact answer spelled out in excruciating detail by IEEE 754 would absolutely be a massively breaking change. The ability to implement algorithms like accurate interval arithmetic, transcendental functions, high precision types like DoubleDouble, compensated arithmetic (Kahan summation), etc. all depend on following IEEE precisely. No one implements these kinds of things in Python AI/ML DSLs, which is why they can get away with not following IEEE exactly. (Also, they generally only have vectorized array operations which are an entirely different situation, so this isn’t even a valid comparison in the first place. Accurate summation of arrays is very expensive and generally not done for that reason.) People do implement these algorithms in C, C++ and Julia, which is why non-IEEE “optimizations” are opt-in in all of those languages.
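To make that concrete, here is a minimal sketch (not from any particular package) of the kind of error-free transformation those algorithms rely on; it only works if every operation below is rounded exactly once, as IEEE 754 specifies:

```julia
# Knuth's TwoSum: guarantees s + err == a + b exactly, but only if every
# + and - is a single correctly rounded IEEE operation. A compiler that
# "simplifies" the algebra would fold `err` to zero.
function two_sum(a::Float64, b::Float64)
    s  = a + b
    a′ = s - b
    b′ = s - a′
    err = (a - a′) + (b - b′)
    return s, err
end

two_sum(1.0, 1e-16)   # (1.0, 1.0e-16): the part lost to rounding is recovered
```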

There isn’t really much point in continuing this discussion. It has been explained repeatedly why we cannot, should not, and will not do FMA automatically. You keep disregarding those explanations and insisting that we should anyway.

22 Likes

But they do implement these algorithms in C? So why can C/C++ get away with it (pragmas?)? Would it not be possible to get FMA semantics by default, if it’s left out in some cases? There might need to be an exception for libraries…? To me, it just seems important to know what those cases should or could be.

1 Like

Where did this idea come from that C and C++ do FMA automatically? All major C/C++ compilers follow IEEE 754 unless you ask for -funsafe-math-optimizations, which is local to a compilation unit. Literally no language that can be used for implementing serious low-level numerical functionality does FMA without explicitly opting into it.
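(For reference, Julia’s existing explicit opt-ins, with placeholder values:)

```julia
a, b, c = 0.1, 0.3, 0.7

fma(a, b, c)       # always fused: a*b + c with a single rounding
muladd(a, b, c)    # "fuse if it is faster": may or may not use the fma instruction
a*b + c            # the default: two separately rounded IEEE operations
```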

2 Likes

Where did this idea come from that C and C++ do FMA automatically?

From Clang, I assume that means C and C++.

See my comment above, and where I’ve answered you on this before (I confirmed it personally for Clang targeting RISC-V with Godbolt):

The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default.

I believe Clang has an exception to its new default for x86… at least for now.

Julia doesn’t have such a thing, except perhaps packages (precompiled by now), which is why I proposed a possible exception for them/libraries.

1 Like

Actually, see the -ffp-contract option in GCC or Clang. Clang switched to enabling it by default relatively recently, while GCC always had it on-by-default, I suppose. I think that only works in a single expression, though, so presumably it doesn’t work across statements, which is why it wouldn’t cause too much harm.

3 Likes

I agree with you in spirit

however, I think (though I could be wrong) that OCaml does automatic FMA on hardware that supports it

3 Likes

Yes, the default floating point model in clang also permits FMA since 2020 as @Palli described. I’m also not fully convinced that it’s a great idea. The idea that the math you write in a programming language represents some abstract mathematical truth and “just does what you mean” is so very seductive, but I really don’t think it’s meaningful in a way you can count on, because there’s always some place where the rubber hits the road and you’re no longer abstract.

We’ve discussed this tension many times; this comment still represents my views:

6 Likes

I’m not saying I’m convinced either, just that there are bad results either way. I just know other people were convinced enough to change the default in Clang.

When I ask for a*b + c, I didn’t explicitly ask for rounding twice, or even once. With integers (or rationals) that rounding does NOT happen. Julia wants to be generic, and rounding has to happen with floating point (unless you widen…). It seems very plausible that rounding only once is the better new default.

What Julia could do is define new types: Float64_no_FMA etc., and Float64 (changed to mean a float allowing FMA, i.e. one rounding). Would that be an idea people like? The representation would be the same, so a “conversion” between them would be a no-op. I would love for the type system and Julia to explore such options.
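A very rough sketch of what I mean, with hypothetical names (a real design would need promotion rules, support for longer expressions, subtraction, and so on):

```julia
# Hypothetical sketch, not a real package: a wrapper type whose a*b + c
# fuses by construction. `*` returns a lazy, unevaluated product and `+`
# collapses it with fma, so the whole expression is rounded once.
struct FMAFloat
    x::Float64
end

struct LazyProd    # an unevaluated a*b, waiting for a possible +
    a::Float64
    b::Float64
end

Base.:*(a::FMAFloat, b::FMAFloat) = LazyProd(a.x, b.x)
Base.:+(p::LazyProd, c::FMAFloat) = FMAFloat(fma(p.a, p.b, c.x))
Base.:+(c::FMAFloat, p::LazyProd) = p + c

# A bare product that is never added to anything still has to be realized:
FMAFloat(p::LazyProd) = FMAFloat(p.a * p.b)

a, b, c = FMAFloat(0.1), FMAFloat(0.3), FMAFloat(-0.03)
(a*b + c).x    # rounded once, via fma
```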

1 Like

Apparently, this ultimately stems from C99, which added an FP_CONTRACT pragma that you can use to specify whether automatic FMA is allowed:

#pragma STDC FP_CONTRACT on-off-switch

and the standard explicitly says (in section 7.12.2) that “The default state (‘on’ or ‘off’) for the pragma is implementation-defined”, which gave compilers explicit permission to fuse by default (but technically only within expressions that appear explicitly in the source code, not across inlined or merged expressions, I think).

For a long time the default was “off” in gcc unless you used -funsafe-math-optimizations or -ffast-math, but apparently it was changed around 2010 to default to -ffp-contract=fast (allowing automatic FMA on supporting hardware); however, the FP_CONTRACT pragma to disable it isn’t supported, and this was “fixed” by switching to -ffp-contract=off in ISO C mode.

clang documented a change in the default to -ffp-contract=on in clang 14, but apparently it didn’t actually change on many platforms until 2021, and apparently they still don’t fully support the pragma to disable it. In MSVC the default is off as of Visual Studio 2022 version 17.0 (but was on by default in previous versions). Intel icc enables -fp-contract=fast by default, but they also support the pragma.

So, it’s a mess right now and you can’t reliably indicate in the source code itself (as opposed to a compiler flag) that FMA contraction is not desired, thanks to inconsistent support for #pragma STDC FP_CONTRACT OFF.

One thing to keep in mind is that, even though this is a potentially breaking change, it was a change made with little fanfare by multiple compilers (albeit explicitly allowed by the language standard). So, there is some precedent if one wanted to make a similar change in Julia 1.x (presumably adding something like a @strict_math macro for code to opt out).

19 Likes

I’m well aware that these algorithms exist. I think that if we changed FMA behavior, it would make sense to increment the major version to make sure the code that relies on this got the memo. I just think it’s a bad idea to make a promise that very few (if any) other languages do. As noted above, C/C++ does not make this promise, making Julia in this respect a lower-level language than C/C++. I also think it’s very bad practice to rely on this kind of exact order-of-operations without explicitly specifying it in your code, either with a macro, by defining a separate exact_plus function for this, or by using parentheses.

I’d like to note that I’m not disregarding these explanations. I fully understand them, and I recognize there are some costs to this optimization in the form of making these high-precision addition algorithms slightly clunkier to express. I just don’t find these explanations convincing. They argue that someone, somewhere, would theoretically pay a nonzero cost for this optimization, not that those costs aren’t worth it. Addition algorithms like Kahan summation are a tiny sliver of code in the Julia ecosystem that could easily be fixed by slapping an @no_fma macro at the top of the functions that implement them, instead of requiring the other 99.9% of Julia code to either remember to include @fma or accept the accuracy and performance hit.

The way defaults are supposed to work is that they work for most people, with a way to opt out for the few cases where they don’t, not the other way around.

5 Likes

This might land with more gravity if you were volunteering to contribute such an easy fix to that tiny corner of the ecosystem.

One of the packages that I maintain (KernelAbstractions.jl) used to do fp contract by default (with the argument that it improved GPU performance and most GPU compilers do fp contract by default).

I received multiple real bug reports where this caused surprising and hard-to-understand behavior. The funniest was one where the same expression evaluated to two different results within the same function.

I am a strong advocate for global behavior that is safe and consistent, with the local ability to opt-in to more aggressive behavior.

A real step forward would be to expose a much more fine-grained set of options for the @fastmath macro, so that you could choose to get only :contract and :reassoc.
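Roughly, the contrast I have in mind; the fine-grained form is hypothetical syntax, not anything that exists today:

```julia
a, b, c = 0.1, 0.3, 0.7

# Today's opt-in is all-or-nothing: @fastmath enables the whole fast-math
# flag set (contraction, reassociation, no-NaN/Inf assumptions, ...) at once.
@fastmath a*b + c

# Hypothetical fine-grained opt-in as proposed above (NOT real syntax):
# @fastmath :contract :reassoc a*b + c
```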

24 Likes

Here’s a fun real-world example from Clang: using FMA means that x1*y2 - x2*y1 is sometimes nonzero when x1==x2 and y1==y2.
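You can reproduce the effect by hand in Julia, with an explicit fma standing in for what a contracting compiler may emit (the values are arbitrary):

```julia
x1 = x2 = 0.1
y1 = y2 = 0.3

x1*y2 - x2*y1            # 0.0: the two rounded products are identical and cancel
fma(x1, y2, -(x2*y1))    # the rounding error of x2*y1: tiny, but not 0.0
```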

14 Likes

This macro makes it very easy to apply @muladd at a function level. That’s syntactic, understood, and safe. In many cases, that’s going to be a much simpler option to reason about.
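A usage sketch, assuming the @muladd macro from MuladdMacro.jl and a made-up axpby!-style kernel:

```julia
using MuladdMacro

# @muladd syntactically rewrites a*b + c patterns in the body into
# muladd(a, b, c) calls, which the compiler may lower to hardware fma.
@muladd function axpby!(y, a, x, b)
    @inbounds for i in eachindex(y, x)
        y[i] = a*x[i] + b*y[i]
    end
    return y
end
```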

16 Likes

Sure, I’d happily make a PR if the Julia devs announced a plan to apply FMA automatically.

And that one causes a*conj(a) to potentially have a nonzero imaginary part if complex multiplication uses FMA. Another example is that b^2 - 4*a*c will be nonnegative when b^2 ≥ 4*a*c if the obvious operations are used. However, fma(-4*a, c, b^2) can still be negative, which can be inconvenient when using the quadratic formula. It’s not just problems with compensated summation or Double-Doubles.
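To make the quadratic-formula case concrete, here is a hand-constructed instance where b^2 equals 4*a*c after rounding while the exact product is slightly larger:

```julia
a = 0.75             # 4a == 3.0 exactly
b = 1.0
c = nextfloat(1/3)   # 3*c is exactly 1 + 2^-53, which rounds back down to 1.0

b^2 - 4*a*c          # 0.0, so sqrt of the discriminant is fine
fma(-4*a, c, b^2)    # -1.1102230246251565e-16, so sqrt would throw a DomainError
```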

It’s reassuring to see some resistance to the idea.

9 Likes

Yeah, that’s exactly it. I mean I get it, this sure feels like it should be such an obvious win — better performance and more accurate calculations, what’s not to like!? I want this to work.

But it’s not uniformly more accurate. Getting “more accuracy” sporadically in unexpected places can lead to very baffling — less accurate and less algebraic — behaviors once you go just one step beyond the trivial a*b + c.

16 Likes

Thank you for your contribution :slight_smile: I think your post is the first one to bring up an actual case of these kinds of bugs causing problems in real life. I’m not sure if there might be something unique to the usual uses of KernelAbstractions.jl here, but if it caused problems there, that suggests it might not be a good choice for Julia as a whole.

Were the bugs all caused by people writing things like a*x + b*y, and not simple cases of a*x + b? If so, would it make sense to just stick to the basic easy-win case a*x + b = fma(a, x, b), and leave a*x + b*y alone?

The problem is that compilers don’t work one line at a time. You basically only have two options: allow reassociation and contraction, or prevent them.

4 Likes