Why doesn’t Julia fma automatically?

That’s not how fadd and fmul work, but higher-level languages have more options than treating * and + exactly the same as those instructions.

Changing a*b + c to not give the exact answer spelled out in excruciating detail by IEEE 754 would absolutely be a massively breaking change. The ability to implement algorithms like accurate interval arithmetic, transcendental functions, high precision types like DoubleDouble, compensated arithmetic (Kahan summation), etc. all depend on following IEEE precisely. No one implements these kinds of things in Python AI/ML DSLs, which is why they can get away with not following IEEE exactly. (Also, they generally only have vectorized array operations which are an entirely different situation, so this isn’t even a valid comparison in the first place. Accurate summation of arrays is very expensive and generally not done for that reason.) People do implement these algorithms in C, C++ and Julia, which is why non-IEEE “optimizations” are opt-in in all of those languages.
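To make that concrete, here is a minimal sketch (not from any particular package) of the kind of error-free transformation those algorithms rely on; it only works if every operation below is rounded exactly once, as IEEE 754 specifies:

```julia
# Knuth's TwoSum: guarantees s + err == a + b exactly, but only if every
# + and - is a single correctly rounded IEEE operation. A compiler that
# "simplifies" the algebra would fold `err` to zero.
function two_sum(a::Float64, b::Float64)
    s  = a + b
    a′ = s - b
    b′ = s - a′
    err = (a - a′) + (b - b′)
    return s, err
end

two_sum(1.0, 1e-16)   # (1.0, 1.0e-16): the part lost to rounding is recovered
```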

There isn’t really much point in continuing this discussion. It has been explained repeatedly why we cannot, should not, and will not do FMA automatically. You keep disregarding those explanations and insisting that we should anyway.

22 Likes

But they do implement these algorithms in C? So why can C/C++ get away with it (pragmas?)? Would it not be possible to get FMA semantics by default, if it’s left out in some cases? There might need to be an exception for libraries…? To me, it just seems important to know what those cases should or could be.

1 Like

Where did this idea come from that C and C++ do FMA automatically? All major C/C++ compilers follow IEEE 754 unless you ask for -funsafe-math-optimizations, which is local to a compilation unit. Literally no language that can be used for implementing serious low-level numerical functionality does FMA without explicitly opting into it.
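(For reference, Julia’s existing explicit opt-ins, with placeholder values:)

```julia
a, b, c = 0.1, 0.3, 0.7

fma(a, b, c)       # always fused: a*b + c with a single rounding
muladd(a, b, c)    # "fuse if it is faster": may or may not use the fma instruction
a*b + c            # the default: two separately rounded IEEE operations
```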

2 Likes

Where did this idea come from that C and C++ do FMA automatically?

From Clang, I assume that means C and C++.

See my comment above, and where I’ve answered you on this before (I confirmed it personally for Clang targeting RISC-V with Godbolt):

The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default.

I believe Clang has an exception to its new default for x86… at least for now.

Julia doesn’t have such a thing, except perhaps packages (precompiled by now), which is why I proposed a possible exception for them/libraries.

1 Like

Actually, see the -ffp-contract option in GCC or Clang. Clang switched to enabling it by default relatively recently, while GCC always had it on-by-default, I suppose. I think that only works in a single expression, though, so presumably it doesn’t work across statements, which is why it wouldn’t cause too much harm.

3 Likes

I agree with you in spirit

however, I think (though I could be wrong) that OCaml does automatic FMA on hardware that supports it

3 Likes

Yes, the default floating point model in clang also permits FMA since 2020 as @Palli described. I’m also not fully convinced that it’s a great idea. The idea that the math you write in a programming language represents some abstract mathematical truth and “just does what you mean” is so very seductive, but I really don’t think it’s meaningful in a way you can count on, because there’s always some place where the rubber hits the road and you’re no longer abstract.

We’ve discussed this tension many times; this comment still represents my views:

6 Likes

I’m not saying I’m convinced either, just that there are bad results either way. I just know other people were convinced enough to change the default in Clang.

When I ask for a*b + c, I didn’t explicitly ask for rounding twice, or even once. With integers (or rationals) that rounding does NOT happen. Julia wants to be generic, and rounding has to happen with floating point (unless you widen…). It seems very plausible that rounding only once is the better new default.

What Julia could do is define new types: Float64_no_FMA etc., and Float64 (changed to mean a float allowing FMA, i.e. one rounding). Would that be an idea people like? The representation would be the same, so a “conversion” between them would be a no-op. I would love for the type system and Julia to explore such options.
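A very rough sketch of what I mean, with hypothetical names (a real design would need promotion rules, support for longer expressions, subtraction, and so on):

```julia
# Hypothetical sketch, not a real package: a wrapper type whose a*b + c
# fuses by construction. `*` returns a lazy, unevaluated product and `+`
# collapses it with fma, so the whole expression is rounded once.
struct FMAFloat
    x::Float64
end

struct LazyProd    # an unevaluated a*b, waiting for a possible +
    a::Float64
    b::Float64
end

Base.:*(a::FMAFloat, b::FMAFloat) = LazyProd(a.x, b.x)
Base.:+(p::LazyProd, c::FMAFloat) = FMAFloat(fma(p.a, p.b, c.x))
Base.:+(c::FMAFloat, p::LazyProd) = p + c

# A bare product that is never added to anything still has to be realized:
FMAFloat(p::LazyProd) = FMAFloat(p.a * p.b)

a, b, c = FMAFloat(0.1), FMAFloat(0.3), FMAFloat(-0.03)
(a*b + c).x    # rounded once, via fma
```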

1 Like

Apparently, this ultimately stems from C99, which added an FP_CONTRACT pragma that you can use to specify whether automatic FMA is allowed:

#pragma STDC FP_CONTRACT on-off-switch

and the standard explicitly says (in section 7.12.2) that “The default state (‘on’ or ‘off’) for the pragma is implementation-defined”, which gave compilers explicit permission to fuse by default (but technically only within expressions that appear explicitly in the source code, not across inlined or merged expressions, I think).

For a long time the default was “off” in gcc unless you used -funsafe-math-optimizations or -ffast-math, but apparently it was changed around 2010 to default to -ffp-contract=fast (allowing automatic FMA on supporting hardware); however, the FP_CONTRACT pragma to disable it isn’t supported, and this was “fixed” by switching to -ffp-contract=off in ISO C mode.

clang documented a change in the default to -ffp-contract=on in clang 14, but apparently it didn’t actually change on many platforms until 2021, and apparently they still don’t fully support the pragma to disable it. In MSVC the default is off as of Visual Studio 2022 version 17.0 (but was on by default in previous versions). Intel icc enables -fp-contract=fast by default, but they also support the pragma.

So, it’s a mess right now and you can’t reliably indicate in the source code itself (as opposed to a compiler flag) that FMA contraction is not desired, thanks to inconsistent support for #pragma STDC FP_CONTRACT OFF.

One thing to keep in mind is that, even though this is a potentially breaking change, it was a change made with little fanfare by multiple compilers (albeit explicitly allowed by the language standard). So, there is some precedent if one wanted to make a similar change in Julia 1.x (presumably adding something like a @strict_math macro for code to opt out).

19 Likes

I’m well aware that these algorithms exist. I think that if we changed FMA behavior, it would make sense to increment the major version to make sure the code that relies on this got the memo. I just think it’s a bad idea to make a promise that very few (if any) other languages do. As noted above, C/C++ does not make this promise, making Julia in this respect a lower-level language than C/C++. I also think it’s very bad practice to rely on this kind of exact order-of-operations without explicitly specifying it in your code, either with a macro, by defining a separate exact_plus function for this, or by using parentheses.

I’d like to note that I’m not disregarding these explanations. I fully understand them, and I recognize there are some costs to this optimization in the form of making these high-precision addition algorithms slightly clunkier to express. I just don’t find these explanations convincing. They argue that someone, somewhere, would theoretically pay a nonzero cost for this optimization, not that those costs aren’t worth it. Addition algorithms like Kahan summation are a tiny sliver of code in the Julia ecosystem that could easily be fixed by slapping an @no_fma macro at the top of the functions that implement them, instead of requiring the other 99.9% of Julia code to either remember to include @fma or accept the accuracy and performance hit.

The way defaults are supposed to work is that they work for most people, with a way to opt out for the few cases where they don’t, not the other way around.

5 Likes

This might land with more gravity if you were volunteering to contribute such an easy fix to that tiny corner of the ecosystem.

One of the packages that I maintain (KernelAbstractions.jl) used to do fp contract by default (with the argument that it improved GPU performance and most GPU compilers do fp contract by default).

I received multiple real bug reports where this caused surprising and hard-to-understand behavior. The funniest was one where the same expression evaluated to two different results within the same function.

I am a strong advocate for global behavior that is safe and consistent, with the local ability to opt-in to more aggressive behavior.

A real step forward would be to expose a much more fine-grained set of options for the @fastmath macro, so that you could choose to get only :contract and :reassoc.
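Roughly, the contrast I have in mind; the fine-grained form is hypothetical syntax, not anything that exists today:

```julia
a, b, c = 0.1, 0.3, 0.7

# Today's opt-in is all-or-nothing: @fastmath enables the whole fast-math
# flag set (contraction, reassociation, no-NaN/Inf assumptions, ...) at once.
@fastmath a*b + c

# Hypothetical fine-grained opt-in as proposed above (NOT real syntax):
# @fastmath :contract :reassoc a*b + c
```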

24 Likes

Here’s a fun real-world example from Clang: using FMA means that x1*y2 - x2*y1 is sometimes nonzero when x1==x2 and y1==y2.
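You can reproduce the effect by hand in Julia, with an explicit fma standing in for what a contracting compiler may emit (the values are arbitrary):

```julia
x1 = x2 = 0.1
y1 = y2 = 0.3

x1*y2 - x2*y1            # 0.0: the two rounded products are identical and cancel
fma(x1, y2, -(x2*y1))    # the rounding error of x2*y1: tiny, but not 0.0
```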

14 Likes

This macro makes it very easy to apply @muladd at a function level. That’s syntactic, understood, and safe. In many cases, that’s going to be a much simpler option to reason about.
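A usage sketch, assuming the @muladd macro from MuladdMacro.jl and a made-up axpby!-style kernel:

```julia
using MuladdMacro

# @muladd syntactically rewrites a*b + c patterns in the body into
# muladd(a, b, c) calls, which the compiler may lower to hardware fma.
@muladd function axpby!(y, a, x, b)
    @inbounds for i in eachindex(y, x)
        y[i] = a*x[i] + b*y[i]
    end
    return y
end
```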

16 Likes

Sure, I’d happily make a PR if the Julia devs announced a plan to apply FMA automatically.

And that one causes a*conj(a) to potentially have a nonzero imaginary part if complex multiplication uses FMA. Another example is that b^2 - 4*a*c will be nonnegative when b^2 ≥ 4*a*c if the obvious operations are used. However, fma(-4*a, c, b^2) can still be negative, which can be inconvenient when using the quadratic formula. It’s not just problems with compensated summation or Double-Doubles.
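To make the quadratic-formula case concrete, here is a hand-constructed instance where b^2 equals 4*a*c after rounding while the exact product is slightly larger:

```julia
a = 0.75             # 4a == 3.0 exactly
b = 1.0
c = nextfloat(1/3)   # 3*c is exactly 1 + 2^-53, which rounds back down to 1.0

b^2 - 4*a*c          # 0.0, so sqrt of the discriminant is fine
fma(-4*a, c, b^2)    # -1.1102230246251565e-16, so sqrt would throw a DomainError
```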

It’s reassuring to see some resistance to the idea.

9 Likes

Yeah, that’s exactly it. I mean I get it, this sure feels like it should be such an obvious win — better performance and more accurate calculations, what’s not to like!? I want this to work.

But it’s not uniformly more accurate. Getting “more accuracy” sporadically in unexpected places can lead to very baffling — less accurate and less algebraic — behaviors once you go just one step beyond the trivial a*b + c.

16 Likes

Thank you for your contribution :slight_smile: I think your post is the first one to bring up an actual case of these kinds of bugs causing problems in real life. I’m not sure if there might be something unique to the usual uses of KernelAbstractions.jl here, but if it caused problems there, that suggests it might not be a good choice for Julia as a whole.

Were the bugs all caused by people writing things like a*x + b*y, and not simple cases of a*x + b? If so, would it make sense to just stick to the basic easy-win case a*x + b = fma(a, x, b), and leave a*x + b*y alone?

The problem is that compilers don’t work one line at a time. You basically only have two options: allow reassociation and contraction, or prevent them.

4 Likes