Why doesn’t Julia fma automatically?

Is there any reason why the compiler doesn’t optimize these? I feel like having to do this by hand is a detriment to high-level programming, and especially to the math-like syntax I can write in Julia (the same expression I use in my paper).

fma changes results because IEEE floating point numbers are weird.

5 Likes

Yes, it changes them to be more accurate (only 1 roundoff instead of 2), so this still really doesn’t make sense.

Yes, it changes them to be more accurate (only 1 roundoff instead of 2), so this still really doesn’t make sense.

It changes them to be unpredictable.

Whether it’s more accurate or not is mostly irrelevant here.
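
To make the difference concrete, here is a toy example (values picked so the difference shows up) where the fused and unfused forms disagree:

a, b, c = 0.1, 10.0, -1.0
unfused = a * b + c      # two roundings: a*b rounds to exactly 1.0, so this is 0.0
fused   = fma(a, b, c)   # one rounding of the exact result: 5.551115123125783e-17

Both answers are “correct” in the sense of following well-defined rounding rules; they are just not the same.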

6 Likes

I guess this was already discussed lots of times. One relevant thread:

The TL;DR is correctness: it is assumed that the user wrote exactly the FP operations they need, and changing the operations in a way that changes the result would be incorrect. This is consistent with the other optimizations that compilers (not just Julia’s) do; in general, an optimization is allowed only if it doesn’t change any observable result.

The thread also gives specific examples of FP code that can break in such cases. Search for “error-free transforms”, “multi-word arithmetic”, or “floating-point expansions” for more. An example Julia package that relies on the current behavior is MultiFloats. Many implementations of transcendental math functions would also be broken.
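
To give a flavor of what those rely on, here is a minimal sketch of an error-free product (the building block of double-double and expansion arithmetic; this is illustrative, not MultiFloats’ actual code). Its correctness proof assumes that a * b is exactly the correctly rounded IEEE product, nothing more and nothing less, so a compiler that silently rewrote the user’s operations would invalidate it:

# hi + lo equals the real-number product a*b exactly;
# lo is the rounding error of the Float64 product hi.
function two_prod(a::Float64, b::Float64)
    hi = a * b
    lo = fma(a, b, -hi)
    return hi, lo
end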

6 Likes

They’re perfectly predictable; whenever you see A*x + b, it gives the same answer, consistently, so people familiar with FMA can predict exactly what it will do. People who aren’t familiar with FMA can also predict exactly what it will do: it will calculate A*x + b, and anything beyond that doesn’t matter to them, because if your code relies on whether A*x + b rounds exactly once or exactly twice, it’s bad code.

The problem is a*b + c*d: there are two ways to fuse this, and they give different answers.
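
For example (writing the would-be fusings out with explicit fma, since Julia doesn’t contract; the inputs are picked so all three evaluations differ):

x = 1 + 3 * 2.0^-28
a, b, c, d = x, x, -x, x      # so a*b + c*d is mathematically zero
a*b + c*d                     # IEEE, no fusing: exactly 0.0
fma(a, b, c*d)                # fuse the left product:  ≈ -9.7e-17
fma(c, d, a*b)                # fuse the right product: ≈ +9.7e-17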

7 Likes

It’s consistent with some other compilers (like C or C++) but not with others (like Haskell or every Python DSL for machine learning).

Yeah, you’re right that this is an unfortunate case. That being said, I think it still makes sense to just pick a standard interpretation, since fusing either product, fma(a, b, c*d) or fma(c, d, a*b), will usually be a bit more accurate than evaluating (a*b) + (c*d) with two roundings. Alternatively, we could choose to interpret anything more complex than a * b + c in strict numerical order, and just stick to the easy win of optimizing a * b + c.

3 Likes
f(a,b) = a * b
f(a,b) + c

would then depend on inlining? Are floating point results allowed to depend on inlining?

5 Likes

Not sure what you mean. If I make a program that prints “hello world”, in whichever language, I wouldn’t expect it to print “hello Earth”. Unless that was somehow explicitly allowed by some documented semantics, of course.

1 Like

Or rather, it would depend on inlining NOT happening, or at least, when inlining is done, on it not behaving as if the code had been written as one function. And that would be a reasonable thing to do.

Because changing exact floating-point values isn’t (and can’t be) considered a breaking change, or it would imply that we could never change any numerical function. If you figure out an optimization that lets you double the speed of sum, too bad, it can’t be merged, it’s a breaking change. Upgrading LLVM would be a breaking change too (because LLVM doesn’t guarantee exact floating-point results). Pretty much any kind of optimization would end up becoming illegal (e.g. broadcast fusion would have to be thrown out as well).
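
To be concrete about sum: it already doesn’t promise any particular evaluation order; for float arrays it uses pairwise summation, which typically doesn’t match a strict left-to-right reduction bit for bit:

v = rand(10_000)
sum(v) == foldl(+, v)   # usually false: pairwise vs. strict left-to-right summation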

Whenever I write a * x + b, I’m already asking for an abstract thing, and accepting that what I get will only be approximately correct (that’s how floats work!); I can’t get hung up on the exact values of the last one or two bits.

3 Likes

Yes, they are (and they already do in a few situations). The exact value of the last bit isn’t guaranteed.

That’s a pretty false equivalence between changing sum and changing +.

They have different contracts.

1 Like

Neither of them guarantees exact floating-point values in their contract (and they can’t, because LLVM doesn’t guarantee exact floating-point values). But if you want a more analogous case, A * B * C chooses the execution order that has the best performance, rather than exactly left-to-right.
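
For instance (a rough illustration; the dimensions are chosen to make the two associations very different in cost):

using LinearAlgebra
A = rand(2000, 2);  B = rand(2, 2000);  C = rand(2000, 3)
# The three-argument method *(A, B, C) may evaluate this as A * (B * C),
# which is far cheaper than (A * B) * C here, so the two spellings agree
# only approximately, not necessarily bit for bit.
A * B * C ≈ (A * B) * C
A * B * C == (A * B) * C   # not guaranteed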

2 Likes

A * B * C chooses the execution order that has the best performance, rather than exactly left-to-right.

And this is indeed already a controversial design choice itself. But in any case (A*B)*C does not. Where do you draw the line? If you allow rewriting a*b + c, why not rewrite more complicated arithmetic? Will parentheses save you? Or only function barriers? Or neither?

because LLVM doesn’t guarantee exact floating-point values

I am not an LLVM expert, but I’d venture a guess that * and + mean one thing and one thing only. Without fast-math flags, doesn’t it follow IEEE 754?
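
One way to check what Julia currently emits (assuming a recent Julia; the exact IR formatting varies by version):

f(a, x, b) = a * x + b
# Without fast-math annotations this is a plain fmul followed by a fadd,
# with no contraction into fma/fmuladd, i.e. plain IEEE 754 semantics:
@code_llvm debuginfo=:none f(1.0, 2.0, 3.0)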

3 Likes

It’s not at all clear that LLVM does NOT fuse by default, or at least that it isn’t allowed to, or that IEEE disallows it. IEEE demands that FMA be available now; since the 2008 revision? Certainly since the 2019 revision. LLVM is an implementation detail; Clang, which is built on it, does FMA by default:

https://clang.llvm.org/docs/UsersManual.html#cmdoption-ffp-contract

The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default. This behavior can be controlled with the FP_CONTRACT and clang fp contract pragmas. Please refer to the pragma documentation for a description of how the pragmas interact with this option.

Valid values are:

  • fast (fuse across statements disregarding pragmas, default for CUDA)
  • on (fuse in the same statement unless dictated by pragmas, default for languages other than CUDA/HIP)
  • off (never fuse)
  • fast-honor-pragmas (fuse across statements unless dictated by pragmas, default for HIP)

It’s not clear to me that IEEE demands fma actually be used, as opposed to just being available. It’s very unlikely that it demands it be used, but it’s also not clear that it demands it NOT be used.

So it’s up to Julia, and it could do this and still claim IEEE compatibility. I think there might be exceptions where Julia wouldn’t want to do this, e.g. for complex values. But otherwise I think we should do the same, with “on” as the default (and I guess “fast” for CUDA), as above. But that also means we need some pragma equivalent to opt out; I suppose that would be a macro in Julia. I think we should do this to compete with C/C++ (with recent compilers; and with Mojo too, which I suppose also defaults to fused).
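
For reference, the opt-in direction already exists today; what’s being proposed above is flipping the default and adding an opt-out:

g(a, x, b) = muladd(a, x, b)      # explicitly allows (but doesn’t require) fusing
h(a, x, b) = @fastmath a * x + b  # fast-math flags permit contraction, among other rewrites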

3 Likes

That is not how floats work. They are exact values with a limited number of “decimal” places, and their arithmetic is exactly defined. Results just snap to the nearest representable value if you land somewhere in between the values they can represent, but that value is an exact number. It’s not a “fuzzy” thing. There are many algorithms that explicitly depend upon this.
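
For example, the Float64 literal 0.1 denotes one specific, exact rational number; it just isn’t 1/10:

big(0.1)               # 0.1000000000000000055511151231257827021181583404541015625
Rational{BigInt}(0.1)  # 3602879701896397//36028797018963968, i.e. 3602879701896397 / 2^55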

8 Likes