One subtle issue here that is discussed on GitHub but hasn’t been brought up here is that there isn’t one obvious way to apply this optimization. If you write a*b + c*d, there are two ways to apply FMA:
- `fma(a, b, c*d)`
- `fma(c, d, a*b)`
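These two placements are generally not equivalent: each one computes its own product exactly inside the fma but uses the already-rounded value of the other product, so the final results can differ in the last bit. A small illustrative experiment (the function name and random-sampling setup are mine, not anything the compiler does):

```julia
# Count how often the two FMA placements of a*b + c*d disagree
# for random inputs in [0, 1). The exact count varies from run to
# run, but it is typically nonzero: the two placements can round
# to different results.
function count_fma_disagreements(n)
    disagree = 0
    for _ in 1:n
        a, b, c, d = rand(), rand(), rand(), rand()
        disagree += fma(a, b, c*d) != fma(c, d, a*b)
    end
    return disagree
end

count_fma_disagreements(1_000_000)  # typically a nonzero count
```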
Which one gets done by LLVM? Well, that’s hard to predict, because LLVM knows that + is commutative, so it might have changed a*b + c*d into c*d + a*b by the time the FMA transformation runs. So by enabling automatic FMAs, you’ve taken code that’s completely unambiguous and deterministic (the naive implementation of a*b + c*d as two multiplies followed by an add) and changed it into something where what you compute depends on the whims of this particular version of the compiler, and could change if we upgrade LLVM or choose a different set or ordering of optimization passes.
Note that there’s no such problem if you explicitly write fma in your code: then it’s completely unambiguous what you want to compute, and it will always be the same. It also seems fine to me if you’ve used @fastmath to explicitly give the compiler permission to compute something a bit different from what you wrote if it deems that faster. In such cases, we understand that the compiler is making a judgement and that this judgement might change. But we try very hard not to do that kind of thing by default in Julia: we compute what you asked for, not something else.
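For concreteness, here is what the two opt-in spellings might look like side by side (a sketch; the variable values are mine):

```julia
a, b, c, d = 0.1, 0.2, 0.3, 0.4

# Explicit fma: you have said exactly which product is fused, and the
# result is the same regardless of compiler version or pass ordering.
y1 = fma(a, b, c*d)

# @fastmath: explicit permission for the compiler to fuse, reorder, or
# otherwise rearrange the arithmetic as it sees fit; the result may
# change between Julia/LLVM versions.
y2 = @fastmath a*b + c*d
```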