Why doesn’t Julia fma automatically?

There were lots of concrete examples of bogus results in the other thread; I think this was one of the nicest:

And this was your "basic/easy win" case, a*x + b:

function kernel(rho, T)
  P = rho[1] * T[1]

  if (abs(P - P) > 1e-16)
    @cuprintf("diff = %.16e\n", P - P)
  end
  nothing
end

rho = CuArray([1e-1])
T = CuArray([300.0])
@launch CUDA() kernel(rho, T, threads=1, blocks=1)

with the output

diff = 1.6653345369377348e-15

So the P inside abs(P - P) is effectively a different value from P itself: the multiply is computed once with a rounded mul, but within the subtraction it is re-fused into an fma, as the generated PTX shows:

	mul.f64 	%fd4, %fd1, %fd3;
	neg.f64 	%fd5, %fd4;
	fma.rn.f64 	%fd2, %fd1, %fd3, %fd5;
	abs.f64 	%fd6, %fd2;

clang allows finer-grained control than this: -ffp-contract=on will only fuse within a single statement in the source code, as allowed by the C99 standard, whereas -ffp-contract=fast will also fuse across statements.


Plugging this tangentially related issue requesting selectable @fastmath flags, for those wishing to exercise restraint in their fastmath. We already have @simd with less @fastmath slated for v1.10, so that it isn't such a hazard.
