Why doesn’t Julia fma automatically?

There were lots of concrete examples of bogus results in the other thread; I think this was one of the nicest:

And this was your "basic/easy win" case, a*x + b:

function kernel(rho, T)
  P = rho[1] * T[1]

  if (abs(P - P) > 1e-16)
    @cuprintf("diff = %.16e\n", P - P)
  end
  nothing
end

rho = CuArray([1e-1])
T = CuArray([300.0])
@launch CUDA() kernel(rho, T, threads=1, blocks=1)

with the output

diff = 1.6653345369377348e-15

So the P inside abs(P - P) is effectively a different value from P itself: the multiply is computed once with a rounded mul, but within the subtraction it is re-fused into an fma, as the generated PTX shows:

	mul.f64 	%fd4, %fd1, %fd3;
	neg.f64 	%fd5, %fd4;
	fma.rn.f64 	%fd2, %fd1, %fd3, %fd5;
	abs.f64 	%fd6, %fd2;

clang allows finer-grained control than this: -ffp-contract=on will only fuse within a single statement in the source code, as allowed by the C99 standard, whereas -ffp-contract=fast will also fuse across statements.


Plugging this tangentially related issue requesting selectable @fastmath flags, for those wishing to exercise restraint in their fastmath. We already have @simd with less @fastmath slated for v1.10, so that it isn't such a hazard.
