There were lots of concrete bogus examples in the other thread; I think this was one of the nicest ones:
And this was your “basic case/easy win `a*x + b`”:
```julia
function kernel(rho, T)
    P = rho[1] * T[1]
    if abs(P - P) > 1e-16
        @cuprintf("diff = %.16e\n", P - P)
    end
    nothing
end

rho = CuArray([1e-1])
T = CuArray([300.0])
@launch CUDA() kernel(rho, T, threads=1, blocks=1)
```
with the output

```
diff = 1.6653345369377348e-15
```
So `abs(P - P)` ends up nonzero: one copy of the product is recomputed and contracted into an FMA, so the subtraction is really `fma(rho, T, -P)` rather than `P - P`:
```
mul.f64    %fd4, %fd1, %fd3;
neg.f64    %fd5, %fd4;
fma.rn.f64 %fd2, %fd1, %fd3, %fd5;
abs.f64    %fd6, %fd2;
```
clang allows finer-grained control than this: `-ffp-contract=on` will only fuse within a single statement in the source code, as allowed by the C99 standard, whereas `-ffp-contract=fast` will fuse across statements.
Plugging this tangentially related issue for selectable `@fastmath` flags, for those wishing to exercise restraint in their fastmath. We already have `@simd` with less `@fastmath` slated for v1.10 so that it isn’t such a hazard.