Where did this idea come from that C and C++ do FMA automatically?
From Clang, which I assume covers both C and C++.
See my comment above, and my earlier answer to you on this (I confirmed it myself for Clang targeting RISC-V on Godbolt):
The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default.
I believe Clang makes an exception to this new default for x86… at least for now.
Julia has no such default, except, well, for packages (which are precompiled by now), which is why I proposed a possible exception for them, i.e. for libraries.