Optimizing sums of products (dot products)

I’m always on the hunt for good examples of “mere mortal” code that goes awry.

One of the fundamentals is that fast math allows the compiler to give you different answers for basic maths depending on the context, the surrounding code, and the available optimizations. That makes debugging very challenging, and especially so in Julia, where small functions often inline, constant-propagate, and then subsequently optimize in different ways.

So a huge practical consequence of this — yes, even in “mere mortal” higher-level, less-numericsy code — is that you can’t assume that the maths you did in an if conditional will match the results of the same maths elsewhere. Just because f() < g() in one place in your code doesn’t necessarily mean that f() != g() under -ffast-math/unsafe math. In fact, f() might be bigger than g() elsewhere. So even if you try to “defend” against fast-math with branches, your adversary is already there in your defense, mucking around in your conditionals. That most infamously happens with isnan and isinf checks (and @mikmoore has some great examples above), but it applies to finite math, too: subtractions can get rearranged into additions, things can swap sides of a comparison, etc., etc.

My favorite examples are in the contract category, because it feels like converting a multiply-add into the higher-precision fused multiply-add (FMA) should be a no-brainer — better accuracy and better performance, what’s not to love?

  • The expression a*b + c*d now has three possible answers, and the one that’s most accurate is value-dependent. This is particularly bad for complex multiplication, because x*conj(x) definitely needs to be real — and it is, but only if you don’t contract. It’s actually quite relevant to the original conversation here, because that’s precisely what’s on the inside of the dot-product x'x. With FMAs, the imaginary parts would be sometimes positive, sometimes negative. Whatever, Matt, you might say, I’m not implementing complex arithmetic…
  • Taking the square root of a negative float is an error, so you might have a check to ensure that your argument is positive before that branch. What if you’re solving the quadratic equation? The operand to sqrt there is b*b - 4*a*c (look familiar?). Just like a*b + c*d, there are multiple possible ways to compute this once FMAs are in the mix. I’d love to find time to build a realistic example where the if sees a different value than the sqrt does, but it’s the same idea as this example. What would make it even more fun is that, upon getting a spurious domain error, you might add logging and then never see it again, because merely inspecting the intermediate values can break the FMA.