When shall I use `BLAS.axpy!` and when `muladd`?


#1

I can’t tell much difference between the two. Any insights? Thanks.


#2

I don’t understand the comparison. `axpy!` operates on vectors, while `muladd` operates on scalars, although you can of course broadcast `muladd` over vectors, e.g. `y .= muladd.(a, x, y)`, to get the same update as `axpy!(a, x, y)`.

I would say that 99.9% of code should not be calling low-level BLAS functions directly. If a BLAS-1 function like `axpy!` is performance-critical for you, you probably need to re-think your code anyway.
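A minimal sketch of the correspondence, assuming only Julia's standard `LinearAlgebra` library (the vector values are made up for illustration). `BLAS.axpy!(a, x, y)` overwrites `y` with `a*x + y`; the broadcast `muladd` performs the same update element by element. `LinearAlgebra` also exports a generic `axpy!` with the same semantics that is not restricted to BLAS element types.

```julia
using LinearAlgebra

a = 2.0
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# Low-level BLAS level-1 call: overwrites its last argument with a*x + y.
y_blas = copy(y)
BLAS.axpy!(a, x, y_blas)

# Generic LinearAlgebra axpy!: same semantics; also accepts element types
# BLAS does not support (e.g. Rational or BigFloat).
y_generic = copy(y)
axpy!(a, x, y_generic)

# Broadcast scalar muladd: one fused loop written in plain Julia.
y_bcast = copy(y)
y_bcast .= muladd.(a, x, y_bcast)

println(y_blas)   # prints [12.0, 24.0, 36.0]
```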


#3

Oh, sorry. I just realized that `muladd` is for scalars. What is the high-level equivalent of `BLAS.axpy!`, then? Can you give me a pointer?

Why is calling low-level BLAS functions directly considered a bad idea? Sorry if my question sounds stupid… I’d really appreciate an explanation!


#4

Low-level (level-1) BLAS operations are usually memory-bound rather than compute-bound, so calling them directly usually doesn’t even give a performance advantage over plain Julia code (that’s not true of higher-level BLAS such as matrix multiplication, though). `muladd` is generic, can fuse with surrounding operations, and may be compiled to a fused multiply-add (FMA) instruction on processors where that is efficient, so it’s a great option here.
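To illustrate the memory-bound and fusion points, here is a hedged sketch (values invented) of the update y ← b·(a·x + y). With level-1 BLAS this takes two calls, each of which streams `y` through memory; a single fused broadcast does it in one pass without a temporary array. The last line shows that `muladd` is generic, whereas the BLAS routines only accept `Float32`, `Float64` and their complex counterparts.

```julia
using LinearAlgebra

a, b = 2.0, 0.5
x  = [1.0, 2.0, 3.0, 4.0]
y0 = [4.0, 3.0, 2.0, 1.0]

# Two level-1 BLAS calls, i.e. two passes over y1.
y1 = copy(y0)
BLAS.axpy!(a, x, y1)               # y1 ← a*x + y1
BLAS.scal!(length(y1), b, y1, 1)   # y1 ← b*y1 (n, alpha, X, stride form)

# One fused broadcast: a single pass over x and y2, no temporary array.
y2 = copy(y0)
y2 .= b .* muladd.(a, x, y2)

# muladd also works for element types BLAS does not support, e.g. Rational.
r = muladd.(3, [1//2, 1//3], [1//1, 1//1])
```

For a handful of elements the difference is negligible; the single-pass version matters once the vectors no longer fit in cache.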


#5

Thanks for the explanation! I get it now.


#6

I thought BLAS would give threading for free, compared with broadcasting `muladd`. Am I wrong?


#7

It does. Whether that helps depends on the architecture, the BLAS library, and the weather (see the thread “Using axpy!”). Threading can also add overhead for small sizes.
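A rough way to check this on your own machine, sketched under the assumption of Julia 1.6 or later (for `BLAS.get_num_threads`) and Julia started with several threads (e.g. `julia -t 4`). `threaded_axpy!` is a hypothetical helper, not a library function: it splits the `muladd` loop across Julia’s own threads, whereas `BLAS.axpy!` uses whatever threads the BLAS library is configured with.

```julia
using LinearAlgebra

println("BLAS threads:  ", BLAS.get_num_threads())
println("Julia threads: ", Threads.nthreads())

# Hypothetical helper: a muladd-based axpy! split across Julia threads.
function threaded_axpy!(a, x::AbstractVector, y::AbstractVector)
    axes(x) == axes(y) || throw(DimensionMismatch("x and y must have the same axes"))
    Threads.@threads for i in eachindex(y)
        @inbounds y[i] = muladd(a, x[i], y[i])
    end
    return y
end

n = 10^6
x = rand(n)
y = rand(n)

y_blas = copy(y)
BLAS.axpy!(2.0, x, y_blas)          # threaded (or not) by the BLAS library

y_threads = copy(y)
threaded_axpy!(2.0, x, y_threads)   # threaded by Julia itself
```

To compare timings at several sizes, including small ones where thread start-up overhead can dominate, the BenchmarkTools package’s `@btime` macro is one option; the results will vary with the machine and the BLAS build.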