Can’t tell much difference between the two. Any insights? Thanks.
I don’t understand the comparison. axpy! is for vectors, while muladd is for scalars, although of course you can do x .= muladd.(x,y,z) to apply it to vectors similar to axpy!.
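To make the comparison concrete, here is a minimal sketch of the two forms side by side. The variable names and values are made up for illustration; BLAS.axpy!(a, x, y) overwrites y with a*x + y, and the broadcast line is the elementwise equivalent.

```julia
using LinearAlgebra

a = 2.0
x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

# BLAS level-1 call: overwrites its last argument with a*x + y
y_blas = copy(y)
BLAS.axpy!(a, x, y_blas)

# Broadcast form: one fused loop that writes in place, no temporary array
y_bcast = copy(y)
y_bcast .= muladd.(a, x, y_bcast)

println(y_blas)    # [12.0, 24.0, 36.0]
println(y_bcast)   # [12.0, 24.0, 36.0]
```

Both versions update the destination in place; the difference is only in who runs the loop (the BLAS library or Julia's own compiled broadcast).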
I would say that 99.9% of code should not be calling low-level BLAS functions directly. If a BLAS-1 function like axpy! is performance-critical for you, you probably need to re-think your code anyway.
Oh, sorry. Just realized that muladd is for scalars. What is the high-level surrogate for ‘BLAS.axpy!’ then? Can you give me a pointer?
Why is calling low-level BLAS functions deemed a bad idea? Sorry if my question sounds stupid… Really appreciate it!
Low-level BLAS calls are usually memory-bound rather than compute-bound, so you’ll find that using low-level BLAS usually doesn’t even give a performance advantage over plain Julia (that’s not true of high-level BLAS though). muladd is generic, can fuse with other broadcast operations, and will use FMA on processors where it should, so it’s a great option here.
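As a rough illustration of "generic" and "can fuse", the sketch below uses made-up values. The first part applies broadcast muladd to Rational elements, a type the BLAS routines do not handle; the second part folds an extra elementwise operation into the same loop.

```julia
# Generic: works for element types outside BLAS's floating-point types
xr = [1//2, 1//3]
yr = [1//4, 1//6]
yr .= muladd.(2, xr, yr)          # 2*xr + yr, exact rational arithmetic

# Fusing: the multiply-add and the subtraction run in a single pass,
# with no intermediate array allocated for muladd.(a, x, y)
a = 2.0
x = [1.0, 2.0]
y = [3.0, 4.0]
z = [0.5, 0.5]
out = similar(x)
out .= muladd.(a, x, y) .- z

println(yr)    # Rational{Int64}[5//4, 5//6]
println(out)   # [4.5, 7.5]
```

With a BLAS call, the subtraction of z would need a second pass over memory, which matters when the operation is memory-bound.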
Thanks for the explanation! Get it now.
I thought BLAS would give threading for free as compared to broadcasting muladd. Am I wrong?
It does. It might or might not be useful, depending on the architecture/BLAS/weather (see the thread “Using axpy!”). It also might have an overhead for small sizes.
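If you want to check this on your own machine, a quick and rough timing sketch could look like the following. The helper names (axpy_blas!, axpy_bcast!, time_both) are hypothetical, the sizes are arbitrary, and @elapsed is only a coarse timer, so treat the numbers as indicative at best.

```julia
using LinearAlgebra

# Hypothetical helper names, for illustration only
axpy_blas!(a, x, y) = BLAS.axpy!(a, x, y)
axpy_bcast!(a, x, y) = (y .= muladd.(a, x, y); y)

function time_both(n; reps = 100)
    a, x = 2.0, rand(n)
    y1, y2 = rand(n), rand(n)
    # Warm up once so compilation time is not included in the measurement
    axpy_blas!(a, x, y1)
    axpy_bcast!(a, x, y2)
    t_blas = @elapsed for _ in 1:reps
        axpy_blas!(a, x, y1)
    end
    t_bcast = @elapsed for _ in 1:reps
        axpy_bcast!(a, x, y2)
    end
    return (blas = t_blas / reps, bcast = t_bcast / reps)
end

println("BLAS threads: ", BLAS.get_num_threads())
for n in (10, 1_000, 1_000_000)
    println(n, " => ", time_both(n))
end
```

BLAS.set_num_threads(1) can be used to see how much of any difference comes from threading. For serious measurements a dedicated benchmarking package would be a better choice than @elapsed.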