How to achieve multi-threaded vectorized FMA operations in the for-loop for SAXPY?

Hi everyone,

For serial, vectorized FMA operations in a SAXPY for-loop, one can use:

@fastmath @inbounds @simd for i in 1:n
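For reference, here is that serial loop wrapped in a small function. This is a minimal sketch. The name saxpy_serial!, the (a, x, y) argument order (borrowed from BLAS axpy), and the restriction to Vector{T} are illustrative assumptions, not part of the original question.

```julia
# Serial SAXPY, y <- a*x + y, written as one loop so that LLVM can
# vectorize it and (with @fastmath) contract a*x[i] + y[i] into FMAs.
# `saxpy_serial!` is a hypothetical name used only for illustration.
function saxpy_serial!(a::T, x::Vector{T}, y::Vector{T}) where {T<:AbstractFloat}
    n = length(y)
    # The length check is what makes @inbounds below safe.
    length(x) == n || throw(DimensionMismatch("x and y must have the same length"))
    @fastmath @inbounds @simd for i in 1:n
        y[i] = a * x[i] + y[i]
    end
    return y
end
```

Usage, e.g. with x = rand(Float32, 10^6) and y = rand(Float32, 10^6): saxpy_serial!(2f0, x, y). Whether FMA instructions are actually emitted depends on the CPU. On x86 with FMA support, @code_native saxpy_serial!(2f0, x, y) should show vfmadd instructions if contraction happened.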

See also the discussion “How to enable vectorized fma instruction for multiply-add vectors?” and the manual “The Julia Language V1.9.0” (page 434).

My question is: how can one achieve multi-threaded, vectorized FMA operations in a SAXPY for-loop? The following code snippet fails:

@threads @fastmath @inbounds @simd for i in 1:n

The error message is:

ERROR: LoadError: ArgumentError: @threads requires a `for` loop expression
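The error comes from macro expansion order. Macros expand from the outside in, so @threads receives the still-unexpanded @fastmath @inbounds @simd ... expression. That expression is a macro call, not a for-loop expression, and @threads rejects any argument whose head is not :for. The small sketch below builds the same expression without running it. The SAXPY loop body is an illustrative assumption.

```julia
# Quoting does not expand macros, so `ex` is exactly the argument that
# @threads would receive in the failing snippet. The loop body is only an
# illustrative SAXPY update; x, y, a, n need not exist to build the Expr.
ex = :(@fastmath @inbounds @simd for i in 1:n
           y[i] = a * x[i] + y[i]
       end)

println(ex.head)     # prints macrocall (a plain for loop would give `for`)
println(ex.args[1])  # prints @fastmath
```

Putting the for loop directly after @threads, and moving the other annotations into the loop body, avoids this check failing.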

Thank you in advance!

PS.

  • For a problem like SAXPY, the solution using @sync and @spawn is, IMHO, not optimal.
  • I do not want to prefix every statement in the for-loop body with @fastmath .... That is possible for SAXPY, but definitely a bad idea for a generic for-loop.

OK, I found this issue, opened 4 years ago without any progress: “Cannot combine @simd and @threads on a loop” (Issue #32684, JuliaLang/julia on GitHub).
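One workaround that keeps @simd is to thread over contiguous chunks of the index range and run an ordinary vectorizable loop inside each chunk. This is sketched here as an assumption, not something taken from the linked issue. Because the outer loop is a plain for loop, @threads accepts it, and the inner @fastmath @inbounds @simd loop is the same serial loop as in the question. The function name saxpy_chunked! and the one-chunk-per-thread split are hypothetical choices.

```julia
using Base.Threads

# Hypothetical workaround sketch: thread over contiguous chunks, and let
# each thread run a serial loop that @simd can annotate.
function saxpy_chunked!(a::T, x::Vector{T}, y::Vector{T}) where {T<:AbstractFloat}
    n = length(y)
    # The length check is what makes @inbounds below safe.
    length(x) == n || throw(DimensionMismatch("x and y must have the same length"))
    nchunks = min(nthreads(), n)   # at most one chunk per thread
    nchunks == 0 && return y       # empty input: nothing to do
    # The outer loop is a plain `for`, so @threads accepts it.
    @threads for c in 1:nchunks
        # Contiguous index range lo:hi for chunk c; the chunks tile 1:n exactly.
        lo = div((c - 1) * n, nchunks) + 1
        hi = div(c * n, nchunks)
        # Inner serial loop: @simd is applied to a real `for` loop here.
        @fastmath @inbounds @simd for i in lo:hi
            y[i] = a * x[i] + y[i]
        end
    end
    return y
end
```

Using one chunk per thread keeps each inner loop long and contiguous, which is what the vectorizer needs. The trade-off is that there is no dynamic load balancing between chunks, which matters little for a uniform kernel like SAXPY.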

Multithreading and vectorization are both critically important topics for performance!

@threads for i in 1:n
    @fastmath begin
        # ... loop body ...
    end
end

This will likely do what you want. @simd doesn’t give you any additional benefit over @fastmath here, since it primarily targets reduction chains.
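Applied to SAXPY, the suggested pattern might look like the following sketch. The function name saxpy_threads! and the extra @inbounds (added to match the question's serial loop) are assumptions. @threads now sees a plain for loop, so the error disappears. However, LLVM is not guaranteed to vectorize the per-thread iterations, so this is worth checking with @code_llvm or @code_native.

```julia
using Base.Threads

# Hypothetical threaded SAXPY following the suggested pattern: @threads is
# applied directly to a `for` loop, and @fastmath/@inbounds wrap the body.
function saxpy_threads!(a::T, x::Vector{T}, y::Vector{T}) where {T<:AbstractFloat}
    n = length(y)
    # The length check is what makes @inbounds below safe.
    length(x) == n || throw(DimensionMismatch("x and y must have the same length"))
    @threads for i in 1:n
        @fastmath @inbounds begin
            y[i] = a * x[i] + y[i]   # may be contracted into an FMA
        end
    end
    return y
end
```

Start Julia with several threads (e.g. julia --threads=4 or julia -t auto) before calling saxpy_threads!(2f0, x, y). With a single thread, the loop simply runs serially.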
