Failure to vectorize 8 Int64 multiplies when 8 Float64 multiplies vectorize

Packed Int64 SIMD multiplies are relatively slow on most x86 CPUs (Zen4 being the exception), so perhaps LLVM's cost model expects the scalar version to be faster.

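AVX512 (specifically the AVX512DQ extension) provides a packed Int64 multiply instruction, vpmullq, but on my non-Zen4 AVX512 machine LLVM still doesn't vectorize the integer version.
Without AVX512, Int64 multiplies are even more expensive, because each 64-bit lane has to be built from 32-bit multiplies (e.g. pmuludq) whose partial products are then shifted and added together.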
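To make "do Int32 multiplies and combine the results" concrete, here is a rough Python model of the per-lane arithmetic. It is only a sketch: the helper names are hypothetical, fixed-width lanes are emulated with masks, and it is not LLVM's actual codegen. It just shows the decomposition such a lowering relies on. Because signed and unsigned multiplication produce the same low 64 bits, masking the inputs also covers Int64.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mul32x32_64(x: int, y: int) -> int:
    """Model of one lane of a 32x32 -> 64-bit unsigned multiply (like pmuludq)."""
    return (x & MASK32) * (y & MASK32)


def mul64_via_32(a: int, b: int) -> int:
    """Low 64 bits of a*b, built only from 32-bit partial products."""
    a &= MASK64  # reinterpret negative Int64 values as two's-complement bit patterns
    b &= MASK64
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    lo_lo = mul32x32_64(a_lo, b_lo)  # full 64-bit product of the low halves
    # Only the low 32 bits of the cross terms survive the shift by 32.
    cross = (mul32x32_64(a_hi, b_lo) + mul32x32_64(a_lo, b_hi)) & MASK32
    # a_hi * b_hi only affects bits >= 64, so it is never computed.
    return (lo_lo + (cross << 32)) & MASK64


def to_int64(u: int) -> int:
    """Reinterpret a 64-bit pattern as a signed Int64."""
    return u - (1 << 64) if u >= 1 << 63 else u


def vec_mul8(xs, ys):
    """Emulate an 8-lane packed Int64 multiply with wrapping semantics."""
    return [to_int64(mul64_via_32(x, y)) for x, y in zip(xs, ys)]
```

Each 64-bit lane costs three 32-bit multiplies plus shifts and adds. That extra work is what the cost model has to weigh against eight plain scalar 64-bit imul instructions.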
