[ANN] LoopVectorization

Coincidentally, this is exactly what I just did as an example of how to work with static ranges in StaticNumbers.jl.

It works quite well. I’ve benchmarked matrix-matrix multiplication for all sizes up to 8x8, and for some of the sizes I get a 2-4x speedup compared to SMatrix.
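For anyone who wants to reproduce the SMatrix side of such a comparison, here is a minimal sketch of a baseline timing loop. It assumes StaticArrays.jl and BenchmarkTools.jl are installed, and it covers square sizes only. It is not the benchmark script used for the numbers above, and it does not show the StaticNumbers.jl version.

```julia
using StaticArrays      # provides SMatrix
using BenchmarkTools    # provides @belapsed

# Time the SMatrix product for each square size from 1x1 to 8x8.
# (Assumption: square matrices with Float64 elements; other shapes
# and element types would need their own loop.)
for n in 1:8
    A = rand(SMatrix{n,n,Float64})
    B = rand(SMatrix{n,n,Float64})
    # Interpolate with `$` so the globals are not part of the timing.
    t = @belapsed $A * $B
    println(n, "x", n, ": ", round(t * 1e9; digits = 2), " ns")
end
```

Timings this small are sensitive to how the arguments are passed in, so the result for any other implementation should be measured the same way (same element type, same `$` interpolation) before comparing.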
