[ANN] LoopVectorization

My measurements (EDIT: I previously used the wrong index order by mistake; the numbers below have been corrected):

n = 64
gemm!
  90.499 μs (0 allocations: 0 bytes)
gemmsimd!
  93.999 μs (0 allocations: 0 bytes)
gemmblas!
  30.900 μs (0 allocations: 0 bytes)
gemmavx!
  36.899 μs (0 allocations: 0 bytes)
n = 32
gemm!
  12.600 μs (0 allocations: 0 bytes)
gemmsimd!
  13.999 μs (0 allocations: 0 bytes)
gemmblas!
  4.966 μs (0 allocations: 0 bytes)
gemmavx!
  4.914 μs (0 allocations: 0 bytes)
n = 16
gemm!
  1.910 μs (0 allocations: 0 bytes)
gemmsimd!
  2.111 μs (0 allocations: 0 bytes)
gemmblas!
  1.009 μs (0 allocations: 0 bytes)
gemmavx!
  838.158 ns (0 allocations: 0 bytes)
n = 8
gemm!
  543.612 ns (0 allocations: 0 bytes)
gemmsimd!
  612.571 ns (0 allocations: 0 bytes)
gemmblas!
  356.398 ns (0 allocations: 0 bytes)
gemmavx!
  166.395 ns (0 allocations: 0 bytes)
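
For context, the kernels themselves are not shown in this post, so here is a minimal sketch of what functions with these names might look like. The loop bodies, the hypothetical `mintime` helper, and the `@elapsed`-based timing are all assumptions for illustration, not the code that produced the numbers above (the allocation counts in that output suggest a BenchmarkTools-style macro was used instead). `gemmavx!` is left out to keep the sketch stdlib-only; presumably it is the same loop nest annotated with LoopVectorization's `@avx` macro.

```julia
using LinearAlgebra  # stdlib; mul! dispatches to BLAS for Float64 matrices

# Plain triple loop. Julia arrays are column-major, so the innermost loop
# runs over the row index i to walk memory contiguously.
function gemm!(C, A, B)
    fill!(C, zero(eltype(C)))
    for j in axes(B, 2), k in axes(A, 2), i in axes(A, 1)
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

# Same loop nest, with bounds checks disabled and @simd on the innermost loop.
function gemmsimd!(C, A, B)
    fill!(C, zero(eltype(C)))
    @inbounds for j in axes(B, 2), k in axes(A, 2)
        b = B[k, j]
        @simd for i in axes(A, 1)
            C[i, j] += A[i, k] * b
        end
    end
    return C
end

# BLAS reference via the in-place multiply from LinearAlgebra.
gemmblas!(C, A, B) = mul!(C, A, B)

# Hypothetical helper: smallest wall-clock time over `reps` runs, in seconds.
function mintime(f!, C, A, B; reps = 100)
    f!(C, A, B)  # warm-up run so compilation is not timed
    return minimum(@elapsed(f!(C, A, B)) for _ in 1:reps)
end

for n in (64, 32, 16, 8)
    A, B = rand(n, n), rand(n, n)
    C = similar(A)
    println("n = ", n)
    for f! in (gemm!, gemmsimd!, gemmblas!)
        println(f!, "\n  ", mintime(f!, C, A, B) * 1e6, " μs")
    end
end
```

Swapping the loop order (for example, making `j` the innermost index) changes the memory access pattern and can shift the timings noticeably, which is the kind of mistake the EDIT note above refers to.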

Measured on an Intel(R) Core™ i7-2670QM CPU @ 2.20GHz.
According to Intel's website, it supports AVX.

By the way, is there a “Julian” way of querying the capabilities of processors?
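
The closest thing I have found in Base so far is only a partial answer. A minimal sketch, assuming that the instruction-set level has to be inferred from the microarchitecture name, since I am not aware of a documented Base call that returns feature flags such as AVX directly:

```julia
# Base-only sketch: none of these calls need an extra package.
println("LLVM CPU name:   ", Sys.CPU_NAME)     # microarchitecture name as LLVM reports it
println("Architecture:    ", Sys.ARCH)         # a Symbol such as :x86_64
println("Logical threads: ", Sys.CPU_THREADS)

# Per-core information; the vector may be empty on some platforms.
info = Sys.cpu_info()
isempty(info) || println("Model string:    ", info[1].model)
```

This identifies the chip but still leaves the mapping from name to supported SIMD extensions to the reader, so the question about a proper feature query stands.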
