My measurements (EDIT: I previously used the wrong index order by mistake. The numbers below have been corrected.):
n = 64
gemm!
90.499 μs (0 allocations: 0 bytes)
gemmsimd!
93.999 μs (0 allocations: 0 bytes)
gemmblas!
30.900 μs (0 allocations: 0 bytes)
gemmavx!
36.899 μs (0 allocations: 0 bytes)
n = 32
gemm!
12.600 μs (0 allocations: 0 bytes)
gemmsimd!
13.999 μs (0 allocations: 0 bytes)
gemmblas!
4.966 μs (0 allocations: 0 bytes)
gemmavx!
4.914 μs (0 allocations: 0 bytes)
n = 16
gemm!
1.910 μs (0 allocations: 0 bytes)
gemmsimd!
2.111 μs (0 allocations: 0 bytes)
gemmblas!
1.009 μs (0 allocations: 0 bytes)
gemmavx!
838.158 ns (0 allocations: 0 bytes)
n = 8
gemm!
543.612 ns (0 allocations: 0 bytes)
gemmsimd!
612.571 ns (0 allocations: 0 bytes)
gemmblas!
356.398 ns (0 allocations: 0 bytes)
gemmavx!
166.395 ns (0 allocations: 0 bytes)
Obtained with Intel(R) Core™ i7-2670QM CPU @ 2.20GHz.
According to the Intel website, it supports AVX.
By the way, is there a “Julian” way of querying the capabilities of processors?
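For context, here is a minimal sketch of what I can query from Base without extra packages. As far as I can tell this only gives coarse information (the LLVM CPU name, the thread count, and the model string), not individual feature flags such as AVX, so treat it as a starting point rather than a full answer.

```julia
using InteractiveUtils  # provides versioninfo(); loaded by default in the REPL

# LLVM's name for the host microarchitecture (a String)
println("CPU_NAME:    ", Sys.CPU_NAME)

# Number of logical cores Julia detected
println("CPU_THREADS: ", Sys.CPU_THREADS)

# Model string of the first logical core, as reported by the OS
println("Model:       ", Sys.cpu_info()[1].model)

# Full summary, including the LLVM version and target
versioninfo()
```

I believe the CpuId.jl package can answer per-feature questions, but I have not verified that on this machine.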