If you could use the advanced features of MKL (like JIT and Direct Call) in Julia, you'd probably get the best performance (see "Squeeze More Performance from Intel MKL").
An interesting alternative would be BLASFEO.
I happen to work in the same lab as the main developers of BLASFEO.
An interesting thing about BLASFEO is that it works with a different matrix layout internally, so by exposing that directly in Julia one could get another slight performance increase.
It would also be really interesting to try and write a threaded version of BLASFEO in Julia 1.3…
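To make the layout point a bit more concrete: as far as I understand it, BLASFEO's internal format is "panel-major", meaning the matrix is cut into horizontal panels of a fixed number of rows and each panel is stored column by column. Below is a rough sketch of that idea in plain Python. The panel height of 4 and the names `panel_index` and `to_panel_major` are my own assumptions for illustration, not BLASFEO's API, and the real library picks the panel size per architecture.

```python
PS = 4  # assumed panel height; illustrative only


def panel_index(i, j, n, ps=PS):
    """Flat offset of element (i, j) in a panel-major buffer with n columns."""
    # Which panel, then which column inside the panel, then which row inside it.
    return (i // ps) * ps * n + j * ps + (i % ps)


def to_panel_major(a, ps=PS):
    """Pack a list-of-rows matrix into a flat, zero-padded panel-major buffer."""
    m, n = len(a), len(a[0])
    panels = (m + ps - 1) // ps      # last panel is padded with zeros
    buf = [0.0] * (panels * ps * n)
    for i in range(m):
        for j in range(n):
            buf[panel_index(i, j, n, ps)] = a[i][j]
    return buf


# 5x3 example: rows 0..3 fill the first panel, row 4 starts a padded second panel.
a = [[10 * i + j for j in range(3)] for i in range(5)]
buf = to_panel_major(a)
```

The point of such a layout is that the `ps` entries of a column within a panel sit next to each other in memory, which is convenient for small SIMD kernels. Exposing it in Julia would mean keeping matrices in this packed form between calls instead of converting from column-major each time.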
https://github.com/JuliaEmbedded/BLASFEO.jl
but the last changes are from 8 months ago.