I’m running some processor-heavy stochastic simulations. As I gradually optimise everything else, they are increasingly limited by BLAS matrix multiplications on 9×9 matrices.

It seems that MKL (and maybe other BLAS libraries) can be significantly faster on Intel CPUs than the OpenBLAS that ships with Julia, especially for smaller matrices.

But I have no experience using alternative BLAS libraries with Julia, so my questions are:

Is MKL actually faster for this kind of problem?

How easy is it to swap BLAS implementations?

Is there a simple guide somewhere?

Is using MKL.jl the recommended approach?

What are the downsides?