This seems to be basically a thin wrapper around threading. That should be handled by good, easy threading support in Julia rather than at the level of MKL.
@antoine-levitt,
Indeed, if the whole magic is just multithreading, it should be done at the Julia level.
Though it might be less overhead to call one C function instead of making multiple calls, no?
Unless small matrices end up being handled by JuliaBLAS, in which case an engine that multithreads those operations as above would be the best choice.
I don’t think the overhead of calling C is significant. And if you’re doing very small matrices, you’re probably better off with StaticArrays anyway, which bypasses BLAS.
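To make the "do it at the Julia level" idea concrete, here is a minimal sketch of a threaded batch multiply using only the standard library. `batched_mul!` is a hypothetical helper name made up for illustration, not an MKL or Julia API, and the matrix size and batch count are arbitrary. It assumes Julia was started with more than one thread (e.g. `julia -t 4`); with a single thread it still runs, just serially.

```julia
using LinearAlgebra

# Hypothetical helper (not part of Julia or MKL): multiply matching pairs of
# matrices in place, with the loop iterations spread across Julia threads.
function batched_mul!(Cs, As, Bs)
    @assert length(Cs) == length(As) == length(Bs)
    Threads.@threads for i in eachindex(Cs, As, Bs)
        mul!(Cs[i], As[i], Bs[i])   # in-place Cs[i] = As[i] * Bs[i]
    end
    return Cs
end

# Assumption: we want Julia threads, not BLAS threads, to provide the
# parallelism, so BLAS is limited to one thread to avoid oversubscription.
BLAS.set_num_threads(1)

n, batch = 8, 1000                          # arbitrary illustrative sizes
As = [rand(n, n) for _ in 1:batch]
Bs = [rand(n, n) for _ in 1:batch]
Cs = [zeros(n, n) for _ in 1:batch]

batched_mul!(Cs, As, Bs)
```

Each iteration is one separate `mul!` call, so this is exactly the "multiple calls" case asked about above; whether the per-call overhead matters for a given size is something to benchmark rather than assume.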