Benchmark MATLAB & Julia for Matrix Operations

My argument is as follows:

  1. Julia should indeed allow the user to leverage MKL easily.
    You may say this is done, given MKL.jl.
  2. Julia needs a deeper, synergistic integration with MKL and the other Intel packages.
    This is trickier, but it can be done. For instance, I’d like to see Julia take advantage of all the MKL APIs. MKL has JIT and Direct Call modes that make it faster on smaller matrices. I would like Julia’s JIT to be smart enough to see a loop multiplying the same matrix over and over by different matrices and use the Compact API. There are also ways to utilize the Batch and Packed APIs (see Squeeze More Performance from Intel MKL). I want Julia to use SVML for SIMD on element-wise math operations. In short: tighter integration. Julia, with the flexibility built into the language, can be the perfect driver to squeeze out everything MKL has to offer. This is far from done; actually, I’m not sure even a single step has been taken.

There is no drama in MKL BLAS + LAPACK being faster than OpenBLAS. On the contrary, it is amazing that they are even close, given the huge amount of resources the MKL team has compared to OpenBLAS.
Also, Julia’s weaknesses in broadcasting will be solved. The way I see it, there will be some decorations to tell the compiler there is no aliasing, no dependence, etc., just like the pragmas in C (ivdep, vector, aligned, etc.), plus a decoration to force, disable, or auto-select SIMD and multi-threading.
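Julia already has the seeds of such decorations: `@inbounds`, `@simd`, and `Threads.@threads` play much the same role as those pragmas, though without a force/disable/auto mode. A minimal sketch of today's hints:

```julia
# @inbounds drops bounds checks inside the loop; @simd tells the
# compiler reassociating the reduction for vectorization is allowed.
function scaled_sum(x)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        s += 2 * x[i]
    end
    return s
end

scaled_sum(ones(10))  # → 20.0
```

These are per-loop hints, not the aliasing/dependence declarations described above, which is exactly the gap.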

The question is whether, one day, something like this:

for ii = 1:1000
    tC[:, :, ii] = tA[:, :, ii] * tB[:, :, ii];
end

will be optimized into an MKL Batch call, so we get big-matrix performance on small to medium matrices.
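Today the closest a user can get is doing the batching by hand: preallocate, use in-place `mul!` on views, and thread the independent multiplies. A sketch of what such a lowering could look like (the `batched_mul!` name is mine, not an MKL or Julia API):

```julia
using LinearAlgebra

# Hand-written stand-in for an automatic batched lowering:
# mul! writes into the output slice (no per-iteration allocation) and
# Threads.@threads spreads the independent multiplies across cores.
function batched_mul!(tC, tA, tB)
    Threads.@threads for ii in axes(tA, 3)
        @views mul!(tC[:, :, ii], tA[:, :, ii], tB[:, :, ii])
    end
    return tC
end

tA = rand(8, 8, 1000); tB = rand(8, 8, 1000);
tC = similar(tA);
batched_mul!(tC, tA, tB);
```

Even this version still pays one full gemm call per slice; a real Batch API call would hand all 1000 multiplies to MKL at once.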

Will we see one day something like this:

mA = qr(mB);
mC = qr(mD);
mE = qr(mF);
mG = qr(mH);

transformed into calls to the MKL Compact API for small matrices?
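For reference, here is what Julia runs today: each `qr` call is an independent, full LAPACK factorization, paying per-call overhead that a Compact-API lowering could amortize across the batch (the variable names here are illustrative):

```julia
using LinearAlgebra

# Four independent small factorizations, each a separate LAPACK call.
mats  = [rand(4, 4) for _ in 1:4]
facts = map(qr, mats)

# Each factorization reconstructs its input.
all(i -> facts[i].Q * facts[i].R ≈ mats[i], eachindex(mats))  # → true
```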

I also wish to see:

for ii = 1:100 
   tC[:, :, ii] = mA * tB[:, :, ii];
end

calling the MKL Compact API.

The above has the potential to bring large-matrix performance to many use cases where the user works with small to medium matrices. It is a joker Julia can hold in its hand.

Not to mention having the MKL Sparse Solvers, utilizing some of its VML library, etc.

Using SVML would allow Julia to match Numba’s JIT optimization in Python for arrays with element-wise mathematical expressions.
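To be fair, Julia's fused broadcasting already produces a single element-wise loop with no temporary arrays; what SVML would add is vectorizing the transcendental calls themselves:

```julia
# One fused loop over x: exp and sin are evaluated per element and
# summed in place, with no intermediate arrays allocated. SVML would
# let the exp/sin calls run on SIMD vectors instead of scalars.
x = rand(10_000)
y = @. exp(x) + sin(x)
```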

So, for me, Julia’s elegant machinery can create the perfect driver around those libraries.
I don’t think everything should be written in Julia. I think Julia can also be the intelligent glue that utilizes low-level libraries to the maximum, in a fashion no other language can without being explicit.

Remark
I have no idea where you got “No optimized allocator in Julia”. Is there a GitHub issue that shows this? It seems Julia’s allocator is as efficient as any other. It is more flexible, as you can do “low level” operations which usually are not exposed.