How many physical cores does your CPU have?
Often, that will be half of Sys.CPU_THREADS, but sometimes the two will be equal.
Try BLAS.set_num_threads(NUMBER_PHYSICAL_CORES). When the numbers are not equal, this will be faster than using Sys.CPU_THREADS threads.
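To make that comparison concrete, here is a minimal sketch that times the same inversion with both thread counts. It assumes Julia 1.6 or newer (for `BLAS.get_num_threads`) and it guesses the physical core count as half of `Sys.CPU_THREADS`, which is only right on machines with two hardware threads per core. `physical_guess` and the matrix size are placeholders of mine, not anything from the original benchmark; a package such as Hwloc.jl can report the real core count if the guess is wrong for your machine.

```julia
using LinearAlgebra

# Logical threads visible to Julia (includes hyperthreads).
logical = Sys.CPU_THREADS

# ASSUMPTION: two hardware threads per physical core. Replace this
# with the real core count of your CPU if you know it.
physical_guess = max(1, logical ÷ 2)

A = rand(1000, 1000)  # illustrative size, not the one from the thread

for n in unique((physical_guess, logical))
    BLAS.set_num_threads(n)
    inv(A)                 # warm-up run, so compilation is not timed
    t = @elapsed inv(A)
    println("BLAS threads requested: ", n, "  time: ", round(t; digits = 4), " s")
end

# Leave BLAS at the physical-core guess afterwards.
BLAS.set_num_threads(physical_guess)
println("BLAS threads now: ", BLAS.get_num_threads())
```

A single `@elapsed` is noisy, so treat the numbers as a rough indication only; BenchmarkTools.jl would give a more trustworthy measurement.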
Almost 100% of the time is being spent in BLAS/LAPACK, so if both are using MKL, I don't know why there ought to be a difference, unless they're defaulting to different BLAS calls under the hood.
Also, I'd just like to confirm that you restarted your Julia session after building MKL.jl?
FYI, benchmarking the inversion of a large matrix only depends on the BLAS library that is linked; it basically has nothing to do with the language. See also OpenBLAS: Julia slower than R - #2 by stevengj
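Since the linked BLAS is what matters, it is worth checking which library the session actually loaded before comparing timings. The sketch below assumes Julia 1.7 or newer, where `BLAS.get_config()` is available; older versions exposed this through `BLAS.vendor()` instead.

```julia
using LinearAlgebra

# Julia 1.7+ only: lists the BLAS/LAPACK backends loaded through libblastrampoline.
cfg = BLAS.get_config()
println(cfg)

# Print just the library paths, e.g. to see whether MKL or OpenBLAS is in use.
for lib in cfg.loaded_libs
    println(lib.libname)
end
```

If the output still names OpenBLAS after setting up MKL, the timings are not measuring MKL at all.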
Gratis, but not free. In RMS's words, free as in free beer, not free as in freedom.
Apart from politics, it is apparently nontrivial to sort out licensing in a way that permits distributing compiled Julia binaries with both MKL and GMP. Microsoft seems to believe that it is legally possible under some circumstances (cf. the existence of MRAN), and they presumably have competent lawyers who figured this out.
It looks like they are re-licensing MKL as "Microsoft R Services MKL". I assume this involves a deal between MS and Intel, as legally you enter into a license agreement with MS if using this product (even the MKL part).
I understand the considerations for speed, but I think that using a FOSS library by default for Julia is the right choice. I would be uncomfortable with using a black box (a very nice, well-tested black box, but a black box nevertheless) for research. Especially since if someone really wants to do it, installing & using MKL is always an option.
IMO the discrepancy in naive benchmarks between various languages is an orthogonal issue and should be addressed by user education.
AFAIK the main problem is the possibility of running afoul of the GPL when distributing binaries that form a derived work (MRAN, Julia) of GPL code (R, GMP) and unfree code (MKL). I don't exactly see how a deal between MS and Intel helps with that.
IANAL, but it is definitely plausible that this is OK, in the same way that some Linux distros dare to ship binaries for ZFS. It is also not totally implausible that this is problematic: some distros don't ship ZFS binaries (i.e. they require ZFS users to compile at home, as is currently necessary when using MKL with Julia). I'll assume that whatever MS is doing is legally sound, but I can't guess at what special circumstances are relevant to replicating that feat.