Benchmark MATLAB & Julia for Matrix Operations

To add another data point, here are results from a 32-core node on our cluster, run with and without threading and comparing OpenBLAS against MKL:
https://github.com/barche/julia-blas-benchmarks/blob/master/BenchmarkResults.ipynb

I also reran the HPL Linpack test. Here are the results:

  • Standard HPL OpenBLAS, 32 MPI processes on a single node: 757 Gflops
  • Standard HPL MKL, 32 MPI processes on a single node: 788 Gflops
  • Intel HPL MKL, 32 MPI processes on a single node: 814 Gflops
  • Intel HPL MKL, 2 MPI processes with 16 threads each on a single node: 963 Gflops
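
To make the differences easier to compare, here is a small sketch that turns the four numbers above into percentage gains over the OpenBLAS run. The dictionary keys and the `relative_gain` helper are just names I made up for illustration; only the Gflops values come from the runs listed above.

```python
# HPL results quoted above, in Gflops, all on a single 32-core node.
results = {
    "Standard HPL OpenBLAS, 32 MPI": 757,
    "Standard HPL MKL, 32 MPI": 788,
    "Intel HPL MKL, 32 MPI": 814,
    "Intel HPL MKL, 2 MPI x 16 threads": 963,
}

# Use the OpenBLAS run as the reference point.
baseline = results["Standard HPL OpenBLAS, 32 MPI"]


def relative_gain(gflops, reference):
    """Percentage gain of `gflops` over `reference` (hypothetical helper)."""
    return 100.0 * (gflops / reference - 1.0)


gains = {name: relative_gain(g, baseline) for name, g in results.items()}

for name, g in results.items():
    print(f"{name:36s} {g:4d} Gflops  {gains[name]:+5.1f}% vs OpenBLAS")
```

With these figures, MKL in the pure-MPI setup comes out roughly 4% (standard HPL) to 7.5% (Intel HPL) ahead of OpenBLAS, while the hybrid run with 2 MPI processes and 16 threads each is about 27% ahead.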

From both tests it seems clear to me that MKL wins once threading is involved, while single-core performance is much closer, with the possible exception of the Cholesky and eigen decompositions.
