Benchmark MATLAB & Julia for Matrix Operations

Thanks. I think I knew that, but my confusion arose because on macOS or Linux I always build Julia myself, whereas on Windows I don’t (!), so I obviously have a less optimized setup. I have also never successfully rebuilt the system image on Windows.

@kristoffer.carlsson,

Are you sure of that? I can’t believe they have the same performance on every test unless there is something else limiting them both.

I’d expect to see some variability between them.
There is a larger difference between a few individual runs of Julia itself than what you showed.

But if you are sure your results are valid, then:

  1. Julia is limiting OpenBLAS’s performance for some reason (or is using it inefficiently; a quick way to check the BLAS setup is sketched below).
  2. Julia can and should match MATLAB’s performance on any test which uses BLAS (yet it doesn’t).
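
For what it’s worth, a minimal sanity check along these lines can rule out an obvious misconfiguration before blaming either library. This is only a sketch: `BLAS.get_config()` requires Julia 1.7 or later, and the matrix size is arbitrary.

```julia
# Check which BLAS Julia actually loaded and how many threads it may use,
# then time a plain dgemm to compare against MATLAB's numbers.
using LinearAlgebra

println(BLAS.get_config())       # loaded BLAS/LAPACK libraries (Julia >= 1.7)
println(BLAS.get_num_threads())  # thread budget BLAS is allowed to use

n = 2000
A = rand(n, n); B = rand(n, n)
A * B                            # warm-up (compilation, thread start-up)
ts = [@elapsed(A * B) for _ in 1:5]
println("dgemm times (s): ", ts)
```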

Just to give another view of OpenBLAS vs. Intel MKL, I will time a few of the tests on Octave as well (Octave uses OpenBLAS).

Back at work now, and you are right. Both versions on 0.5 actually ran with MKL, which is obvious in hindsight since the timings on 0.6 were different even though the same underlying library should have been used.

Updated comparison posted on Imgur.

2 Likes

@kristoffer.carlsson, now your results are in line with mine.
I wrote about the 0.5 vs. 0.6 anomaly on GitHub when you first published your results.

So I think my recap is valid.

@kristoffer.carlsson, Could you download my latest files and check?

Thank You.

I just want to add a point about the argument over whether one should test with or without multi-threading. For me it is very often the case that my programs are trivially parallelizable, and I use only one core for each job to avoid inefficient parallelization; in that case the single-core speed is what matters. On the other hand, while testing and developing the program I want to use many cores to finish a single run as fast as possible. So I definitely care about both cases.
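
In practice I cover both cases by pinning the BLAS thread count. A minimal sketch, assuming BenchmarkTools.jl is installed (the matrix size is arbitrary):

```julia
# Benchmark the same kernel in single-threaded and multi-threaded
# BLAS mode, since both scenarios matter in practice.
using LinearAlgebra, BenchmarkTools

A = rand(3000, 3000)
B = rand(3000, 3000)

BLAS.set_num_threads(1)                  # "one core per job" scenario
t1 = @belapsed $A * $B

BLAS.set_num_threads(Sys.CPU_THREADS)    # "finish one run fast" scenario
tN = @belapsed $A * $B

println("1 thread: $t1 s, $(Sys.CPU_THREADS) threads: $tN s, ",
        "speedup: $(round(t1 / tN, digits = 2))x")
```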

2 Likes

To add another datapoint, here are the results on a 32-core node on our cluster, with and without threading and comparing OpenBLAS and MKL:
https://github.com/barche/julia-blas-benchmarks/blob/master/BenchmarkResults.ipynb

I also reran the HPL linpack test, here are the results:

  • Standard HPL OpenBLAS, 32 MPI processes on a single node: 757 Gflops
  • Standard HPL MKL, 32 MPI processes on a single node: 788 Gflops
  • Intel HPL MKL, 32 MPI processes on a single node: 814 Gflops
  • Intel HPL MKL, 2 MPI processes with 16 threads each on a single node: 963 Gflops

From both tests it seems clear to me that MKL wins when threading enters into the equation, but single-core performance is much closer, with the possible exception of the Cholesky and Eigen decompositions.
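
To clarify the metric: HPL reports Gflops for the LU-based solution of a dense linear system, but the same kind of number can be illustrated with a plain dense matmul, which costs about 2n^3 floating-point operations. A minimal sketch, with an arbitrary matrix size:

```julia
using LinearAlgebra

# A dense n x n matmul costs ~2n^3 floating-point operations.
function matmul_gflops(n)
    A = rand(n, n); B = rand(n, n)
    A * B                                    # warm-up
    t = minimum(@elapsed(A * B) for _ in 1:3)
    return 2n^3 / t / 1e9
end

println("dgemm: ", round(matmul_gflops(4000), digits = 1), " Gflops")
```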

4 Likes

Thanks! Really well-done benchmarks, @barche.

A couple of things to note:

  1. We did some comparisons of Julia vs. MATLAB a while back and gave a talk at JuliaCon a couple of years ago. You can find this talk online, and while it is probably out of date, we clearly observed that MATLAB’s automatic parallelism actually slowed some codes down, as opposed to improving performance. Their heuristics for when to use threads have probably improved, but I don’t believe that automatic parallelism (where how and when threads are used is a complete mystery and not in your control) can match the efficiency of parallel-aware code. For simple stuff, though, it looks pretty cool.

  2. Anyway, there is also https://github.com/IntelLabs/ParallelAccelerator.jl if you want to try automatic parallelism for Julia.

  3. If you use threads in Julia (by setting JULIA_NUM_THREADS) and call into a BLAS library, you will likely end up oversubscribing cores and completely destroying performance. This should be obvious when it happens: the slowdown is pretty massive. The answer is to also set OMP_NUM_THREADS=1, thereby preventing the libraries from starting their own threads (see the sketch below). Of course, this only makes sense if you’re using your Julia threads effectively. True and effective nested parallelism is coming to Julia “soon”, but it won’t help with nesting parallelism from your Julia code and one of these BLAS libraries for a good while. It will help if a BLAS library is written in Julia, though. 🙂
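
To make the oversubscription fix concrete, here is a minimal sketch; the thread counts and matrix size are only illustrative:

```julia
# Launch as:  JULIA_NUM_THREADS=8 OMP_NUM_THREADS=1 julia script.jl
# The BLAS restriction can also be applied from within Julia:
using LinearAlgebra
BLAS.set_num_threads(1)   # keep BLAS single-threaded under Julia threads

# Each Julia thread now calls single-threaded BLAS kernels, so the
# cores are not oversubscribed by nested BLAS/OpenMP threads.
results = zeros(Threads.nthreads())
Threads.@threads for i in 1:Threads.nthreads()
    A = rand(500, 500)
    results[i] = sum(A * A)
end
println(results)
```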

HTH.

1 Like

@kpamnany, I totally agree with you.
In the Julia spirit, parallelism should be controlled by the user.
Maybe a heuristic-driven automatic default, but certainly with an option for the user to turn it on or off (and set its parameters).

Yet I’d add that from R2016a onward, MATLAB has been improving its JIT significantly with each release.

Please provide a link.

Search for JuliaCon 2015 on YouTube and you’ll find a talk about multithreading Julia somewhere down the playlist.

1 Like

I updated this repository (https://github.com/juliamatlab/Julia-Matlab-Benchmark), and we get more interesting results! See the plots there. The changes:

  • Julia is updated to v1.1.1.
  • Added a Julia + MKL.jl benchmark, which improves performance a lot (a minimal usage sketch follows this list).
  • Better and more accurate benchmarking tools, both in Julia and MATLAB.
  • Many other improvements and updates.
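
A minimal sketch of how MKL.jl is used. Note this is the modern workflow: on current Julia (1.7 and later) `using MKL` swaps the BLAS backend at load time, while on older versions such as 1.1.1 MKL.jl instead rebuilt Julia against MKL.

```julia
using MKL             # load first so MKL replaces the default OpenBLAS
using LinearAlgebra

println(BLAS.get_config())   # should now list an MKL library

A = rand(2000, 2000)
A * A                        # warm-up
println("matmul: ", @elapsed(A * A), " s with MKL")
```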

For example: (plot omitted; see the figures in the repository.)
Edit:
I wanted to say that this project has been detached and moved to my organization: https://github.com/juliamatlab/Julia-Matlab-Benchmark
We plan to make MATLAB-friendly APIs written in native Julia, and then test their performance by comparing them to the MATLAB ones.
Much more is coming, and the benchmark is going to be much broader.

8 Likes

Log scale is very deceiving; so is Julia exponentially worse?

Smaller runtime is usually better.

1 Like

I, of course, know that. If you just take a look at the image, it looks like Julia is only some ‘linear amount’ worse, but in reality a line on a log scale means exponentially worse.

For some reason the Cholesky decomposition, least squares, and matrix inversion are faster in MATLAB.

2 Likes

Two lines with the same slope in log scale just means that one line is a constant factor times the other. Is that what you mean with “exponential time worse”?
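
In symbols: if two runtime curves are parallel on a log scale, then

$$\log y_1(n) - \log y_2(n) = c \quad\Longrightarrow\quad \frac{y_1(n)}{y_2(n)} = e^{c},$$

i.e. the gap is the same constant multiplicative factor for every matrix size $n$, not a factor that grows with $n$.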

7 Likes

Actually, I think you’re right. I was thinking of just a single line on a log scale by itself.

I used a log scale because with a linear scale nothing would have been revealed: the lines would have fallen on top of each other, since the variation in x (matrix size) is so large. A linear scale would be much more deceiving, as I discussed with Royi in my pull request.

If you want to see the actual run times refer to the CSV files here:
https://github.com/juliamatlab/Julia-Matlab-Benchmark/tree/master/RunTimeData
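
If you prefer to inspect them yourself, here is a minimal sketch of loading one CSV and re-plotting it on a log-log scale. The file name and column names below are hypothetical, so check the RunTimeData folder for the actual layout:

```julia
using CSV, DataFrames, Plots

# Hypothetical file and column names; see RunTimeData for the real ones.
df = CSV.read("MatrixMultiplication.csv", DataFrame)
plot(df.matrix_size, [df.julia_time df.matlab_time];
     xscale = :log10, yscale = :log10,
     label  = ["Julia" "MATLAB"],
     xlabel = "matrix size", ylabel = "run time (s)")
```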


7 Likes