Benchmark MATLAB & Julia for Matrix Operations

@ChrisRackauckas,

I’m not so sure how to handle the PR.
Anyhow, I updated the test with some more tests, as @felix asked, and updated others.

For some reason the results are different now (the previous test was run 5 times with the same results, so I’m not sure why).
I uploaded them so people can see and comment, but I still have to check them again.

I will refactor the code to be more modular.

@felix, it seems Julia’s sqrtm() and expm() are faster than MATLAB’s.
Unless something is wrong with the new test.

Update
I made the code modular and I think I’m onto something.
Once the benchmark file contained too many functions (going from 12 functions in the first iteration to 15), MATLAB’s performance degraded greatly.
Now that I’ve factored the code into 3 separate sub-tests, MATLAB is on top again.
Really strange and bad behavior by MATLAB.

Here’s the code from the PR:

function MatrixAdditionRunTime( matrixSize )

  mX = randn(matrixSize, matrixSize)
  mY = randn(matrixSize, matrixSize)
  scalarA = rand()
  scalarB = rand()

  # Time the fused element-wise update mA = scalarA * mX + scalarB * mY
  mA = similar(mX)
  runTime = @elapsed for i in eachindex(mA)
    mA[i] = (scalarA * mX[i]) + (scalarB * mY[i])
  end

  mA, runTime
end

function MatrixMultiplicationRunTime( matrixSize )

  mX = randn(matrixSize, matrixSize)
  mY = randn(matrixSize, matrixSize)
  scalarA = rand()
  scalarB = rand()

  # Time the in-place scalar shifts followed by the matrix product (BLAS)
  runTime = @elapsed begin
    for i in eachindex(mX)
      mX[i] += scalarA
      mY[i] += scalarB
    end
    mA = mX * mY
  end

  return mA, runTime
end


function ElementWiseOperationsRunTime( matrixSize )

  mX = randn(matrixSize, matrixSize)
  mY = randn(matrixSize, matrixSize)

  # Time a chain of element-wise transcendental operations; no BLAS involved
  mA = similar(mX); mB = similar(mX)
  runTime = @elapsed for i in eachindex(mX)
    mA[i] = sqrt(abs(mX[i])) + sin(mY[i])
    mB[i] = exp(-(mA[i]^2))
    mA[i] += mB[i]
  end

  mA, runTime
end

These are much more Julian-style implementations and avoid the temporaries. One thing to note is the rand() instead of rand(1): there’s no reason to make your scalars into arrays! The same applies to your most recent benchmarks:

vB = randn(matrixSize);

makes a vector. No need for the extra 1. It won’t make much of a difference, but it’s peculiar.
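To make the distinction concrete, here is a minimal sketch of the difference between `rand()` and `rand(1)`:

```julia
a = rand()     # a plain Float64 scalar
b = rand(1)    # a 1-element Vector{Float64}; the wrapper array buys you nothing here

# The scalar participates directly in arithmetic; the vector needs indexing.
c = 2.0 * a
d = 2.0 * b[1]
```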

Also, you should add

maxNumCompThreads(1)

for the MATLAB benchmarks to disable the multithreading.
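For a like-for-like single-threaded comparison, the Julia side can be pinned the same way; a sketch using the standard LinearAlgebra API:

```julia
using LinearAlgebra

# Restrict Julia's BLAS backend (OpenBLAS, or MKL if installed) to one thread,
# mirroring maxNumCompThreads(1) on the MATLAB side.
BLAS.set_num_threads(1)
```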

Most likely it was due to hyperthreading, i.e. 2 threads per core on an Intel chip (8 per core on POWER8).

But MATLAB said it was only using 4 threads.

Is the MATLAB code mostly calling into BLAS? Could BLAS be using all 8 threads, even though MATLAB only reports using 4?

The element-wise operations had an ~8x change. No BLAS there.

Hmmm, then I’m stumped, unless MATLAB is really reporting the number of hardware processors (4), instead of the virtual processors (8), but still using all 8.

That’s my guess as well, but if that’s true, that’s really weird behavior.

Not so atypical; I’ve seen that a lot, where old code doesn’t take hyperthreading into account (and MATLAB is some pretty ancient code!).
You could test that hypothesis by turning off hyperthreading on your machine and seeing whether you only get a ~4x speedup while MATLAB still reports 4 threads.

I found this (if you’re on a Mac like me):

Indeed there is a solution, but you have to have Xcode installed.

Search for “Instruments” with Spotlight (cmd+space), open the application and go to the preferences (cmd+,). Select the “CPUs” tab and untick “Hardware Multi-Threading”.

However, after restarting your Mac it will be enabled again.

@ChrisRackauckas it’s great to see this whole point described so clearly – I couldn’t have put it into words like that. Of course I agree, and from my point of view, and in my applications, Julia absolutely has the edge as a language. Like you say, Matlab has an advantage in the packages, and that feeds into the first-time user experience for someone who hasn’t looked in the docs yet. We’ll catch up with the packages, and we’ll keep improving the discovery process/resources for first-time users to get up to speed, and that’ll get people to persevere and discover all the depth and power behind it.

@RoyiAvital that’s wonderful! I’ll be curious to see these benchmarks on Julia 0.6 as well, to see what has improved. Out of curiosity, do you have access to different Matlab versions? How big is the speedup of R2016a?

What about combining the two, at least until the Julia packages have caught up with what’s available in MATLAB?
I’m not sure about the current state, but I know that @musm had done a lot of work in fixing up https://github.com/JuliaInterop/MATLAB.jl (I hope all of his PRs got merged! :-))

No. The current MATLAB kernel (no better word for it) is quite recent. A lot of things changed over the 2015 and 2016 releases. I currently have 2017a in testing, and my first impression is: they are still improving.

@lobingera,

Yeah, I wrote about it before.

They greatly improved their JIT engine, and I think they are improving fast.
I’d say one of the reasons is Julia itself.
I wonder how they compare on regular stuff these days.

Moreover, most of their stuff is written in MATLAB code as well (toolboxes and many other basic functions; see expm() and sqrtm()).
It’s not like everything is in C / Fortran.

On top of that, as a user, I don’t really care.
90% of the things I code are derived purely from math (mostly linear algebra).
I care how fast it runs. I won’t give Julia extra points because it is written in Julia itself.
Also, if Julia were 100x faster at recursion yet slower at array work, I’d still take MATLAB, because I don’t do general programming; I mostly do scientific programming.

The nice thing about MATLAB, for me as an engineer, is that its behavior is easier to predict (for instance, I’m not sure I like the implicit broadcasting added in R2016b; I’d like it to come with a decorator) and very similar to linear algebra.
It might be a matter of habit, but I took my first steps with Julia and it gave me a hard time on some things.

For instance, in MATLAB [ content ] is reserved for homogeneous types.
In Julia it acts like MATLAB’s cells. I wish Julia would take MATLAB’s approach on that and use { content } for inhomogeneous stuff.
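A small sketch of the behavior in question: Julia’s `[ ]` literal promotes to a common element type when it can, and falls back to `Any` (the cell-like behavior) otherwise:

```julia
v1 = [1, 2.0, 3]     # promoted to a homogeneous Vector{Float64}
v2 = [1, "a", 2.0]   # no common concrete type: becomes Vector{Any}, like a MATLAB cell

eltype(v1)  # Float64
eltype(v2)  # Any
```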

Moreover, why the differentiation between a 1x1 array and a scalar?
I read a single number from a CSV and had to do 4 operations to make it usable in the zero constructor.
Again, maybe these are just struggles with old habits, but it feels like the learning curve in Julia will be steeper than it was in MATLAB (which felt as clean as writing linear algebra on paper)…
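For what it’s worth, the unwrap is a one-liner in Julia; a minimal sketch (the 1x1 matrix here just stands in for a single-number file read):

```julia
m = reshape([3.14], 1, 1)   # a 1x1 Matrix, as a file reader might return
s = m[1]                    # index (or first(m)) to recover the scalar
z = zeros(2, 2) .+ s        # the scalar now combines freely with array constructors
```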

The way I see it, to bring in the Signal / Image / Data processing industry, Julia must be on par on the basic stuff.
I don’t see addition / multiplication and the other operations on linear algebra objects as a “Toolbox”.
These days that is the basic stuff, and Julia must be on par there.

Then it will have the argument: “Julia does all the regular stuff as fast as [Complete Your Choice], and in addition we’re better at all the rest.” And hopefully the rest is good enough to make people think it’s worth learning something new.

Anyhow, I will add 2-3 real-world algorithms (K-Means, OMP, and maybe Reweighted Least Squares) to measure performance. They include loops and other logic that should play to Julia’s strengths, though in those fields the main bottleneck of most algorithms is working with matrices.

All of that said… I’m still heavily surprised there’s that much of a difference in the linear algebra, and think that most of it will go away when the problems noted above are fixed (mostly: do not time in the global scope, time twice (to get rid of compilation), and maybe switch to MKL). I wouldn’t expect Julia to be faster, just very close, since again, they are likely just calling the same C/FORTRAN BLAS/LAPACK library in this case.

Each algorithm is executed 5 times and the median time is taken (it was like that from the first test).
Moreover, everything is done within a function (namely, you include a script which in turn includes the files where the functions are; I still have a hard time with Julia’s scoping rules, is that OK?), so besides Julia + MKL, everything is done as you wanted.

For Julia, you still need to always drop the first execution (which includes the compilation time).
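A minimal sketch of why (the function is just an illustrative toy kernel):

```julia
f(x) = sqrt.(abs.(x)) .+ sin.(x)   # hypothetical toy function, not from the benchmark

x = randn(1000)
t1 = @elapsed f(x)   # first call: pays the JIT compilation cost
t2 = @elapsed f(x)   # second call: compiled code only, typically far faster
```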

Shouldn’t be a problem with medians.

@ScottPJones, that holds for MATLAB as well.
Hence I used the median and not the mean: it’s not sensitive to outliers.

If you want to be even more robust, you can increase the numIterations parameter (automatically loaded from the CSV file to keep Julia and MATLAB aligned on that).

Median is definitely better than mean, but it is quite common to use the minimum time when benchmarking, since that is probably the most representative value. The code cannot really ever run faster than its max speed, but can be slowed down by things happening with the OS, or other stochastic events which are not really related to the language.

You could say that if the measurement noise is always positive, the most accurate estimate is the minimum sample value.
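The two estimators on a toy set of timings (values invented purely for illustration):

```julia
using Statistics

# Simulated timings: a true cost near 1.0 s plus nonnegative OS noise
times = [1.02, 1.00, 1.35, 1.01, 1.08]

minimum(times)  # best-case estimate, immune to one-sided noise
median(times)   # robust central value, still includes typical noise
```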

@DNF, I can see the logic in your point.
Though I can also see the logic in timing real-world run time when there is an OS behind it.

But no problem, I will add it to the ToDo List.

Can someone guide me through making a PR on GitHub?
I’d like to add an option for an “Optimized Julia” version of the files.

So we’ll have Vanilla MATLAB, Vanilla Julia and Optimized Julia.

Thank You.

Update
On second thought, @DNF, I don’t think this is a good idea.
We’re not measuring the absolute run time of an algorithm; we’re trying to measure relative speed.
Hence I don’t want to bias the results towards a run which got a “clean” run time from the OS.
It might be a matter of luck, and might happen in cases where MATLAB is running, or Julia.
I think that in order to compare between them, and not the absolute run times, the median will have less variance and bias.

But anyhow, the best way to get proper results is run more and more iterations.

On the other hand, the standard error of the median is higher than that of the mean for a given sample (by a factor of ~1.253 for normally distributed data).
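That factor is the asymptotic sqrt(pi/2) ≈ 1.2533 for normal data; a quick Monte Carlo sketch (sample sizes here are arbitrary):

```julia
using Statistics, Random

Random.seed!(0)
n, trials = 1001, 2000
meds = [median(randn(n)) for _ in 1:trials]  # sampling distribution of the median
mns  = [mean(randn(n))   for _ in 1:trials]  # sampling distribution of the mean

ratio = std(meds) / std(mns)   # close to sqrt(pi/2) ≈ 1.2533
```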

Sampling errors of quantile estimations from finite samples of data
