Benchmark MATLAB & Julia for Matrix Operations

ChrisRackauckas · February 11, 2017, 2:53pm

Just copy in the code I already gave you.

function MatrixAdditionRunTime( matrixSize, mX, mY )

  sacalrA = rand();
  sacalrB = rand();

  tic();
  mA = (sacalrA .* mX) .+ (sacalrB .* mY);
  runTime = toq();

  return mA, runTime;
end

function MatrixMultiplicationRunTime( matrixSize, mX, mY )

  sacalrA = rand();
  sacalrB = rand();

  tic();
  mA = (sacalrA .+ mX) * (sacalrB .+ mY);
  runTime = toq();

  return mA, runTime;
end

These still have temporaries.

pkofod · February 11, 2017, 4:33pm

Sure, but op is just trying to say: I have this computer and I want to do this linear algebra: which is faster. Sure Julia could be made faster but so could matlab. I think matlab is a horrrible language to develop in though, so I stay in Julia camp, but if all I needed was to make a big matrix addition here and now, it looks like I should do it in matlab

ChrisRackauckas · February 11, 2017, 4:38pm

Why say that when you can make a more refined statement: “it looks like I should do it in MATLAB, only because implicit parallelism makes it faster”. It’s concise, true from the benchmarks I ran (turning multithreading off), and gives a solid development path / answer for Julia to work towards. The benchmarks just need one line to prove that point, and then it’s something which is useful. “I see a difference” isn’t useful.

lobingera · February 11, 2017, 4:46pm

Can you provide a good reasoning for the ‘only’? That’s the main point here. Matlab uses the underlying HW efficiently out-of-the-box, julia doesn’t.

ChrisRackauckas · February 11, 2017, 4:53pm

Take the opimized non-tempoary-allocating codes I wrote above for MatrixAdditionRunTime, MatrixMultiplicationRunTime, and ElementWiseOperationsRunTime. Benchmark those in Julia. Then take the MATLAB codes (optimized, since vectorized) and run them with:

maxNumCompThreads(1)

I saw a massive swing in the benchmarks (as already indicated) showing Julia far ahead, indicating that the confounding factor (at least in benchmarks related to these operations) is implicit parallelism.

pkofod · February 11, 2017, 5:17pm

but why would you intentionally turn of a feature if you needed to get something done?

ChrisRackauckas · February 11, 2017, 5:39pm

Why would you write a paper with no result?

Is the phrase “make things better!” helpful?

The benchmark is supposed to show something. “Julia is slower in these benchmarks” isn’t an interesting or informative result. “Julia is slower in these benchmarks because MATLAB utilizes implicit parallelism” is a very informative result and tells us that we won’t match MATLAB in these benchmarks until we use multithreading.

This doesn’t mean only run it without multithreading, I am saying show both lines, MATLAB with implicit parallelism, and MATLAB without implicit parallelism. In at least some of the benchmarks, that will isolate what’s up and narrow down the benchmarks which are still unexplained (I think those should be tested to see the difference of MKL vs OpenBLAS)

RoyiAvital · February 11, 2017, 10:02pm

@ChrisRackauckas, I’m a user.
I want to test what I will get as a user, not what the theoretical capabilities of X or Y.

We all agree that theoretically they can be as fast as can be a perfect C code.
But we try to measure what they bring to the table.
Hence I don’t see the logic in limiting MATLAB to show Julia is better.

MATLAB JIT maybe was designed and optimizing with Multi Threading in mind.

By the way, are you sure your code is faster?

Pay attention, when there is code like:

mA = sqrt.(abs.(mX)) .+ sin.(mY);		
mB = exp.(-(mA .^ 2));

Can not be replaced by one loop over both arrays.
No one promised those will always be one next to each other on code.

In the case of:

mB = exp.(-(mA .^ 2));

Will it be more optimized to run over the array for squaring and exponent?
Is there a Macro to make the loop with SIMD and Multi Threading?

ChrisRackauckas · February 11, 2017, 10:09pm

It really doesn’t matter. Split the loops if you want: you’ll see it doesn’t make much difference.

Base.@threads does the multithreading, @simd does SIMD. SIMD is usually automatic by the compiler. I’ve had issues with threading before due to the Int-boxing issue, but I haven’t tried it in this case yet. It’s still experimental, so YMMV.

You ran your test. It still tells nothing and isn’t actionable. Disect the test and make it say something. I gave you a big glaring hint of how to show what the big difference is in two (three?) or more tests. The purpose of benchmarks is to learn what causes differences. You’re a user, don’t you want to know why the difference exists?

RoyiAvital · February 11, 2017, 10:32pm

I do want to learn how to use Julia better.

Hence I’m adding now the Looping in the Julia Optimized test.
I also used nice feature of Julia (Doesn’t exist in MATLAB) to define array without initializing it.

I ended up with:

tic();
  # mD = abs.(mA) .+ sin.(mA);
  mD = Array(Float64, matrixSize, matrixSize);
  @simd for ii = 1:(matrixSize * matrixSize)
    @inbounds mD[ii] = abs(mA[ii]) + sin(mA[ii]);
  end

  # mE = exp.(-(mA .^ 2));
  mE = Array(Float64, matrixSize, matrixSize);
  @simd for ii = 1:(matrixSize * matrixSize)
    @inbounds mE[ii] = exp(- (mA[ii] * mA[ii]));
  end

  # mF = (-mB .+ sqrt.((mB .^ 2) .- (4 .* mA .* mC))) ./ (2 .* mA);
  mF = Array(Float64, matrixSize, matrixSize);
  @simd for ii = 1:(matrixSize * matrixSize)
    @inbounds mF[ii] = (-mB[ii] + sqrt( (mB[ii] * mB[ii]) - (4 * mA[ii] * mC[ii]) )) ./ (2 * mA[ii]);
  end
  runTime = toq();

This only one case.
It seems @simd has no effect (As you mentioned, it is probably done by the Compiler to begin with).
The `` Macro improved only little.

Now, where should I put the “Multi Threading” Macro?

Please show.

Thank You.

Update

I updated the results of Optimized Julia by using Devectorization of some operations.
It’s really great feature and it improves some operations.

What I really like is the consistency of Julia with its results.
MATLAB is far more inconsistent with its Run Time.

My conclusions so far:

Julia is doing its infant baby steps and those are impressive steps. There are some rough edges with the language (Less predictable than MATLAB regarding the work with Numerical Arrays in my opinion) yet the potential is clearly seen.
Julia Pro (The product, not the language) must improve its BLAS / LAPACK engine. Either by working with OpenBLAS to improve things or find a solution to Intel MKL (How come Anaconda ships with MKL? They do it free, can Julia Pro + MKL be free as well? I have zero knowledge about Open Software License System, just wonder what’s the difference with Python that Anaconda pulled it off).
Julia must optimize its strength of the use of the . (Dot) operator. It is amazing feature. When implemented in its full prowess it will be a joy for the user.
Optimized broadcasting (Multi Threading, Is SIMD already there?).
Julia needs Multi Threading (Efficient one, with options set by the user).
User Control - I like the idea the user can take control over things (Using Macro for instance) instead of Heuristic. Hopefully it will be kept with Multi Threading as well (Able to set it On / OFF / Auto per job). I like the idea one can touch the bare bone (Even take off safety).
Consistency - Julia’s performance are more consistent than MATLAB. Like it!

tkelman · February 12, 2017, 12:00am

You keep saying this, but the numbers don’t back it up. Imgur: The magic of the Internet shows very minor differences between MKL and OpenBLAS. You could point Matlab to use OpenBLAS as well (there used to be an environment variable you could set to do this, I don’t know whether it still works or exactly how it was spelled) and would most likely see consistent results there too.

Anaconda doesn’t come with GPL-licensed libraries for fast Fourier transforms or sparse linear algebra. If you want to do either of those in Python you don’t have access to the libraries that are considered best-in-class. In Matlab you’re paying for the licensing cost of combining a commercial license for FFT or sparse linear algebra code (that would otherwise be GPL) with the closed-source MKL library. Distributing such a combination with the GPL versions of those libraries is not legal.

The option is available for paid commercial licensing through Julia Computing, but such commercial licensing agreements likely prohibit making such combinations available at zero cost. You can also build Julia with reduced functionality and delete the FFT and sparse linear algebra capabilities. Depending on how you obtain MKL, it might be legal to distribute a no-GPL build of Julia along with MKL. The open-source Julia project does not currently have an MKL license that allows us to do that as far as I’m aware, but Intel or any other organization is welcome to reach out and make that happen.

ChrisRackauckas · February 12, 2017, 2:46am

Thanks for the numbers here. That puts the whole MKL thing to rest. I am quite surprised the difference is so negligible in most cases. Is there a reason why such a big deal is made out of it?

One interesting thing to test out in the same direction is whether using VML would affect any of the benchmarks in a big way:

@RoyiAvital, you do not have the numbers to show any of this. See @tkelman’s way of lining up the different results under different conditions. That’s the only way to show these kinds of conclusions.

RoyiAvital · February 12, 2017, 5:16am

@tkelman, I addressed your numbers in the GitHub Thread:

https://github.com/JuliaLang/julia/issues/18374#issuecomment-278686080

Something is wrong there (The numbers with MKL and OpenBLAS are too close, Look at Eigen Decomposition, they are the same).
For instance, it seems the people of OpenBlas knows their SVD is much slower than Intel MKL’s:

https://github.com/xianyi/OpenBLAS/issues/1077

https://github.com/xianyi/OpenBLAS/issues/1090#issuecomment-278724114

Yet your test doesn’t show that in your numbers.
I really think you either ran there both using OpenBLAS or both using MKL.

If those numbers are right, it might be something about Julia interact with the BLAS / LAPACK Library (How come MATLAB extracts better performance form pure BLAS tests?).

tkelman · February 12, 2017, 7:09am

@KristofferC was the one who ran those, I’m just reposting the link here.

I blame the marketing value of the library being written by Intel (but I’ll note it’s developed by an entirely different team than the processors themselves), but we repeatedly hear “I want MKL” without numbers showing if it’s actually beneficial. I’ve explained a number of ways you can get Julia with MKL if you want it so badly. If someone has a license that allows redistribution and use of their version of MKL with Julia, it would only be legal to distribute Julia including it if all GPL components were removed.

RoyiAvital · February 12, 2017, 7:22am

@tkelman, According to the results above, there is a strong case for wanting Intel MKL (I addressed the numbers by @KristofferC above).

Again,
I understand the legal issue and again I don’t try to object the case.
I only want to show how Julia, with all choices made (For any reason, legal or other), compares to MATLAB’s performance on this type of array work (Linear Algebra oriented).

That’s all into it.

I do think Julia is great.
Having those results in 0.5 release is impressive and there is a reason to be proud.
Also, OpenBLAS is young project as well. It will catch up and then things will be better.

Now, If I understand correctly your explanation for Intel MKL, because of inclusion of FFTW and the Sparse Library one can not ship Julia with MKL, am I right?
May I ask, if Julia dropped those 2, will it be able to ship with MKL?

If it does, give Intel MKL’s FFT is pretty good and also have nice Sparse Library support, would you change and take MKL instead of FFTW?

What I think should happen is to trust that OpenBLAS guys will improve with time, Julia will fix some of its weaknesses and we’ll come back here on version 1.0.

It will be closer and then the MathWorks guys will need to raise their game as well.

tkelman · February 12, 2017, 7:26am

I still haven’t seen a recent apples-to-apples comparison. Compare the same operation in the same language where the only thing you change is the BLAS implementation, otherwise there are too many confounding factors to draw a reliable conclusion.

MKL has a sparse matrix library but it’s not a drop in replacement for SuiteSparse, the API is entirely different.

RoyiAvital · February 12, 2017, 7:28am

Forget the Intel MKL and OpenBLAS names.

We compare Julia + Julia’s BLAS / LAPACK vs. MATLAB + MATLAB’s BLAS / LAPACK.

Now have your conclusions the way you want.

tkelman · February 12, 2017, 7:39am

I also don’t think it’s appropriate for an open source project to depend so highly on a closed source component that restricts who can build a working version of the code and how they can use it. Julia the open source project needs to support non-Intel processors and architectures (AMD, ARM, Power), for which MKL isn’t available.

In open source, if you want something to happen and don’t want to pay for it, you’ll need to participate and contribute towards making it happen (or be quietly patient, or technically convincing enough that someone else does it for you - this last option is very rare however). There is an open issue for moving FFTW out of the default standard library distribution of Julia and into a package instead. SuiteSparse should similarly move to a package. Contributions welcome to help those get done faster.

RoyiAvital · February 12, 2017, 7:55am

@tkelman, I agree.

Both Julia and OpenBLAS are talented young projects.
Hopefully they will get better and more mature.

It’s only the beginning.
I think the target is marked, now need to get it.

jtravs · February 12, 2017, 3:09pm

Hi all,

Just as a point of reference, in case other people reading this thread get the wrong idea: I have transitioned a large code base from MATLAB to Julia and have observed enormous speed ups using Julia, along with a much better code organization and maintainability. Granted I do not use any linear algebra in my code (and I get slightly annoyed when people equate scientific computing = linear algebra, although I do acknowledge it is common), my code is mostly FFT and ODE solving on huge systems of equations. Indeed I have also transitioned from some heavy C++ codes (which I wrote originally to get passed the poor performance of MATLAB) and I get similar performance in Julia compared to C++ when I optimized somewhat.

So the language itself I know is far better performance than MATLAB. Perhaps some of the applied algorithms are less optimized at the moment, but please do not get the wrong impression from this thread, which is focused on just one aspect of scientific computing.

In addition Julia the language is just much more pleasent than MATLAB!

Topic		Replies	Views
Matlab versus Julia General Usage	33	4784	July 15, 2021
How to accelerate matrix operations(multiplication, add, inverse) in a for loop? Performance performance , matlab	23	6656	September 2, 2018
Julia is significantly slower (~18 x) than Matlab in vector and matrix algebra New to Julia	32	1705	June 25, 2023
Matlab's matmul much faster than julia's New to Julia	6	666	April 5, 2024
Sparse matrix-vector product: much more slow than Matlab Performance matlab , optimization	24	4514	December 20, 2017

Benchmark MATLAB & Julia for Matrix Operations

Related topics