Another thing to note is that MATLAB’s . operators are multi-threaded by default, so a fairer comparison is:
mA = similar(mX)
Threads.@threads for i in eachindex(mX)
    mA[i] = (scalarA * mX[i]) + (scalarB * mY[i])
end
but you have to make sure you set JULIA_NUM_THREADS before starting Julia. Threading is still experimental, so it’ll probably become the default for broadcasting later. To see that MATLAB does this, run a big A.*B and you’ll see it use all of the cores on your machine. We’ll get there.
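As a runnable sketch of the above (array sizes and scalar values are placeholders; `Threads.nthreads()` reflects the JULIA_NUM_THREADS setting at startup):

```julia
# Placeholder data mirroring the post's variable names.
scalarA, scalarB = 2.0, 3.0
mX = rand(1000, 1000)
mY = rand(1000, 1000)
mA = similar(mX)

# The loop body runs on Threads.nthreads() threads
# (1 unless JULIA_NUM_THREADS was set before launching Julia).
Threads.@threads for i in eachindex(mX)
    mA[i] = (scalarA * mX[i]) + (scalarB * mY[i])
end
```

The result is identical regardless of the thread count; only the wall time changes.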
Also, MATLAB uses Intel MKL by default, while Julia uses OpenBLAS. You can build Julia with MKL, but it’s not the default. On many machines, MKL will be faster. That accounts for the difference in things like matrix multiplications. An MKL build for JuliaPro is planned (see the JuliaPro page).
Lastly, the vectorized functions being called are highly optimized C/FORTRAN kernels in MATLAB. In general, Julia won’t be faster than some “built-in” function of MATLAB/Mathematica/Python/R, etc., since those are usually not written in the scripting language. The Julia code should, with all of the things mentioned, end up compiling down to about the same code, so I wouldn’t be surprised if, after swapping out for MKL and all of that, you see little to no difference between the two. Where Julia will shine is when there isn’t a built-in function, and you have to compare the algorithm fully written out, gory loops and all.
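As an example of the kind of fully-written-out loop code meant here, a tridiagonal solve via the Thomas algorithm (an illustrative sketch, not code from the original post):

```julia
# Thomas algorithm for a tridiagonal solve: the kind of "gory loops"
# code where Julia's compiled loops shine.  dl, d, du are the sub-,
# main-, and super-diagonals; b is the right-hand side.
function trisolve(dl, d, du, b)
    n  = length(d)
    c  = copy(du)
    dd = copy(d)
    x  = copy(b)
    for i in 2:n                      # forward elimination
        w = dl[i-1] / dd[i-1]
        dd[i] -= w * c[i-1]
        x[i]  -= w * x[i-1]
    end
    x[n] /= dd[n]
    for i in n-1:-1:1                 # back substitution
        x[i] = (x[i] - c[i] * x[i+1]) / dd[i]
    end
    return x
end
```

Written as plain loops like this, there is no built-in to hide behind, so the comparison is between the languages themselves.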
I usually recommend understanding serial (non-parallel) performance first, before messing around with multi-threading. If you start MATLAB with -singleCompThread, it will use only a single thread.
@ChrisRackauckas, would it be easier to move to Julia 0.6?
Then those operations will be fused and run in a multi-threaded manner, right?
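For reference, this is what the fused form looks like in Julia 0.6+: all the dotted calls collapse into a single loop with no temporaries, though that loop is still single-threaded as of this discussion (variable names are placeholders):

```julia
scalarA, scalarB = 2.0, 3.0
mX = rand(100, 100)
mY = rand(100, 100)
mA = similar(mX)

# `.=` writes in place; every dotted operation fuses into one loop.
mA .= scalarA .* mX .+ scalarB .* mY

# Equivalent shorthand that dots every call in the expression:
# @. mA = scalarA * mX + scalarB * mY
```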
@mkborregaard, I will update once I have a full picture. Running this test takes around 15 minutes (during which I close all applications and don’t touch the computer), so I can’t do it more than once a day or so.
Note that benchmarking O(n^3) matrix operations on hardware floating-point types, like inversion or matrix multiplication, is generally not a useful way to compare languages. The reason is that essentially every language (Matlab, Julia, Python+SciPy/NumPy, R) uses the same LAPACK library for these operations, the operations are expensive enough (except for tiny matrices) that language-calling overhead is irrelevant, and all that matters is what underlying BLAS library is being linked.
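To see this concretely, `A * B` in Julia dispatches to the linked BLAS gemm kernel; an explicit triple loop computes the same product, just without the optimized kernel (a minimal sketch):

```julia
using LinearAlgebra  # `*` on matrices calls the linked BLAS (dgemm)

n = 100
A = rand(n, n)
B = rand(n, n)
C = A * B            # BLAS-backed matrix multiply

# The same product via an explicit triple loop, for comparison.
Cref = zeros(n, n)
for j in 1:n, k in 1:n, i in 1:n
    Cref[i, j] += A[i, k] * B[k, j]
end
```

Both give the same answer; which one is fast depends on the BLAS, not the host language, which is the point above.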
@ChrisRackauckas, I added it to the ToDo List.
I will create another version of the Julia test with devectorized element-wise operations as you suggested (I thought it was already working in the version I tried).
Hopefully multi-threaded broadcasts + loop fusion will land shortly.
I think this is brilliant. An absolutely smashing thing to do; well done, OP. I’ve been curious about these numbers for a while, and I think this is an ace thing to monitor and target!
Could you please add the matrix square root function sqrtm to your benchmarks as well? I’m really interested in this one: the latest MATLAB has a nice block-recursive version of it, while Julia still uses a pointwise approach. So I expect MATLAB to smash it for now – but we’ll get there!
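A small usage sketch (in the Julia of this thread the function was `sqrtm`; in Julia 1.0+ the same operation is spelled `sqrt` on a matrix argument):

```julia
using LinearAlgebra  # provides the matrix square root in Julia 1.0+

A = [4.0 0.0; 0.0 9.0]
S = sqrt(A)          # was `sqrtm(A)` at the time of this thread

# For this diagonal A, S is the diagonal of entrywise square roots,
# and in general the principal square root satisfies S * S ≈ A.
```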
It’s interesting that the “obvious” code to write for benchmarking is not representative of the power of Julia. At the moment, to bring it to parity with MATLAB, some more involved things have to be written (for example, multi-threading), and one has to be aware of a number of factors. I hope that we can take the edge off that in the future.
I wouldn’t think this was the obvious code to benchmark, but I think that really highlights the difference between users and developers. If you think of the language as the basic tools you are given for writing code, then Julia is clearly faster. This is when you’re testing things like “how fast is recursion” and “how fast is looping”, which is what the Julia benchmarks show.
But to users, the language also includes implementations of various algorithms, i.e. packages. Note that the linear algebra “core” is essentially a package, and will move out of Julia Base and into a “standard library” when there is a good mechanism for default packages:
It gets a little weird because in things like Mathematica and MATLAB these aren’t “packages” which you have to download, but “built into the language” because that’s how it’s sold.
However, testing this and calling it a test of “Julia” confounds the core language internals for implementing algorithms with the current implementations of algorithms. I don’t think most experienced people will be surprised that MATLAB (and Mathematica) have optimized the functions that they give you to death: they have a billion-dollar workforce working day and night on those, and they’re written in C/FORTRAN. If you stick to the built-in functions of those languages, you won’t just generally do fine, you will tend to do really well.
But that’s why I have also written elsewhere that what’s compelling about Julia is this: the packages and functions which other languages have to write in C/FORTRAN, we are able to write in Julia itself. Thus it is much easier/quicker to build a fast Julia package with the latest features and latest algorithms. For this reason, I think Julia itself is much more compelling for developers than for users, and I tend to think that users will come because of the packages that are made.
All of that said… I’m still quite surprised there’s that much of a difference in the linear algebra, and I think most of it will go away when the problems noted above are fixed (mostly: do not time in the global scope, time twice to get rid of compilation, and maybe switch to MKL). I wouldn’t expect Julia to be faster, just very close, since again, they are likely just calling the same C/FORTRAN BLAS/LAPACK library in this case.
[Final note: there are a few places where the language being slow will cause performance issues, even when it’s mostly using built-in functions. Two good examples are user-defined functions and heavy vectorization. If a built-in function has to use a user-defined function, then it’s calling out to the actual language instead of C/FORTRAN. If most of the time for this kind of algorithm is spent in this function (for example with optimization, differential equation, integration routines), then Julia will have an advantage. Also, if heavy vectorization is employed, then Julia will have an advantage because of loop fusion. Those are some things to think about, which I don’t think show up to a large extent in these tests.]
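A tiny illustration of the first case (function and variable names are made up): when a generic higher-order function like `sum` receives a user-defined function, Julia compiles a specialized method for it, so there is no per-call boundary into a slower interpreted language:

```julia
# A user-defined function passed to a generic "built-in".
# Julia compiles a version of `sum` specialized on `myf`,
# so each call to myf is native code, not an interpreter callback.
myf(x) = x^2 + 1.0

v = collect(0.1:0.1:1.0)
s = sum(myf, v)
```

This is exactly the situation in optimization, differential equation, and integration routines, where the user’s function dominates the run time.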
I think your post is well written.
I agree with most.
Yet the conclusion should be fix it in Julia.
Julia started as a Scientific Computing Language, just like MATLAB, hence it should be assumed users will use it similarly.
The functions above are the basic Lego bricks of Scientific Computing, hence it must be really good at them.
Moreover, I think MATLAB has improved greatly in its general performance in R2016a and R2016b.
But I’m not worried; one must remember Julia is taking its first steps, and those are impressive steps.
It already pushed MATLAB to greatly improve its JIT Engine.
About the performance: what you see now on the site are the results of a run outside the global scope, as you suggested.
@felix, I added a few benchmarks (as you asked) and I will run the test soon.
But as the suite gets larger, my computer probably won’t do (I need it, and running the benchmark requires me to leave it alone).
I have a few ideas to improve its general run time (too many matrix allocations) and will probably break it into a few different tests.
I put in some PRs which fix up some of the broadcasting tests and disable MATLAB’s implicit multithreading. On my computer, the results flip wildly in Julia’s direction, indicating that this was the confounding factor in these tests. FWIW, this was on a 4-core machine, and MATLAB got closer to an 8x speedup from threading. Not sure how.