Another thing to note is that MATLAB’s . operators are multi-threaded by default, so a fairer comparison is:
mA = similar(mX)
Threads.@threads for i in eachindex(mX)
    mA[i] = (scalarA * mX[i]) + (scalarB * mY[i])
end
but you have to make sure you set JULIA_NUM_THREADS before starting Julia. Threading is still experimental, so it’ll probably become the default for broadcasting later. To see that MATLAB does this, run a big A.*B and you’ll see it use all of the cores on your machine. We’ll get there.
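As a runnable sketch of the above (array sizes and scalar values are placeholders; `Threads.nthreads()` reflects the JULIA_NUM_THREADS setting at startup):

```julia
# Placeholder data mirroring the post's variable names.
scalarA, scalarB = 2.0, 3.0
mX = rand(1000, 1000)
mY = rand(1000, 1000)
mA = similar(mX)

# The loop body runs on Threads.nthreads() threads
# (1 unless JULIA_NUM_THREADS was set before launching Julia).
Threads.@threads for i in eachindex(mX)
    mA[i] = (scalarA * mX[i]) + (scalarB * mY[i])
end
```

The result is identical regardless of the thread count; only the wall time changes.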
Also, MATLAB uses Intel MKL by default, while Julia uses OpenBLAS. You can build Julia with MKL, but it’s not the default. On many machines, MKL will be faster. That accounts for the difference in things like matrix multiplications. An MKL build for JuliaPro is planned (see the JuliaPro page).
Lastly, the vectorized functions being called are highly optimized C/FORTRAN kernels in MATLAB. In general, Julia won’t be faster than some “built-in” function of MATLAB/Mathematica/Python/R, etc., since those are usually not written in the scripting language. The Julia code should, with all of the things mentioned, end up compiling down to about the same code, so I wouldn’t be surprised if, after swapping out for MKL and all of that, you see little to no difference between the two. Where Julia will shine is when there isn’t a built-in function, and you have to compare the algorithm fully written out, gory loops and all.
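As an example of the kind of fully-written-out loop code meant here, a tridiagonal solve via the Thomas algorithm (an illustrative sketch, not code from the original post):

```julia
# Thomas algorithm for a tridiagonal solve: the kind of "gory loops"
# code where Julia's compiled loops shine.  dl, d, du are the sub-,
# main-, and super-diagonals; b is the right-hand side.
function trisolve(dl, d, du, b)
    n  = length(d)
    c  = copy(du)
    dd = copy(d)
    x  = copy(b)
    for i in 2:n                      # forward elimination
        w = dl[i-1] / dd[i-1]
        dd[i] -= w * c[i-1]
        x[i]  -= w * x[i-1]
    end
    x[n] /= dd[n]
    for i in n-1:-1:1                 # back substitution
        x[i] = (x[i] - c[i] * x[i+1]) / dd[i]
    end
    return x
end
```

Written as plain loops like this, there is no built-in to hide behind, so the comparison is between the languages themselves.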
I usually recommend understanding serial (non-parallel) performance first, before messing around with multi-threading. If you start MATLAB with -singleCompThread, it will use only a single thread.
@ChrisRackauckas, would it be easier to move to Julia 0.6?
Then those operations will be fused and run in a multi-threaded manner, right?
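For reference, this is what the fused form looks like in Julia 0.6+: all the dotted calls collapse into a single loop with no temporaries, though that loop is still single-threaded as of this discussion (variable names are placeholders):

```julia
scalarA, scalarB = 2.0, 3.0
mX = rand(100, 100)
mY = rand(100, 100)
mA = similar(mX)

# `.=` writes in place; every dotted operation fuses into one loop.
mA .= scalarA .* mX .+ scalarB .* mY

# Equivalent shorthand that dots every call in the expression:
# @. mA = scalarA * mX + scalarB * mY
```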
@mkborregaard, I will update once I have a full picture. Running this test takes around 15 minutes (during which I close all applications and don’t touch the computer), so I can’t do it more than once a day or so.
Note that benchmarking O(n^3) matrix operations on hardware floating-point types, like inversion or matrix multiplication, is generally not a useful way to compare languages. The reason is that essentially every language (Matlab, Julia, Python+SciPy/NumPy, R) uses the same LAPACK library for these operations, the operations are expensive enough (except for tiny matrices) that language-calling overhead is irrelevant, and all that matters is what underlying BLAS library is being linked.
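To see this concretely, `A * B` in Julia dispatches to the linked BLAS gemm kernel; an explicit triple loop computes the same product, just without the optimized kernel (a minimal sketch):

```julia
using LinearAlgebra  # `*` on matrices calls the linked BLAS (dgemm)

n = 100
A = rand(n, n)
B = rand(n, n)
C = A * B            # BLAS-backed matrix multiply

# The same product via an explicit triple loop, for comparison.
Cref = zeros(n, n)
for j in 1:n, k in 1:n, i in 1:n
    Cref[i, j] += A[i, k] * B[k, j]
end
```

Both give the same answer; which one is fast depends on the BLAS, not the host language, which is the point above.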
@ChrisRackauckas, I added it to the ToDo List.
I will create another version of the Julia test with devectorized element-wise operations as you suggested (I thought it was already working in the version I tried).
Hopefully multi-threaded broadcasts + loop fusion will land shortly.
I think this is brilliant. An absolutely smashing thing to do; well done, OP. I’ve been curious about these numbers for a while, and I think this is an ace thing to monitor and target!
Could you please add the matrix square root function sqrtm to your benchmarks as well? I’m really interested in this one: the latest MATLAB has a nice block-recursive version of it, while Julia still uses a pointwise approach. So I expect MATLAB to smash it for now – but we’ll get there!
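A small usage sketch (in the Julia of this thread the function was `sqrtm`; in Julia 1.0+ the same operation is spelled `sqrt` on a matrix argument):

```julia
using LinearAlgebra  # provides the matrix square root in Julia 1.0+

A = [4.0 0.0; 0.0 9.0]
S = sqrt(A)          # was `sqrtm(A)` at the time of this thread

# For this diagonal A, S is the diagonal of entrywise square roots,
# and in general the principal square root satisfies S * S ≈ A.
```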
It’s interesting that the “obvious” code to write for benchmarking is not representative of the power of Julia. At the moment, to bring it to parity with MATLAB, some more involved things have to be written (for example, multi-threading), and one has to be aware of a number of factors. I hope that we can take the edge off that in the future.
I wouldn’t think this was the obvious code to benchmark, but I think that really highlights the difference between users and developers. If you think of the language as the basic tools you are given for writing code, then Julia is clearly faster. This is when you’re testing things like “how fast is recursion” and “how fast is looping”, which is what the Julia benchmarks show.
But to users, the language also includes implementations of various algorithms, i.e. packages. Note that the linear algebra “core” is essentially a package, and will move out of Julia Base and into a “standard library” when there is a good mechanism for default packages:
It gets a little weird because in things like Mathematica and MATLAB these aren’t “packages” which you have to download, but “built into the language” because that’s how it’s sold.
However, testing this and calling it a test of “Julia” confounds the core language internals for implementing algorithms with the current implementations of algorithms. I don’t think most experienced people will be surprised that MATLAB (and Mathematica) have optimized the functions that they give you to death: they have a billion-dollar workforce working day and night on those, and they’re written in C/FORTRAN. If you stick to the built-in functions of those languages, you won’t just generally do fine, you will tend to do really well.
But that’s why I have also written elsewhere that what’s compelling about Julia is this: the packages and functions which other languages have to write in C/FORTRAN, we are able to write in Julia itself. Thus it is much easier/quicker to build a fast Julia package with the latest features and latest algorithms. For this reason, I think Julia itself is much more compelling for developers than for users, and I tend to think that users will come because of the packages that are made.
All of that said… I’m still quite surprised there’s that much of a difference in the linear algebra, and I think most of it will go away when the problems noted above are fixed (mostly: do not time in the global scope, time twice to get rid of compilation, and maybe switch to MKL). I wouldn’t expect Julia to be faster, just very close, since again, they are likely just calling the same C/FORTRAN BLAS/LAPACK library in this case.
[Final note: there are a few places where the language being slow will cause performance issues, even when it’s mostly using built-in functions. Two good examples are user-defined functions and heavy vectorization. If a built-in function has to use a user-defined function, then it’s calling out to the actual language instead of C/FORTRAN. If most of the time for this kind of algorithm is spent in this function (for example with optimization, differential equation, integration routines), then Julia will have an advantage. Also, if heavy vectorization is employed, then Julia will have an advantage because of loop fusion. Those are some things to think about, which I don’t think show up to a large extent in these tests.]
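A tiny illustration of the first case (function and variable names are made up): when a generic higher-order function like `sum` receives a user-defined function, Julia compiles a specialized method for it, so there is no per-call boundary into a slower interpreted language:

```julia
# A user-defined function passed to a generic "built-in".
# Julia compiles a version of `sum` specialized on `myf`,
# so each call to myf is native code, not an interpreter callback.
myf(x) = x^2 + 1.0

v = collect(0.1:0.1:1.0)
s = sum(myf, v)
```

This is exactly the situation in optimization, differential equation, and integration routines, where the user’s function dominates the run time.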
I think your post is well written.
I agree with most.
Yet the conclusion should be fix it in Julia.
Julia started as a Scientific Computing Language, just like MATLAB, hence it should be assumed users will use it similarly.
The functions above are the basic Lego bricks of Scientific Computing, hence it must be really good at them.
Moreover, I think MATLAB has improved greatly in its general performance in R2016a and R2016b.
But I’m not worried; one must remember Julia is taking its first steps, and those are impressive steps.
It already pushed MATLAB to greatly improve its JIT Engine.
About the performance: what you see now on the site are the results of a run outside the global scope, as you suggested.
@felix, I added a few benchmarks (as you asked) and I will run the test soon.
But as the suite gets larger, my computer probably won’t do (I need it, and running the benchmark requires me to leave it alone).
I have a few ideas to improve its general run time (too many matrix allocations) and will probably break it into a few different tests.
I put in some PRs which fix up some of the broadcasting tests and disable MATLAB’s implicit multithreading. On my computer, the results flip wildly in Julia’s direction, indicating that this was the confounding factor in these tests. FWIW, this was on a 4-core machine, and MATLAB got closer to an 8x speedup from threading. Not sure how.