Sure, but the OP is just trying to say: I have this computer and I want to do this linear algebra; which is faster? Sure, Julia could be made faster, but so could MATLAB. I think MATLAB is a horrible language to develop in, though, so I stay in the Julia camp, but if all I needed was to do a big matrix addition here and now, it looks like I should do it in MATLAB.
Why say that when you can make a more refined statement: "it looks like I should do it in MATLAB, only because implicit parallelism makes it faster". It's concise, true from the benchmarks I ran (turning multithreading off), and gives a solid development path / answer for Julia to work towards. The benchmarks just need one line to prove that point, and then it's something which is useful. "I see a difference" isn't useful.
Can you provide a good reason for the "only"? That's the main point here. MATLAB uses the underlying hardware efficiently out of the box; Julia doesn't.
Take the optimized, non-temporary-allocating code I wrote above for MatrixAdditionRunTime, MatrixMultiplicationRunTime, and ElementWiseOperationsRunTime. Benchmark those in Julia. Then take the MATLAB code (optimized, since vectorized) and run it with:
maxNumCompThreads(1)
I saw a massive swing in the benchmarks (as already indicated) showing Julia far ahead, indicating that the confounding factor (at least in benchmarks related to these operations) is implicit parallelism.
The benchmark is supposed to show something. "Julia is slower in these benchmarks" isn't an interesting or informative result. "Julia is slower in these benchmarks because MATLAB utilizes implicit parallelism" is a very informative result and tells us that we won't match MATLAB in these benchmarks until we use multithreading.
This doesn't mean only run it without multithreading; I am saying show both lines, MATLAB with implicit parallelism and MATLAB without implicit parallelism. In at least some of the benchmarks, that will isolate what's going on and narrow down the benchmarks which are still unexplained (I think those should be tested to see the difference of MKL vs OpenBLAS).
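To make that comparison concrete, here is a minimal sketch (my illustration, not part of the original benchmark suite) of the Julia side of such a single-threaded run; it assumes matrixSize is defined as above and uses the 0.5-era tic() / toq() timing of the other snippets in this thread:

# Pin Julia's BLAS to one thread (Base.LinAlg.BLAS in 0.5, LinearAlgebra.BLAS in 1.x)
Base.LinAlg.BLAS.set_num_threads(1);

mA = randn(matrixSize, matrixSize);
mB = randn(matrixSize, matrixSize);

tic();
mC = mA + mB;              # matrix addition
additionRunTime = toq();

tic();
mD = mA * mB;              # BLAS matrix multiplication, now single-threaded
multiplicationRunTime = toq();

The MATLAB counterpart is the vectorized code above run after maxNumCompThreads(1), so the only remaining difference is the library each language calls into.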
@ChrisRackauckas, I'm a user.
I want to test what I will get as a user, not the theoretical capabilities of X or Y.
We all agree that theoretically they can both be as fast as perfect C code.
But we try to measure what they bring to the table.
Hence I don't see the logic in limiting MATLAB just to show Julia is better.
Maybe the MATLAB JIT was designed and optimized with multithreading in mind.
By the way, are you sure your code is faster?
Pay attention, when there is code like:
mA = sqrt.(abs.(mX)) .+ sin.(mY);
mB = exp.(-(mA .^ 2));
it cannot always be replaced by one loop over both arrays.
No one promised those lines will always sit right next to each other in the code.
In the case of:
mB = exp.(-(mA .^ 2));
would it be more efficient to run over the array once, doing both the squaring and the exponent?
Is there a macro to make the loop use SIMD and multithreading?
It really doesn't matter. Split the loops if you want: you'll see it doesn't make much difference.
Base.@threads does the multithreading, @simd does SIMD. SIMD is usually applied automatically by the compiler. I've had issues with threading before due to the Int-boxing issue, but I haven't tried it in this case yet. It's still experimental, so YMMV.
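For example, placing the threading macro on the outer loop of one of the element-wise kernels would look roughly like this (my sketch, not the OP's code; it assumes Julia was started with the JULIA_NUM_THREADS environment variable set and that mX, mY, and matrixSize exist as in the benchmarks above):

mA = Array(Float64, matrixSize, matrixSize);
mB = Array(Float64, matrixSize, matrixSize);
Threads.@threads for ii = 1:(matrixSize * matrixSize)
    @inbounds begin
        mA[ii] = sqrt(abs(mX[ii])) + sin(mY[ii]);
        mB[ii] = exp(-(mA[ii] * mA[ii]));   # when the two statements are adjacent they can be fused into one pass
    end
end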
You ran your test. It still tells us nothing and isn't actionable. Dissect the test and make it say something. I gave you a big glaring hint of how to show what the big difference is in two (three?) or more tests. The purpose of benchmarks is to learn what causes differences. You're a user; don't you want to know why the difference exists?
Hence I'm now adding the looping version to the optimized Julia test.
I also used a nice feature of Julia (it doesn't exist in MATLAB): defining an array without initializing it.
I ended up with:
tic();
# mD = abs.(mA) .+ sin.(mA);
mD = Array(Float64, matrixSize, matrixSize);
@simd for ii = 1:(matrixSize * matrixSize)
@inbounds mD[ii] = abs(mA[ii]) + sin(mA[ii]);
end
# mE = exp.(-(mA .^ 2));
mE = Array(Float64, matrixSize, matrixSize);
@simd for ii = 1:(matrixSize * matrixSize)
@inbounds mE[ii] = exp(- (mA[ii] * mA[ii]));
end
# mF = (-mB .+ sqrt.((mB .^ 2) .- (4 .* mA .* mC))) ./ (2 .* mA);
mF = Array(Float64, matrixSize, matrixSize);
@simd for ii = 1:(matrixSize * matrixSize)
@inbounds mF[ii] = (-mB[ii] + sqrt( (mB[ii] * mB[ii]) - (4 * mA[ii] * mC[ii]) )) / (2 * mA[ii]);
end
runTime = toq();
This is only one case.
It seems @simd has no effect (as you mentioned, it is probably applied by the compiler to begin with).
The `@inbounds` macro improved things only a little.
Now, where should I put the multithreading macro?
Please show.
Thank you.
Update
I updated the results of the optimized Julia code by devectorizing some operations.
It's a really great feature and it improves some operations.
What I really like is the consistency of Julia's results.
MATLAB is far less consistent in its run time.
My conclusions so far:
Julia is taking its infant baby steps, and those are impressive steps. There are some rough edges in the language (less predictable than MATLAB when working with numerical arrays, in my opinion), yet the potential is clearly visible.
Julia Pro (the product, not the language) must improve its BLAS / LAPACK engine, either by working with OpenBLAS to improve things or by finding a solution for Intel MKL (how come Anaconda ships with MKL? They do it for free; can Julia Pro + MKL be free as well? I have zero knowledge about open-source licensing, I just wonder what is different about Python that let Anaconda pull it off).
Julia must build on the strength of its . (dot) operator. It is an amazing feature; when implemented to its full prowess it will be a joy for the user (see the small sketch after this list).
Optimized broadcasting (multithreading; is SIMD already there?).
Julia needs multithreading (an efficient implementation, with options set by the user).
User control: I like the idea that the user can take control over things (using a macro, for instance) instead of relying on heuristics. Hopefully that will be kept for multithreading as well (able to set it On / Off / Auto per job). I like that one can touch the bare bones (even take the safety off).
Consistency: Julia's performance is more consistent than MATLAB's. I like it!
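As a small sketch of the dot-operator point (my illustration; full fusion of nested dot calls landed in Julia versions after the 0.5 used in this thread), the goal is that a chain of dotted calls compiles down to a single pass over the array, with no temporaries:

mA  = randn(1000, 1000);
mB  = exp.(-(mA .^ 2));        # with full fusion this is one pass over mA, no intermediate arrays

# the hand-written loop it should be equivalent to
mB2 = similar(mA);
for ii in eachindex(mA)
    @inbounds mB2[ii] = exp(-(mA[ii] ^ 2));
end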
You keep saying this, but the numbers don't back it up. The benchmark numbers linked (the Imgur screenshot) show very minor differences between MKL and OpenBLAS. You could point MATLAB to use OpenBLAS as well (there used to be an environment variable you could set to do this, I don't know whether it still works or exactly how it was spelled) and would most likely see consistent results there too.
Anaconda doesn't come with GPL-licensed libraries for fast Fourier transforms or sparse linear algebra. If you want to do either of those in Python you don't have access to the libraries that are considered best-in-class. In MATLAB you're paying for the licensing cost of combining a commercial license for FFT or sparse linear algebra code (that would otherwise be GPL) with the closed-source MKL library. Distributing such a combination with the GPL versions of those libraries is not legal.
The option is available for paid commercial licensing through Julia Computing, but such commercial licensing agreements likely prohibit making such combinations available at zero cost. You can also build Julia with reduced functionality and delete the FFT and sparse linear algebra capabilities. Depending on how you obtain MKL, it might be legal to distribute a no-GPL build of Julia along with MKL. The open-source Julia project does not currently have an MKL license that allows us to do that as far as I'm aware, but Intel or any other organization is welcome to reach out and make that happen.
Thanks for the numbers here. That puts the whole MKL thing to rest. I am quite surprised the difference is so negligible in most cases. Is there a reason why such a big deal is made out of it?
One interesting thing to test out in the same direction is whether using VML would affect any of the benchmarks in a big way:
@RoyiAvital, you do not have the numbers to show any of this. See @tkelman's way of lining up the different results under different conditions. That's the only way to show these kinds of conclusions.
Something is wrong there (the numbers with MKL and OpenBLAS are too close; look at the eigendecomposition, they are identical).
For instance, it seems the OpenBLAS people know their SVD is much slower than Intel MKL's:
Yet your numbers don't show that.
I really think you either ran both with OpenBLAS or both with MKL.
If those numbers are right, it might be something about how Julia interacts with the BLAS / LAPACK library (how come MATLAB extracts better performance from pure BLAS tests?).
@KristofferC was the one who ran those, I'm just reposting the link here.
I blame the marketing value of the library being written by Intel (but I'll note it's developed by an entirely different team than the processors themselves), but we repeatedly hear "I want MKL" without numbers showing whether it's actually beneficial. I've explained a number of ways you can get Julia with MKL if you want it so badly. If someone has a license that allows redistribution and use of their version of MKL with Julia, it would only be legal to distribute Julia including it if all GPL components were removed.
@tkelman, According to the results above, there is a strong case for wanting Intel MKL (I addressed the numbers by @KristofferC above).
Again,
I understand the legal issue, and again, I am not trying to dispute it.
I only want to show how Julia, with all the choices made (for any reason, legal or otherwise), compares to MATLAB's performance on this type of array work (linear-algebra oriented).
That's all there is to it.
I do think Julia is great.
Having those results in the 0.5 release is impressive, and there is reason to be proud.
Also, OpenBLAS is a young project as well. It will catch up and then things will be better.
Now, if I understand your explanation correctly, it is because of the inclusion of FFTW and the sparse library that one cannot ship Julia with MKL, am I right?
May I ask: if Julia dropped those two, would it be able to ship with MKL?
And if it would, given that Intel MKL's FFT is pretty good and it also has nice sparse library support, would you switch and take MKL instead of FFTW?
What I think should happen is to trust that the OpenBLAS guys will improve with time, Julia will fix some of its weaknesses, and we'll come back here at version 1.0.
It will be closer, and then the MathWorks guys will need to raise their game as well.
I still haven't seen a recent apples-to-apples comparison. Compare the same operation in the same language where the only thing you change is the BLAS implementation; otherwise there are too many confounding factors to draw a reliable conclusion.
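As a sketch of what such a controlled comparison could look like (a hypothetical harness, not something run in this thread, written against the Julia 0.5 of this discussion): run the identical script once on a Julia built against OpenBLAS and once on a build against MKL, with the thread count fixed, so the BLAS is the only variable:

# Fix the thread count explicitly (Base.LinAlg.BLAS in 0.5, LinearAlgebra.BLAS in 1.x)
Base.LinAlg.BLAS.set_num_threads(4);
n = 2000;
A = randn(n, n);
B = randn(n, n);
A * B;                          # warm-up so compilation is not timed
gemmTime = @elapsed A * B;      # dense matrix multiply, pure BLAS
svdTime  = @elapsed svd(A);     # LAPACK-backed factorization
println("gemm: ", gemmTime, " s,  svd: ", svdTime, " s");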
MKL has a sparse matrix library, but it's not a drop-in replacement for SuiteSparse; the API is entirely different.
I also don't think it's appropriate for an open source project to depend so heavily on a closed source component that restricts who can build a working version of the code and how they can use it. Julia the open source project needs to support non-Intel processors and architectures (AMD, ARM, Power), for which MKL isn't available.
In open source, if you want something to happen and don't want to pay for it, you'll need to participate and contribute towards making it happen (or be quietly patient, or technically convincing enough that someone else does it for you; this last option is very rare, however). There is an open issue for moving FFTW out of the default standard library distribution of Julia and into a package instead. SuiteSparse should similarly move to a package. Contributions welcome to help those get done faster.
Just as a point of reference, in case other people reading this thread get the wrong idea: I have transitioned a large code base from MATLAB to Julia and have observed enormous speed-ups using Julia, along with much better code organization and maintainability. Granted, I do not use any linear algebra in my code (and I get slightly annoyed when people equate scientific computing with linear algebra, although I acknowledge it is common); my code is mostly FFTs and ODE solving on huge systems of equations. Indeed I have also transitioned from some heavy C++ codes (which I wrote originally to get past the poor performance of MATLAB), and I get similar performance in Julia compared to C++ once I optimized somewhat.
So I know the language itself performs far better than MATLAB. Perhaps some of the applied algorithms are less optimized at the moment, but please do not get the wrong impression from this thread, which is focused on just one aspect of scientific computing.
In addition, Julia the language is just much more pleasant than MATLAB!