Matlab versus Julia

Hi guys, I am new to Julia. I have been using Matlab for many years, and I have been told that Julia can offer great improvements in terms of performance. After doing a very preliminary intro course on Julia, I started hacking around, trying to understand how to write code that is actually faster. So far all the benchmark attempts I have tried have failed miserably, in the sense that Matlab is always much, much faster. Obviously I must be doing something wrong. Can anybody give some tips?

For example, I thought Julia would be much faster at running loops, so I tried the following in both Matlab and Julia (on the same computer):

aa = zeros(Float32,100);
bb = ones(Float32,100);
cc = ones(Float32,100);

using TickTock
tick()
for i = 1:100
    for j = 1:100
        for k = 1:100
            cc = aa.*bb;
        end
    end
end
t1 = tock()

The Julia code, which I am running in Atom, runs in about 0.8 seconds. The equivalent Matlab code runs in 0.1 seconds. What am I missing?

thanks so much

See: Performance Tips · The Julia Language

The very first tip in particular.

In this example, do:
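Roughly, put the loops inside a function so that aa, bb, and cc become arguments instead of globals; something along these lines (the exact names don't matter):

aa = zeros(Float32,100);
bb = ones(Float32,100);
cc = ones(Float32,100);

# Wrapping the loops in a function means aa, bb, cc are arguments, not globals
function test(aa, bb, cc)
    for i = 1:100
        for j = 1:100
            for k = 1:100
                cc = aa .* bb
            end
        end
    end
end

test(aa, bb, cc)   # run once to compile, then time a second call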

2 Likes

I think there are some typos (vc vs. cc, and test vs. test!). But if Matlab can do that in 100 ms, something's still wrong; it takes 700 ms for me.

1 Like

First of all, thanks for taking the time. And yes, I had caught the typos; easy fix.
However, this now takes 5 seconds to run, as opposed to 0.8. Is that what happens on your computer as well?

1 Like

Make sure you’re using BenchmarkTools @btime to get a good measurement. Julia’s function calls are always slow on their first execution because they are being compiled.

@asaretto can you show your Matlab code?

Sorry for the typos. I am writing from my cell phone, so I cannot test it here. Can you post your exact Julia code and the Matlab code, so we can see whether they are doing the same thing?

That should absolutely not take 5s.

Matlab code:

aa = zeros(100,1);
bb = ones(100,1);
cc = ones(100,1);

tic
for i = 1:100
    for j = 1:100
        for k = 1:100
            cc = aa.*bb;
        end
    end
end
t1 = toc

Julia code:

using BenchmarkTools
aa = zeros(Int16,100);
bb = ones(Int16,100);
cc = ones(Int16,100);

function test(as,bb,vc)
    for i = 1:100
        for j = 1:100
            for k = 1:100
                cc = aa.*bb;
            end
        end
    end
end

@btime test($aa,$bb,$cc)


Now it says it takes 600 ms, but it takes longer than that to print the answer on the screen (!?)

1 Like

Does the dot there mean the same thing as in Julia?

PS: It's better to use triple backticks to quote the code.

From what (little) I understand, element-by-element multiplication in Matlab is the same as broadcasting in Julia.
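For example, in Julia:

julia> [1, 2, 3] .* [4, 5, 6]
3-element Vector{Int64}:
  4
 10
 18

which, if I understand correctly, is the same elementwise product that [1 2 3] .* [4 5 6] gives in Matlab.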

Try

cc .= aa.*bb

This should avoid some allocations.
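For example, with some throwaway arrays (names are just for illustration):

using BenchmarkTools

aa = rand(Float32, 100);
bb = rand(Float32, 100);
cc = similar(aa);

@btime c2 = $aa .* $bb;    # builds a brand-new array on every evaluation
@btime $cc .= $aa .* $bb;  # writes into the existing cc: should report 0 allocations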

2 Likes

There was yet another typo. This is what you should get:

julia> aa = zeros(Float32,100);

julia> bb = ones(Float32,100);

julia> cc = ones(Float32,100);

julia> function test!(aa,bb,cc)
         for i = 1:100
           for j = 1:100
             for k = 1:100
               cc .= aa.*bb;
             end
           end
         end
       end
test! (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime test!($aa,$bb,$cc)
  15.543 ms (0 allocations: 0 bytes)

julia> 

Note the second . in cc .= aa .* bb, which means that you are updating cc in place instead of creating a new array at every iteration. (This can also be written as @. cc = aa * bb.)

This is because @btime executes the function many times to obtain an accurate measure of its performance. Also, the function (here test!) is compiled on its first call, so you get an inaccurate measure if you time only a single call of a fast function. For example, starting from a fresh session:

julia> aa = rand(Float32,100);

julia> bb = rand(Float32,100);

julia> cc = ones(Float32,100);

julia> function test!(aa,bb,cc)
         for i = 1:100
           for j = 1:100
             for k = 1:100
               cc .= aa .* bb;
             end
           end
         end
       end
test! (generic function with 1 method)

julia> @time test!(aa,bb,cc) # first run
  0.100118 seconds (221.07 k allocations: 12.956 MiB, 9.55% gc time, 83.58% compilation time)

julia> @time test!(aa,bb,cc) # second and following runs
  0.025998 seconds


This is using the @time macro. The @btime macro does those multiple runs for you and reports the minimum time obtained (which is a more stable measure of performance).

4 Likes

I would suggest that you try out a "real life" example from your everyday work. That will be much more helpful than some generic example (the one shown here does not seem very interesting/meaningful to me).

1 Like

Note also that comparing performance for a single vectorized operation is pretty uninteresting and unlikely to show much in the way of performance gains. See also Comparing Numba and Julia for a complex matrix computation - #3 by stevengj

7 Likes

There is one considerable difference relative to what you are getting, which might be one source of problems. When I run the very last part (the benchmark) I get the following:

julia> @btime test!($aa,$bb,$cc)
551.124 ms (2000000 allocations: 61.04 MiB)

Aside from the time, which can be machine-specific, it looks like you have 0 allocations while I have 2M. Is that a problem?

There is a typo there: as should be aa (my fault on the original post).

That, above. aa was being treated as a global variable inside the function because the parameter name was wrong. (I figured it out readily when I noticed those allocations, which shouldn't be there.)

(And the broadcasted assignment to cc, the extra dot I mentioned earlier: cc .= aa .* bb.)
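To see the effect of that wrong parameter name, you can compare two versions of the same loop (the names slow! and fast! are just for illustration):

using BenchmarkTools

aa = rand(Float32, 100);
bb = rand(Float32, 100);
cc = similar(aa);

# Parameter is named `as`, so the `aa` in the body is a non-constant global:
function slow!(as, bb, cc)
    for k = 1:100
        cc .= aa .* bb
    end
end

# All names in the body are arguments, so the loop is type-stable:
function fast!(aa, bb, cc)
    for k = 1:100
        cc .= aa .* bb
    end
end

@btime slow!($aa, $bb, $cc)   # allocates on every iteration because of the global `aa`
@btime fast!($aa, $bb, $cc)   # should report 0 allocations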

That did make it faster. Now it takes 125 ms on my laptop, which is at least about the same as Matlab.

OK, say that I want to try something more complicated. The type of stuff that I do is solving/calibrating numerical models whose solution is essentially the solution to a fixed-point problem.

How should I think about Julia in terms of the things it does more efficiently? Always loops? No vectorization? Is there a must-read book/article/blog?

thanks again guys, I really appreciate it

4 Likes

I think it is easier to say where you should not expect a significant gain: if the expensive part of the Matlab code is a call to a library that is optimized in a lower-level language, the Julia code (which often ends up calling that same library) will be equally fast (or slow). In other cases, a pure Julia library for the same task, or your own implementation adapted to your problem, can be faster.

Julia can be as efficient as a low-level language such as C++ or Fortran, not more, not less.

(PS: You are probably still missing the dot, or the fix for the as typo, to get that timing. As good practice, it is always nice to post your latest code so others can suggest improvements and fixes. If you have a slightly more realistic example, there are further optimizations worth exploring from there, such as learning to parallelize the loops.)

I also have some notes, which I write as I learn: Home · JuliaNotes.jl

3 Likes

I think it is possible that Matlab realized that the result did not depend on the indices i, j, k, and collapsed the whole loop nest into a single multiplication of the arrays.

Edit: This does not happen in Matlab. I checked.

1 Like