Matlab versus Julia

No, not one typo. Two typos: as and vc.

It’s really important to get rid of all the typos.

Also, @asaretto, this is what Matlab was made for. There should not be a big difference between Matlab and Julia on this.

What’s more: this just runs the same simple operation over and over. You should leave those repetitions to your benchmarking tool; don’t do it manually. At worst, the compiler may even decide that it only needs to run a single iteration instead of 100^3, and return a bonkers result.
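
For example, with BenchmarkTools.jl (a minimal sketch; the array and the sum are just stand-ins for whatever you actually want to time):

using BenchmarkTools

x = rand(1000)
@btime sum($x)  # @btime repeats the expression many times itself and reports the minimum

(The $ interpolation is explained further down the thread.)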

1 Like

Explicit loops like those are fast in Matlab? (In some specific context or in general?)

Because compilation wasn’t fully explained earlier in the thread, I’ll go over it in a little more detail. The first time you run a function, Julia takes a while because it’s translating it into well-optimized machine code that your computer can execute directly. Matlab translates the function while running it, with very little optimization. This means you don’t have to wait for it to get translated, but any future calls of the Matlab function will be much slower than the corresponding Julia calls (assuming identically written programs).

(Something something this is a lie to children, but it gets the rough point across.)
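
If you want to see this for yourself, here is a quick REPL sketch (I’ve left out the timing numbers since they vary by machine):

julia> f(x) = x^2 + 1;

julia> @time f(2.0);  # first call: includes compiling f for Float64, so relatively slow

julia> @time f(2.0);  # the compiled code is reused, so only the run itself is measured

julia> @time f(2);    # a new argument type (Int), so Julia compiles a fresh specialization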

Yes, Matlab made quite big improvements to its JIT compiler, roughly around 2015. There was a big jump in performance for certain kinds of loops that used to be glacially slow, element-wise computations for example.

3 Likes

Matlab is great. But you pay. Then you pay again. Next, you pay. Then more paying. Finally, you pay.

2 Likes

It is a one-time payment, paid in annual installments. And after you’ve paid for it, it is entirely free.

4 Likes

Not really. I rented Matlab + add-ons for 1800 EUR per year for a couple of years; now I cannot use it any more.
They have different types of licenses.

1 Like

Sorry, 't was a joke (“borrowed” from Portlandia).

3 Likes

Oh, wow, I didn’t know Matlab was compiled – I thought it was interpreted. Why were loops so slow in the first place then?

Because it’s not just a matter of hooking up a compiler. The semantics of Matlab provide much less information to the compiler than those of Julia, or of any static language, so it’s not always possible to produce efficient compiled code. Retrofitting a compiler onto a language that was not designed for one is hard. (See also Python, which has seen many attempts at compilers.)

That’s why Julia needed a new language, and not just a new compiler. (This is a FAQ. See also this MIT lecture.)
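
To make that concrete, here’s a toy sketch of my own (not from the lecture): once Julia knows the concrete argument types, it compiles a specialized method that you can inspect; when the types are only known at runtime (the usual situation in Matlab), inference has much less to work with.

add1(x) = x + 1

julia> @code_llvm add1(1)     # tight specialized code for Int64: essentially a single add
julia> @code_llvm add1(1.0)   # a separate specialization for Float64

julia> v = Any[1, 2.0];       # element types unknown until runtime
julia> @code_warntype sum(v)  # inference gives up: the result is flagged as Any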

8 Likes

Also, Matlab’s JIT is relatively new, from roughly 2015 or later.

1 Like

Well, actually, some JIT compilation capability was introduced to Matlab much earlier than that – as early as 2002 (Matlab 6.5):

MATLAB has employed JIT compilation since release R13 in 2002. The original JIT compiler was designed as a supplement to the MATLAB interpreter, and supported a limited subset of the language…

A bit late to the party, but I like optimizing nested loops in Julia. First, let’s make a version that actually does something distinct on each iteration rather than overwriting the same cc:

aa = rand(Float32,100)
bb = rand(Float32,100)
cc = zeros(Float32,100)
function test!(cc,aa,bb) # Normally the first argument is the mutated one
    for i = 1:100
        for j = 1:100
            for k = 1:100
                cc[i] += aa[j] * bb[k]
            end
        end
    end
    return cc # Optional, but doesn't cost anything
end

For benchmarking, I highly recommend something that will give you a histogram rather than a single number. For example:

julia> using BenchmarkTools

julia> @benchmark test!($cc,$aa,$bb)
BenchmarkTools.Trial: 1757 samples with 1 evaluation.
 Range (min … max):  2.435 ms …   5.037 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     2.671 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.836 ms ± 445.603 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄█▅▇▆▄▆▂▁▁                                                   
  ██████████▇▅▆▄▅▄▅▄▄▄▃▃▃▃▄▃▃▃▃▃▃▃▃▃▂▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▃▂▂▂▁▂▂▂▂ ▄
  2.44 ms         Histogram: frequency by time        4.54 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Note that if benchmarking in global scope (like the REPL), you need to use $ to interpolate global variables into the benchmark.
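
To see the effect, compare (same arrays as above; for a workload this large the difference is small, but for sub-microsecond benchmarks the untyped-global overhead can dominate):

julia> @btime test!(cc, aa, bb);    # cc, aa, bb are untyped globals, so dispatch overhead is measured too

julia> @btime test!($cc, $aa, $bb); # interpolated: the arrays behave like local variables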

This isn’t terrible, and there are no allocations, which is good, but one can get much faster. In particular, you can use your CPU’s SIMD vector registers and instructions, which for this example is easy with the amazing LoopVectorization.jl:

using LoopVectorization
function test_lv!(cc,aa,bb) # Normally the first argument is the mutated one
    @turbo for i = 1:100
        for j = 1:100
            for k = 1:100
                cc[i] += aa[j] * bb[k]
            end
        end
    end
    return cc # Optional, but doesn't cost anything
end

which gives us

julia> @benchmark test_lv!($cc,$aa,$bb)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  106.252 μs … 453.916 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     106.263 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   110.554 μs ±  16.103 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▁  ▃      ▃        ▃                                       ▁ ▁
  ██▄▄█▆▇▅▅▆▅█▆▆▄▅▆▅▄▃█▅▆▅▅▅▅▃▄▄▄▅▁▁▃▃▄▃▃▄▁▁▁▃▁▁▃▃▃▁▄▃▁▄▁▁▁▃▁▁█ █
  106 μs        Histogram: log(frequency) by time        206 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

That’s more than a 20x speedup, and it would likely be even better on a computer with AVX512 (I only have AVX2). That’s a bit more of a speedup than is typical for LoopVectorization with (e.g.) Float64s, but you can fit twice as many Float32s in a given vector register, so SIMD vectorization matters more for smaller number types.

Finally, since Matlab tries to multithread by default, why don’t we do the same to level the playing field? It just takes an extra t in the macro name:

function test_lvt!(cc,aa,bb) # Normally the first argument is the mutated one
    @tturbo for i = 1:100
        for j = 1:100
            for k = 1:100
                cc[i] += aa[j] * bb[k]
            end
        end
    end
    return cc # Optional, but doesn't cost anything
end

This gives about another 4x speedup (on a 4-core CPU), despite the small problem size, because of the very lightweight threading (from Polyester.jl) used by @tturbo.

julia> @benchmark test_lvt!($cc,$aa,$bb)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  26.804 μs … 96.401 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     26.919 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   27.389 μs ±  2.680 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▄▁     ▁                                                   ▁
  ███▆▅▇▇▇█▆▅▄▆▄▅▄▄▄▃▃▁▁▃▄▄▄▁▄▅▅▅▆▆▇▆▆▆▆▄▅▅▆▄▅▄▄▃▃▄▄▄▅▆▄▄▄▄▅▅ █
  26.8 μs      Histogram: log(frequency) by time        40 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.
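
(One caveat, if I remember the setup correctly: the threads @tturbo uses come from Julia itself, so you need to start Julia with more than one thread to see this speedup, e.g.

$ julia --threads=4

julia> Threads.nthreads()
4

otherwise it falls back to running serially.)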
10 Likes

Since you mention coming from Matlab, @asaretto, my own two cents after making the same transition: GitHub - brenhinkeller/JuliaAdviceForMatlabProgrammers: Learning to love dispatch-oriented programming

You can also feel free to ask Julia-vs-Matlab questions or start discussions in the #matlabsconders channels on Slack or Zulip (pretty quiet channels, but they have a few people lurking who have made the transition).

5 Likes