Why `Mojo` can be so fast?

Mojo is a new language, this is the introduction of it.
I am curious about its performance, so I made a test for it.
In the Jupyter supplied by the Mojo:

%%python
import time
def fib(n):
    if n == 0 or n == 1:
        return n
    elif n == 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
time_start = time.time()
fib(40)
time_end = time.time()
print(time_end - time_start)
18.047149419784546

#----another jupyter cell, this is mojo code----#
def fib_mojo(n: Int) -> Int:
    if n == 0 or n == 1:
        return n
    elif n == 2:
        return 1
    else:
        return fib_mojo(n - 1) + fib_mojo(n - 2)
time_start = time.time()
fib_mojo(40)
time_end = time.time()
print(time_end.to_f64() - time_start.to_f64())
0.483159

In my local computer, the same python code:

import time
def fib(n):
    if n == 0 or n == 1:
        return n
    elif n == 2:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
time_start = time.time()
fib(40)
time_end = time.time()
print(time_end - time_start)
13.455525398254395

And the Julia code:

fib(n::Int)::Int = begin
    if n == 0 || n == 1
        return n
    elseif n == 2 
        return 1
    else
        return fib(n - 1) + fib(n - 2)
    end
end
0.267103 seconds

So in the local computer, the Mojo performance would be: 0.48 * 13.455 / 18.047 = 0.35.
I also test the same code in Rust:

fn fib(n: i64) -> i64 {
    if n == 0 || n == 1 {
        return n
    } else if n == 2 {
        return 1
    } else {
        return fib(n - 1) + fib(n - 2)
    }
}
210.7517ms

In summary:

Julia 0.26s
Rust 0.21s
Python 18s
Mojo 0.35s

Mojo is almost as fast as Julia and Rust, what is the magic behind it?

2 Likes

Why not make Julia better?

We think Julia is a great language and it has a wonderful community, but Mojo is completely different. While Julia and Mojo might share some goals and look similar as an easy-to-use and high-performance alternative to Python, we’re taking a completely different approach to building Mojo. Notably, Mojo is Python-first and doesn’t require existing Python developers to learn a new syntax.

Mojo also has a bunch of technical advancements compared to Julia, simply because Mojo is newer and we’ve been able to learn from Julia (and from Swift, Rust, C++ and many others that came before us). For example, Mojo takes a different approach to memory ownership and memory management, it scales down to smaller envelopes, and is designed with AI and MLIR-first principles (though Mojo is not only for AI).

That said, we also believe there’s plenty of room for many languages and this isn’t an OR proposition. If you use and love Julia, that’s great! We’d love for you to try Mojo and if you find it useful, then that’s great too.
…

How does Mojo provide a 35,000x speed-up over Python?

Modern CPUs are surprisingly complex and diverse, but Mojo enables systems-level optimizations and flexibility that unlock the features of any device in a way that Python cannot.

3 Likes

Although in my personal view, Julia’s syntax is much prettier than python, Mojo is a strong competitor of Julia.

5 Likes

Your Rust example is using 32 bit integers while the Julia one is 64.

6 Likes

Yes, I correct it, then Rust is a little faster than Julia.

10-loc benchmarks are the silliest game in town, but if someone insists, here is what I think is the fastest Julia version:

Compilation is even faster in 1.9 :stuck_out_tongue:

With respect, I think that questions about the internals and design of Mojo should be asked on the Mojo forum (if there is one, didn’t check).

13 Likes

@Tamas_Papp, are saying that the micro benchmarks on that page are silly? ( Julia Micro-Benchmarks ) Most of them are ~10loc.
Interestingly on my computer (Intel Xeon @3.1GHz), the examples of @Brian1 give me the following:
Rust 1.67.1: 186.68 ms
Julia 1.9 rc3 350.608 ms (using benchmarktools)
Julia commit 4e782bf91a 1.10 dev 961.276 ms
I know I shouldn’t use Julia 1.10 but I was thinking it might help spot a regression?

4 Likes

You can’t assume the ratio of timings between 2 different languages is constant across different systems. gitboy16’s timings is proof of that.

2 Likes

Nothing magical, it certainly compiles specialized code to the types of arguments. Note that for such a simple benchmark you can get that function to be fast with standard python acceleration tools:

In [11]: from numba import jit

In [12]: @jit(nopython=True)
    ...: def fib(n):
    ...:     if n == 0 or n == 1:
    ...:         return n
    ...:     elif n == 2:
    ...:         return 1
    ...:     else:
    ...:         return fib(n - 1) + fib(n - 2)
    ...: 

In [13]: %timeit fib(40)
296 ms ± 1.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

which here ran actually faster than the julia code sometimes:

julia> @btime fib(40)
  350.784 ms (0 allocations: 0 bytes)
102334155

(after trying again, I get ~272 ms here, so there are fluctuations)

Thus these micro-benchmarks serve only to show that languages (or tools within the languages) can achieve performances of the order of the best possible ones. But they don’t really say anything about how useful will be the language, or the tool. That remains to be seen for Mojo, being the compatibility with python something that makes it useful to start, but I guess for now nobody really knows how it will be more useful than these python compilers like numba for everyday use, when integration with the python ecosystem starts to be tested.

As a side note: this type of comparison is sort of a Julia curse, and will be a Mojo curse as well. Julia users (myself included) can easily get snipped in testing that Julia can be fast in any benchmark that shows the opposite. And that comes from the fact that one central feature of Julia is its claim to be fast. That was more common a couple of years ago, as for now I think it is already accepted that Julia can be fast. Now people, like me, got bored of testing these things. Mojo will probably go through the same path, and while people are convinced that the language can be fast is that the other characteristics of the language (programing paradigms, ecosystem, etc) will prove the language interesting and useful or not for each application.

16 Likes

I am saying that very tiny benchmarks (~ 10 LOC) have been optimized within an inch of their life in all modern, performance-oriented languages. They pretty much send the same code to the CPU, with minor differences at most. There is no magic here, and if you want to demo the performance of a new language, this is not the way to do it (and it is not the purpose of the Julia page you linked anyway).

In particular, for the Fibonacci sequence, the key is the choice of the algorithm (recurse to a tree and you are dead, or do it smartly and then you just add a few numbers).

Comparing performance-oriented languages starts with medium-sized codebases, at least 5k LOC, but that is a bare minimum, 50k LOC (in Julia) is much more informative. Of course, this is very time-consuming and hard to make “objective”. But since nanobenchmarks are so weakly informative this is the only way.

In other words, to be in a position evaluate Mojo even superficially, one has to program in it for a few months at least.

Before that happens, I think that making performance claims is likely to backfire in the medium run. This happened to Julia to a certain extent: yes, Julia is very performant, but that requires using it the right way. People just tried it without reading the manual, and in the best case they ended up opening a “Julia is slower than …” topic here and got the issue fixed. But I am sure that many of them just decided it was all hype and ignored Julia for a while before trying again.

14 Likes

But the point of the Fibonacci benchmark isn’t to measure how fast one can calculate the Fibonacci numbers, but to see if recursion is fast. The choice of a slow algorithm is intentional.

10 Likes

Certainly, but again, all modern, performance oriented languages solve this. Exceptions are treated as bugs and regressions, and otherwise they deliver pretty much the same performance, so this helps little for language comparisons. No “magic” is needed at this point to achieve competitive nanobenchmark results.

My point is that nanobenchmarks like this are not informative when comparing languages in the league of C, Rust, Julia, etc.

1 Like

You are right that the algorithm is important. The function

julia> fib(n)=last([1 1;1 0]^(n-1)*[1,1])

computes fib(90) in 500ns (larger values lead to an integer overflow). A better example of recursion is the Ackerman function, since it cannot be trivialized like that.

8 Likes

Does anyone has a view on why my benchmark are so different between 1.9 an 1.10? Is it something to worry about and that needs to be raised as an issue/regression on Julia github repo?

You write in a functional style (which is good). There are only two (one? zero?) inherent (i.e. after you’ve followed Julia’s performance section of the manual) performance limitations I know of in Julia. It’s not really hard to make a language fast, just hard to make it fast and dynamic, what Julia did, so far only language. Mojo feels like two languages in one, the new static, plus the old legacy dynamic Python syntax/semantics.

C++ (not just its standard library API) and Rust have performance issues too.

From Mojo’s docs:

Destroying values at end-of-scope in C++ is problematic for some common patterns like tail recursion because the destructor calls happen after the tail call. This can be a significant performance and memory problem for certain functional programming patterns.

Julia’s first (potential) speed limitation is the garbage collector. It can also make code faster (and the code is always simpler than C++; or Mojo likely). You can avoid the GC, or completely rewrite around it (e.g. with Bumper.jl and/or StaticTools.jl), so you can have that benefit of C++ and Mojo. You do not have a borrow checker like Rust and Mojo, that’s more of a concurrency safety (to avoid race conditions) issue. I’m not sure if it really helps for speed. I think Julia can match C++ and Rust anyway, though might require non-idiomatic code. I’m not sure, but I think Julia could add a borrow checker, though it would be less for speed.

If you write in imperative way then that’s the only potential issue, but for functional code, where tail-call optimisation (e.g. it doesn’t fully apply to your fib) might apply for other languages (and I recall Mojo has TCO), it will not in Julia, and Julia code will be a bit slower. That can always be fixed by rewriting as a loop; or using TCO macro from a package that has it. Note, it’s not worse in general to write (Julia) in a functional style, it can in some cases be much faster (in any language, for cache behaviour).

Strictly speaking, since Julia is bounds-checked by default, it’s the third handicap, but you can disable it selectively (and globally, but there’s talk of disallowing that option, since it’s also for some strange reason sometimes slower…).

Also Julia initializes variables (and structs) by default, a tiny O(1) overhead, but with similar and undef you can can turn it off, so I do not consider this a fourth overhead.

Under “Rust specific” at 19:52:

unwinding panics are Rust’s billion dollar mistake

I don’t know if Mojo shares that mistake (or what other languages, Go? I think not Julia). The talk is excellent also for other reasons, its main topic, 6x plus faster sort, or 3x for randomized.

That’s misleading. It’s parallel speedup for an embarrassingly parallel program. For the scalar version of Mojo, would be very much slower, with (such) parallel you can get almost arbitrarily faster (also in Julia), just limited by core count (and ok bandwidth, though likely rather CPU limited). The “scalar C++” version was only 5000x faster than Python, so I assume Mojo had at least 7 cores.

Reading from the benchmark there Julia’s mandelbrot (“userfunc_mandelbrot”) is also 2 orders of magnitude faster, like C++, but it’s hard to read from the (logarithmic) plot, likely only 1000x faster (that benchmark might be outdated since for Julia 1.0). I’m sure Julia can match C++ or Mojo on scalar, and Mojo on parallel.

6 Likes