Why can `Mojo` be so fast?

You write in a functional style (which is good). There are only two (one? zero?) inherent performance limitations I know of in Julia (i.e. after you've followed the performance tips section of Julia's manual). It's not really hard to make a language fast; it's hard to make it fast *and* dynamic, which is what Julia did, and so far it's the only language to manage both. Mojo feels like two languages in one: the new static one, plus the legacy dynamic Python syntax/semantics.

C++ (not just its standard library API) and Rust have performance issues too.

From Mojo’s docs:

> Destroying values at end-of-scope in C++ is problematic for some common patterns like tail recursion because the destructor calls happen after the tail call. This can be a significant performance and memory problem for certain functional programming patterns.

Julia’s first (potential) speed limitation is the garbage collector. It can also make code faster, and the code is always simpler than C++ (and likely Mojo). You can avoid the GC, or rewrite around it entirely (e.g. with Bumper.jl and/or StaticTools.jl), so you can have that benefit of C++ and Mojo too. Julia does not have a borrow checker like Rust and Mojo, but that’s more of a concurrency-safety feature (to avoid race conditions); I’m not sure it really helps for speed. I think Julia can match C++ and Rust anyway, though it might require non-idiomatic code. I’m not sure, but I think Julia could add a borrow checker, though it would be less about speed.
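To illustrate, a minimal sketch of the usual GC-avoidance pattern in plain Julia (no packages assumed): preallocate once and mutate in place, so the hot loop does zero allocations and the GC has nothing to do:

```julia
# Allocating version: builds a fresh array on every call,
# generating garbage the GC must later collect.
normalize_alloc(v) = v ./ sum(v)

# In-place version: caller supplies the output buffer, so the
# hot path allocates nothing.
function normalize!(out, v)
    s = sum(v)
    for i in eachindex(v, out)
        out[i] = v[i] / s
    end
    return out
end

v = rand(1_000)
out = similar(v)          # allocate the buffer once, up front
normalize!(out, v)        # reuse it on every subsequent call
```

Bumper.jl and StaticTools.jl go further (stack-like bump allocation, GC-free statically-sized data), but the preallocate-and-mutate pattern above already removes most GC pressure in typical numeric code.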

If you write in an imperative style, the GC is the only potential issue. For functional code, tail-call optimisation (which, e.g., doesn’t fully apply to your fib) may kick in in other languages (I recall Mojo has TCO), but it will not in Julia, so such Julia code will be a bit slower. That can always be fixed by rewriting as a loop, or by using a TCO macro from a package that provides one. Note that it’s not worse in general to write (Julia) in a functional style; in some cases it can be much faster (in any language, for cache behaviour).
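A small sketch of the loop rewrite. The naive fib is doubly recursive, so TCO wouldn’t fully apply to it even in languages that have TCO; the loop form avoids recursion (and stack growth) entirely and is the idiomatic fast version in Julia:

```julia
# Naive doubly-recursive Fibonacci: not in tail position, so even a
# language with TCO can't turn this into a loop for you.
fib(n) = n <= 1 ? n : fib(n - 1) + fib(n - 2)

# Manual rewrite as a loop: constant stack, linear time.
function fib_loop(n)
    a, b = 0, 1
    for _ in 1:n
        a, b = b, a + b
    end
    return a
end

fib_loop(10)  # 55, same as fib(10)
```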

Strictly speaking, since Julia bounds-checks array accesses by default, that’s a third handicap, but you can disable it selectively with `@inbounds` (or globally with `--check-bounds=no`, though there’s talk of disallowing that option, since for some strange reason it’s also sometimes slower…).
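For example, the selective opt-out looks like this; `@inbounds` tells the compiler it may skip bounds checks inside the annotated block (which is safe here because `eachindex` only yields valid indices):

```julia
function mysum(v)
    s = zero(eltype(v))
    @inbounds for i in eachindex(v)
        s += v[i]   # bounds check elided inside the @inbounds block
    end
    return s
end
```

Note that `@inbounds` shifts the safety burden to you: an out-of-range access in such a block is undefined behaviour, which is why the default is checked.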

Also, Julia initializes variables (and structs) by default, a tiny O(1) overhead, but with `similar` and `undef` you can turn it off, so I don’t count this as a fourth overhead.
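Concretely, the opt-out is just:

```julia
a = Vector{Float64}(undef, 1000)  # uninitialized: contents are arbitrary
b = similar(a)                    # same type and size as `a`, also uninitialized
c = zeros(1000)                   # the initialized (zero-filled) alternative
```

You’d use `undef`/`similar` when every element is about to be overwritten anyway, e.g. as the output buffer of a computation.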

Under “Rust specific” at 19:52:

> unwinding panics are Rust’s billion dollar mistake

I don’t know whether Mojo shares that mistake (or which other languages do; Go? I think Julia doesn’t). The talk is excellent for other reasons too, namely its main topic: a 6x+ faster sort (3x for randomized input).

That’s misleading. It’s the parallel speedup for an embarrassingly parallel program. The scalar version of Mojo would be very much slower; with (that kind of) parallelism you can get almost arbitrarily faster (also in Julia), limited only by core count (and ok, memory bandwidth, though this workload is likely CPU-bound). The “scalar C++” version was only 5000x faster than Python, so I assume Mojo used at least 7 cores.

Reading from the benchmark there, Julia’s mandelbrot (“userfunc_mandelbrot”) is also orders of magnitude faster than Python, like C++, but it’s hard to read off the (logarithmic) plot; likely only about 1000x faster (that benchmark may also be outdated, since it’s for Julia 1.0). I’m sure Julia can match C++ or Mojo on scalar code, and Mojo on parallel.
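To show how cheaply Julia gets the same embarrassingly parallel speedup, here’s a hypothetical sketch (my own toy escape-time Mandelbrot, not the benchmark’s code): parallelising across columns is one macro, and it scales roughly with core count (start Julia with e.g. `julia -t 8`):

```julia
# Escape-time iteration for a single point c.
function mandel(c; maxiter=255)
    z = zero(c)
    for i in 1:maxiter
        z = z^2 + c
        abs2(z) > 4 && return i   # escaped: return the iteration count
    end
    return maxiter
end

# Each column is independent, so Threads.@threads splits the outer
# loop across all available threads with no further changes.
function mandelbrot(nx, ny)
    out = Matrix{Int}(undef, ny, nx)
    Threads.@threads for j in 1:nx
        x = -2.0 + 3.0 * (j - 1) / (nx - 1)
        for i in 1:ny
            y = -1.5 + 3.0 * (i - 1) / (ny - 1)
            out[i, j] = mandel(complex(x, y))
        end
    end
    return out
end
```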
