Why Are Languages Like Python and MATLAB So Much Slower than Julia?

So I understand why Julia is fast, but why specifically are Python and MATLAB, generally speaking, orders of magnitude slower than Julia when doing similar operations?

To give some background on where I am with my knowledge: I know that Julia has a JIT compilation mechanism which compiles to machine code to aid in speed. Python does not compile itself, but I do not know how this results in a speed penalty. I do not know what MATLAB does.

Could anyone explain the nuance here when it comes to how each of the languages handles the programs users write?

Thank you!

~ tcp


Even though Julia is dynamic, it uses type inference to figure out as much static information about the program as it can. Using that, it can apply the same kinds of optimizations as you would in a statically typed language. Julia also disallows some features (like eval in local scope) that would make optimizations much harder.
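A minimal sketch of what type inference does, using two hypothetical functions (`affine` and `half_affine` are made-up names for illustration). `Base.return_types` is an internal but long-standing reflection helper that reports the inferred return type for given argument types; `@code_warntype` is the user-facing way to see the same information.

```julia
# Hypothetical helper functions, used only for illustration.
affine(x) = 2x + 1
half_affine(x) = affine(x) / 2

# Inference follows the types through the nested call before anything runs:
Base.return_types(half_affine, (Int,))      # [Float64]: Int arithmetic, then `/` yields a Float64
Base.return_types(half_affine, (Float32,))  # [Float32]: Float32 stays Float32 throughout
```

Because the result type is known from the argument types alone, the compiler can emit plain machine arithmetic for each case instead of checking types at runtime.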

Some useful links:



They aren’t! Don’t you read the forums? :wink:


Thanks @kristoffer.carlsson! I will check out these videos for sure.

Is this type inference what makes Julia perform (generally speaking) better than interpreted languages? Or is there more to it?

This FAQ is pretty much about this question:

Why don’t you compile Matlab/Python/R/… code to Julia?


The first video talks about it quite extensively :slight_smile:


So there are several reasons why Python is slower than Julia. First, let me preface this by saying that I have only superficial knowledge of how compilers work, so I might get something wrong here.

First, Julia is compiled to machine code (or rather to LLVM IR, which LLVM then compiles to machine code), whereas Python is compiled to a higher-level bytecode. At runtime, the machine code is executed directly by the CPU (which is efficient), whereas the bytecode is executed by a “virtual machine”, the Python interpreter. Since the virtual machine abstracts over the hardware and has to fetch, decode, and dispatch every bytecode instruction, there is a lot of overhead.
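To make the interpreter overhead concrete, here is a toy stack-based bytecode interpreter written in Julia. The opcodes, `Instr`, and `interpret` are all hypothetical and far simpler than CPython's real bytecode and evaluation loop; the point is only that an interpreter must fetch, decode, and branch on every instruction, while a compiled function runs the arithmetic directly.

```julia
# Hypothetical toy instruction set (not CPython's real bytecode).
@enum OpCode PUSH ADD MUL

struct Instr
    op::OpCode
    arg::Int    # operand for PUSH; ignored by ADD and MUL
end

# The "virtual machine": a loop that fetches, decodes, and dispatches
# every instruction at runtime.
function interpret(program::Vector{Instr})
    stack = Int[]
    for ins in program                 # fetch
        if ins.op == PUSH              # decode and branch, on every instruction
            push!(stack, ins.arg)
        elseif ins.op == ADD
            b = pop!(stack); a = pop!(stack)
            push!(stack, a + b)
        elseif ins.op == MUL
            b = pop!(stack); a = pop!(stack)
            push!(stack, a * b)
        end
    end
    return pop!(stack)
end

# (2 + 3) * 4 expressed as bytecode...
program = [Instr(PUSH, 2), Instr(PUSH, 3), Instr(ADD, 0), Instr(PUSH, 4), Instr(MUL, 0)]
interpret(program)      # 20

# ...and the same expression as an ordinary function, which Julia compiles
# straight to a handful of machine instructions.
compiled(a, b, c) = (a + b) * c
compiled(2, 3, 4)       # 20
```

A real Python interpreter does even more per instruction than this toy: its stack holds pointers to objects, so an add must also inspect the operands' types before it can do any arithmetic.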

It’s possible to compile Python to machine code. One option is Cython (which translates Python into C and compiles it into an extension module), another is Numba (which does just-in-time compilation similar to Julia). Applied to ordinary, untyped Python code, neither produces code anywhere near as fast as Julia, so clearly there is more to the story than just compilation.

Another important factor is the representation of data types. Julia has nominal types, and (for immutable types made of plain bits) uses the same memory layout as C. Basically, this means that the binary representation of e.g. a UnitRange{Float64} is simply 128 raw bits: its two Float64 fields. In contrast, Python’s objects are more complicated and include a header (with a pointer to the type and a reference count).
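You can check this layout claim from the Julia REPL with a few standard reflection functions (`isbitstype`, `fieldtypes`, `sizeof`); the sketch below just queries `UnitRange{Float64}`.

```julia
T = UnitRange{Float64}

isbitstype(T)   # true: an immutable struct made only of plain bits, no object header
fieldnames(T)   # (:start, :stop)
fieldtypes(T)   # (Float64, Float64)
sizeof(T)       # 16 bytes, i.e. 128 bits
```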

It gets worse, though. All Python objects are heap-allocated (I think, not 100% sure). Python can’t allocate on the stack, because it can’t compile down to a low enough level to manage the stack efficiently. And every object needs its header, otherwise Python can’t figure out what type the object is. In contrast, Julia can “inline” objects, because their types can be known at compile time, so at runtime the value can just be raw bits. Python’s NumPy does something similar: every element of an array shares the array’s single dtype, so the elements can be stored as raw values without per-element headers.
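The same contrast exists inside Julia itself, which makes it easy to see. In this sketch, a `Vector{Float64}` stores its elements inline as raw bits, while a `Vector{Any}` stores pointers to separately allocated ("boxed") values, much like a Python list. `Base.summarysize` counts those boxes too; the exact byte counts vary between Julia versions, so only the comparison is shown.

```julia
floats = [1.0, 2.0, 3.0]      # Vector{Float64}: elements stored inline as raw 64-bit floats
boxed  = Any[1.0, 2.0, 3.0]   # Vector{Any}: pointers to individually boxed values

isbitstype(eltype(floats))    # true
isbitstype(eltype(boxed))     # false

# summarysize includes the separately allocated boxes, so `boxed` is larger.
Base.summarysize(boxed) > Base.summarysize(floats)   # true
```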

And it gets worse still. Instances of Python classes support adding arbitrary attributes. This is implemented by each object having a dict inside it (unless the class declares `__slots__`), so allocating an instance of your custom integer class also allocates a dict. Ouch.
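For comparison, a Julia struct's fields are fixed by its type definition, so there is no per-object dict to allocate. The type names below (`MyInt`, `MyMutableInt`) are hypothetical examples; the sizes assume a 64-bit machine.

```julia
struct MyInt                # hypothetical custom integer type
    x::Int
end

mutable struct MyMutableInt
    x::Int
end

isbitstype(MyInt)           # true: just the Int's bits, no header and no dict
sizeof(MyInt)               # 8 on a 64-bit machine
fieldnames(MyInt)           # (:x,): the set of fields is fixed by the type

m = MyMutableInt(1)
m.x = 2                     # fine: `x` is a declared field
# m.y = 3                   # throws: MyMutableInt has no field `y`
```

Because the field set can never change, the compiler knows where every field lives and can store `MyInt` values inline, just like a plain `Int`.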

There are more issues with Python’s implementation details, explained in this video: https://www.youtube.com/watch?v=qCGofLIzX6g, but I’m not familiar with those details.

The consequence is that even compiled Python code is still much, much slower than Julia.

Edit: I should mention that both Cython and Numba allow you to use non-Python data types. In Cython, you can declare static C-style types, whereas Numba can “auto-translate” a small subset of Python types into equivalent C-like types (mostly simple types like numbers and lists of numbers). When this is done, Cython/Numba achieves the same speed as Julia. Conversely, you can write completely type-unstable Julia code where the compiler won’t help you at all, and you’ll see Python-like speed. So it really comes down to compilation plus efficiently represented types.
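A classic way to get type-unstable Julia code is to read a non-constant global inside a loop, as the Performance Tips chapter of the manual warns. The function and variable names in this sketch are hypothetical; `Base.return_types` (an internal reflection helper) shows that the compiler gives up on the first version and can only say its result is `Any`.

```julia
scale_factor = 2.0          # non-const global: its value (and type) may change at any time
const SCALE_FACTOR = 2.0    # const global: its type is fixed

function scaled_sum_global(xs)
    r = 0.0
    for x in xs
        r += x * scale_factor   # type unknown at compile time, checked at runtime
    end
    return r
end

function scaled_sum_const(xs)
    r = 0.0
    for x in xs
        r += x * SCALE_FACTOR   # known to be a Float64, compiled to plain arithmetic
    end
    return r
end

Base.return_types(scaled_sum_global, (Vector{Float64},))   # [Any]
Base.return_types(scaled_sum_const,  (Vector{Float64},))   # [Float64]
```

Both functions compute the same answer; only the second lets the compiler produce tight machine code.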


Matlab code is now fully jit-compiled: https://se.mathworks.com/products/matlab/matlab-execution-engine.html

But little is known about how the JIT compiler works. Julia’s compiler is more of a just-ahead-of-time compiler; MATLAB’s may or may not be more of a classic JIT.


The simplest answer is that, naively, higher-level dynamic languages need to ask meta-questions about everything in order to figure out what to do. That is, given a function like:

```julia
function mysum(array)
    r = zero(eltype(array))
    for elt in array
        r += elt
    end
    return r
end
```

A traditional high-level language doesn’t know what elt will be — and it might not always be the same thing. So in every single iteration the language needs to ask:

  • what type is elt?
  • what type is r?
  • what method of + should I call to sum them together?
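To make those three questions visible, here is a deliberately naive variant that spells them out on every iteration, roughly the work a traditional interpreter does implicitly. `mysum_naive` is a hypothetical name; it starts from `0` rather than `zero(eltype(array))`, and it returns the looked-up methods alongside the sum only so the lookups can be inspected. `which` and `invoke` are standard Julia functions.

```julia
function mysum_naive(array)
    r = 0
    looked_up = Method[]
    for elt in array
        T_r, T_elt = typeof(r), typeof(elt)      # what type is r? what type is elt?
        m = which(+, (T_r, T_elt))               # which method of + applies?
        push!(looked_up, m)
        r = invoke(+, Tuple{T_r, T_elt}, r, elt) # call + as specified by that signature
    end
    return r, looked_up
end

r, methods_used = mysum_naive(Any[1, 2.5, 3])
r                    # 6.5
length(methods_used) # 3: one lookup per iteration
```

Julia's compiler normally answers all three questions once, at compile time, for a given argument type; this sketch pays for them on every element.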

Those meta-questions take real CPU operations and time to answer. That’s time you’re not spending on your algorithm, and they get in the way of the very powerful multiplicative performance gains from SIMD. Now, the interesting thing is that since Julia is a dynamic language, you can actually get dynamic-language-like performance quite easily! In the example above, you just need to use Any[] arrays. Or @nospecialize macros. Or disregard all the points in the Performance Tips chapter.
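A small sketch of the `Any[]` effect. One caveat: the post's `mysum` starts from `zero(eltype(array))`, and there is no `zero(Any)` method, so calling it on an `Any[]` array throws a `MethodError`. The hypothetical variant `mysum_from` below runs the same loop but takes the starting value as an argument. The answer is the same either way, but with a `Vector{Any}` the compiler can only infer `Any`, so every `+` becomes a runtime lookup.

```julia
# Same loop as `mysum`, but the caller supplies the starting value.
function mysum_from(r, array)
    for elt in array
        r += elt
    end
    return r
end

floats = [1.0, 2.0, 3.0]
boxed  = Any[1.0, 2.0, 3.0]

mysum_from(0.0, floats) == mysum_from(0.0, boxed)           # true: both are 6.0

Base.return_types(mysum_from, (Float64, Vector{Float64}))   # [Float64]
Base.return_types(mysum_from, (Float64, Vector{Any}))       # [Any]
```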

Traditional JITs can work around the problem by tracing the execution, noticing that elt is almost always a Float64, compiling a specialized version of that code segment, and swapping over to it… but you still need an escape hatch if something crazy changes (like if someone evals something into your local scope).
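The speculate-and-guard idea can be mimicked by hand. This is only an analogy written in Julia, not how any real tracing JIT is implemented: `speculative_sum` (a hypothetical name) runs a fast loop that assumes every element is a `Float64`, checks that assumption on each element, and falls back to a fully generic loop (the escape hatch) as soon as the check fails.

```julia
function speculative_sum(array)
    r = 0.0
    i = firstindex(array)
    # Fast path: speculate that every element is a Float64, guarding each one.
    while i <= lastindex(array)
        elt = array[i]
        elt isa Float64 || break   # guard failed: leave the fast path
        r += elt                   # past the guard, elt is known to be a Float64
        i += 1
    end
    # Escape hatch: finish the remaining elements with fully generic code.
    acc = r
    while i <= lastindex(array)
        acc += array[i]            # types looked up at runtime
        i += 1
    end
    return acc
end

speculative_sum([0.5, 0.25])       # 0.75, never leaves the fast path
speculative_sum(Any[1.0, 2.0, 3])  # 6.0, the guard fails at the Int 3
```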

Julia’s JIT is more akin to a just-barely-ahead-of-time compiler, (almost) always specializing on the arguments it got to try to ensure that the generated machine code can avoid those meta-questions… and from the get-go Julia disabled some dynamic language features (like eval in local scope) to ensure that’ll be possible more frequently.
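One way to see this specialization is to ask the compiler what it infers for `mysum` (copied from the post above) with different argument types: each concrete argument type gets its own compiled version with a concrete result type. `Base.return_types` is an internal reflection helper; `@code_llvm mysum([1.0, 2.0])` shows the generated code for one specialization.

```julia
function mysum(array)   # copied from the post above
    r = zero(eltype(array))
    for elt in array
        r += elt
    end
    return r
end

Base.return_types(mysum, (Vector{Float64},))   # [Float64]
Base.return_types(mysum, (Vector{Int},))       # [Int64] on a 64-bit machine
Base.return_types(mysum, (UnitRange{Int},))    # [Int64] on a 64-bit machine

mysum(1:4)                                     # 10
```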
