There are several reasons why Python is slower than Julia. Let me preface this by saying I have only superficial knowledge of how compilers work, so I might get something wrong here.
First, Julia is compiled to machine code (or rather to LLVM IR, which is then compiled to machine code), whereas Python is compiled to higher-level bytecode. At runtime, the machine code is executed directly on the hardware (which is efficient), whereas the bytecode is executed by a “virtual machine”, the Python interpreter. Since the virtual machine abstracts over the hardware, it adds a lot of overhead.
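To make the bytecode point concrete, here is a small sketch using the standard library's dis module. It assumes CPython; the function name add is just for illustration, and the exact instruction names differ between Python versions.

```python
import dis

def add(a, b):
    # A trivial function; CPython compiles its body to bytecode, not machine code.
    return a + b

# Collect the names of the bytecode instructions the interpreter will execute.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)

# The raw bytecode is stored on the function's code object as a bytes string.
raw_bytecode = add.__code__.co_code
print(len(raw_bytecode), "bytes of bytecode")
```

The addition shows up as one generic instruction. The interpreter has to work out at runtime what + means for whatever objects it is handed.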
It’s possible to compile Python to machine code. One solution is to use Cython (which compiles ahead of time, similar to C), another is Numba (which does just-in-time compilation similar to Julia). Neither produces fast code through compilation alone while the code still operates on regular Python objects, so clearly, there is more to the story than just compilation.
Another important factor is the representation of data types. Julia has nominal types and uses the same memory layout as C. Basically, this means that the binary representation of e.g. a UnitRange{Float64} is simply 128 raw bits (two 64-bit floats). In contrast, Python’s objects are more complicated and include a header (with a pointer to the type and a reference count).
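A rough way to see the header from inside Python is to compare the size of a float's raw bits with the size of the float object. This is only a sketch assuming CPython; the exact object size depends on the build.

```python
import struct
import sys

x = 1.0

# The payload of a float is a 64-bit IEEE double: 8 bytes of raw bits.
raw_size = len(struct.pack("d", x))

# The full CPython object is larger, since it also stores the header
# (reference count and pointer to the type) alongside the payload.
object_size = sys.getsizeof(x)

print("raw bits:", raw_size, "bytes")
print("Python float object:", object_size, "bytes")
print("header overhead:", object_size - raw_size, "bytes")

# The type is found through the object itself at runtime.
print(type(x))
```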
It gets worse, though. All Python objects are heap-allocated (I think, not 100% sure). Python can’t allocate on the stack, because it can’t compile down to a low enough level to manage the stack efficiently. And every object needs its header, otherwise Python can’t figure out what type an object is. In contrast, Julia can “inline” objects, because the type of these objects can be known at compile time, so at runtime, the value can just be raw bits. Python’s NumPy does something similar by storing one dtype per array instead of a header per element.
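The same idea can be sketched without NumPy. The standard library's array module is used here as a dependency-free stand-in (an assumption on my part, not something NumPy-specific): it stores the element type once as a typecode and keeps the values as contiguous raw bits, much like a dtype does.

```python
import array
import sys

values = [float(i) for i in range(1000)]

# A list stores pointers to separately allocated float objects,
# each of which carries its own header.
boxed = list(values)
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)

# array.array("d", ...) stores the element type once (the typecode)
# and keeps the values as contiguous raw 64-bit doubles.
unboxed = array.array("d", values)
unboxed_bytes = sys.getsizeof(unboxed)

print("typecode:", unboxed.typecode, "itemsize:", unboxed.itemsize)
print("list of floats:", boxed_bytes, "bytes")
print("array of doubles:", unboxed_bytes, "bytes")
```

The byte counts are approximate, but the gap between the two containers is the per-element header and pointer cost described above.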
And it gets worse, still. Instances of user-defined Python classes support adding arbitrary fields (unless the class declares __slots__). This is implemented by each object having a dict inside it. So your custom integer type also comes with a dict per object. Ouch.
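Here is a minimal sketch of that per-object dict, assuming CPython. MyInt and SlimInt are hypothetical names made up for the example; SlimInt uses an empty __slots__ declaration, which is the usual way to opt out of the dict.

```python
class MyInt(int):
    # Hypothetical custom integer type with no extra declarations.
    pass

class SlimInt(int):
    # An empty __slots__ tells CPython not to give instances a __dict__.
    __slots__ = ()

a = MyInt(3)
a.label = "three"          # arbitrary fields can be attached...
print(a.__dict__)          # ...and they live in the per-object dict

b = SlimInt(3)
try:
    b.label = "three"
    slim_accepts_fields = True
except AttributeError:
    # Without a __dict__ there is nowhere to store the new field.
    slim_accepts_fields = False

print(hasattr(b, "__dict__"), slim_accepts_fields)
```

Both types still behave like integers in arithmetic. The only difference is whether each instance drags a dict along.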
There are more issues with Python’s implementation details, explained in the video “How Python was Shaped by leaky Internals” (Armin Ronacher, Flask framework) on YouTube, but I’m not into these details.
The consequence is that even compiled Python code is still much, much slower than Julia as long as it works on regular Python objects.
Edit: I should mention that both Cython and Numba allow you to use non-Python datatypes. For Cython, you can define static types in C-style, whereas Numba can “auto-translate” a small subset of Python types into equivalent C-like types (mostly simple types like numbers and lists of numbers). When this is done, Cython/Numba can achieve roughly the same speed as Julia. Conversely, you can write completely type-unstable Julia code where the compiler won’t help you at all, and you’ll see Python-like speed. So it really comes down to compilation + efficiently represented types.