JIT Compiler for CPython

Some Python core developers are floating the idea of a JIT compiler for Python.

By the way, the speaker has created a PR and used poetry to describe it.

3 Likes

For those who prefer to read rather than watch a video, this article explains how the Python JIT compiler works.

Out of curiosity, what are the pros and cons of the Julia JIT compiler versus the Python copy-and-patch JIT compiler?

2 Likes

A short, non technical, answer could be that Julia has been designed after LLVM went out :slight_smile:

From the article:

The initial benchmarks show something of a 2-9% performance improvement.

The basic issue is that it’s hard to efficiently compile Python code because the language semantics weren’t designed for this. There have been many attempts to JIT-compile Python, and some of them have been very sophisticated with impressive results (PyPy, Numba, Pythran, Pyston, …), but generally they have worked well only for a subset of the language. But Python is so widely used that even small speedups (or big speedups on a small subset) are worth the effort, and I wish them well.

It’s for the same reason that you can’t just slap a “Julia backend” underneath Python and expect to see any improvements — a compiler by itself isn’t enough, and Julia’s compiler (which is just ordinary LLVM) isn’t what makes Julia special. This is also a Julia FAQ.

See also many previous discussions — How hard would it be to implement Numpy.jl, i.e. Numpy in Julia?Python to Julia transpilerConvert Matlab Code to Julia 1.0 — as well as this blog post by @ChrisRackauckas.

15 Likes

It appears that Python will have low startup time using the ideas of this paper.
image

I just wonder why the Julia compiler doesn’t use this kind of JIT compiler.

3 Likes

Do you happen to have a few million dollars lying around to spend on changing JIT compilers?

3 Likes

The paper isn’t about Python at all, but WebAssembly and a C++ -based DSL, the latter being the subject of that plot. Different language semantics would almost certainly affect the startup time, and the comparison of their own compiler to LLVM over their own DSL raises an unanswered question of whether their DSL introduces difficulties to compilation and optimization by LLVM. Since LLVM still makes faster code than their copy-and-patch compiler in their selected languages and Python’s copy-and-patch compiler so far made a negligible change to performance, there’s not a lot of justification for Julia to follow suit.

5 Likes

What would be the advantage of using something like this in Julia given that optimized Julia is already comparable to C? Would it help for unoptimized code?

As I understand it, the claim is not that they generate faster code, but that they generate code faster — that is, the compilation time is reduced, not the runtime of the resulting code. In fact, their resulting runtime is slightly worse, but they claim orders of magnitude improvement in compile time.

They do this by caching mostly compiled code for thousands of small snippets corresponding to fragments of ASTs (abstract syntax trees) that are frequently re-used:

At a high level, copy-and-patch works by having a pre-built library of composable and parametrizable binary code snippets that we call binary stencils. At runtime, optimization and code generation become the simple task of looking up a data table to select the appropriate stencil, and instantiate it to the desired position by copying it and patching in the missing values. […] The stencil library contains many stencil variants for each bytecode or AST node type that are specialized for different operand types, value locations, and more.

It’s not exactly clear to me what the granularity of their stencils are — they cache about 100,000 of them, but only give a handful of examples, like:
image

Julia’s code generator also caches code for thousands of small snippets, at the granularity of function calls specialized for different argument types. e.g. a == b is a function call in Julia with many different compiled variants, and the cached code is often inlined at the call site. However, the form in which Julia caches the code (typed SSA-form ASTs?) is much higher level than what the copy-and-patch algorithm uses, I think.

7 Likes