JIT Compiler for CPython

xiaoxi · December 26, 2023, 12:33pm

Some Python core developers are floating the idea of a JIT compiler for Python.

By the way, the speaker has created a PR and used poetry to describe it.

github.com/python/cpython

GH-113464: A copy-and-patch JIT compiler

python:main ← brandtbucher:justin

opened 07:17AM - 25 Dec 23 UTC

brandtbucher

+1722 -15

'Twas the night before Christmas, when all through the code Not a core dev was …merging, not even Guido; The CI was spun on the PRs with care In hopes that green check-markings soon would be there; The buildbots were nestled all snug under desks, Even PPC64 AIX; Doc-writers, triage team, the Council of Steering, Had just stashed every change and stopped engineering, When in the "PRs" tab arose such a clatter, They opened GitHub to see what was the matter. Away to CPython they flew like a flash, Towards sounds of `PROT_EXEC` and `__builtin___clear_cache`. First [LLVM was downloaded, unzipped](https://github.com/brandtbucher/cpython/blob/justin/Tools/jit/README.md) Then the Actions were running [a strange new build script](https://github.com/brandtbucher/cpython/blob/justin/Tools/jit/build.py), When something appeared, they were stopped in their tracks, [`jit_stencils.h`](https://gist.github.com/brandtbucher/d6397b1d106c55164b32df27d53eb7b8), generated from [hacks](https://github.com/brandtbucher/cpython/blob/justin/Tools/jit/template.c), With their spines all a-shiver, they muttered "Oh, shit...", They knew in a moment it must be a JIT. More rapid than interpretation it came And it copied-and-patched every stencil by name: "Now, `_LOAD_FAST`! Now, `_STORE_FAST`! `_BINARY_OP_ADD_INT`! On, `_GUARD_DORV_VALUES_INST_ATTR_FROM_DICT`! To the top of the loop! And down into the call! Now cache away! Cache away! Cache away all!" But why now? And how so? They needed a hint, Thankfully, Brandt gave a great talk at the sprint; So over to [YouTube](https://youtu.be/HxSHIpEQRjs) the reviewers flew, They read [the white paper](https://dl.acm.org/doi/10.1145/3485513), and [the blog post](https://sillycross.github.io/2023/05/12/2023-05-12) too. And then, after watching, they saw its appeal Not writing the code themselves seemed so unreal. And the platform support was almost too easy, ARM64 Macs to 32-bit PCs. 
There was [some runtime C](https://github.com/brandtbucher/cpython/blob/justin/Python/jit.c), not too much, just enough, Basically a loader, relocating stuff; It ran every test, one by one passed them all, With not one runtime dependency to install. Mostly build-time Python! With strict static typing! For maintenance ease, and also nerd-sniping! Though dispatch was faster, the JIT wasn't wise, And the traces it used still should be optimized; The code it was JIT'ing still needed some thinning, With code models small, and some register pinning; Or [new calling conventions](https://discourse.llvm.org/t/rfc-exposing-ghccc-calling-convention-as-preserve-none-to-clang/74233), shared stubs for paths slow, Since this JIT was brand new, there was fruit hanging low. It was awkwardly large, parsed straight out of the ELFs, And they laughed when they saw it, in spite of themselves; A `configure` flag, and no merging this year, Soon gave them to know they had nothing to fear; It wasn't much faster, at least it could work, They knew that'd come later; no one was a jerk, But they were still smart, and determined, and skilled, They opened a shell, and configured the build; `--enable-experimental-jit`, then made it, And away the JIT flew as their "+1"s okay'ed it. But they heard it exclaim, as it traced out of sight, "Happy JIT-mas to all, and to all a good night!" * Issue: gh-113464

xiaoxi · January 9, 2024, 3:18pm

For those who prefer to read rather than watch a video, this article explains how the Python JIT compiler works.

Out of curiosity, what are the pros and cons of the Julia JIT compiler versus the Python copy-and-patch JIT compiler?

sylvaticus · January 9, 2024, 3:39pm

A short, non-technical answer could be that Julia was designed after LLVM came out.

stevengj · January 9, 2024, 4:10pm

From the article:

The initial benchmarks show something of a 2-9% performance improvement.

The basic issue is that it’s hard to efficiently compile Python code because the language semantics weren’t designed for this. There have been many attempts to JIT-compile Python, and some of them have been very sophisticated with impressive results (PyPy, Numba, Pythran, Pyston, …), but generally they have worked well only for a subset of the language. But Python is so widely used that even small speedups (or big speedups on a small subset) are worth the effort, and I wish them well.

It’s for the same reason that you can’t just slap a “Julia backend” underneath Python and expect to see any improvements — a compiler by itself isn’t enough, and Julia’s compiler (which is just ordinary LLVM) isn’t what makes Julia special. This is also a Julia FAQ.

See also many previous discussions — How hard would it be to implement Numpy.jl, i.e. Numpy in Julia? — Python to Julia transpiler — Convert Matlab Code to Julia 1.0 — as well as this blog post by @ChrisRackauckas.

xiaoxi · January 9, 2024, 4:54pm

It appears that Python will have low startup time using the ideas of this paper.

I just wonder why the Julia compiler doesn’t use this kind of JIT compiler.

StefanKarpinski · January 10, 2024, 3:25am

Do you happen to have a few million dollars lying around to spend on changing JIT compilers?

Benny · January 10, 2024, 5:22am

The paper isn’t about Python at all, but about WebAssembly and a C++-based DSL, the latter being the subject of that plot. Different language semantics would almost certainly affect the startup time, and comparing their own compiler to LLVM on their own DSL raises an unanswered question: does their DSL make compilation and optimization harder for LLVM? Since LLVM still makes faster code than their copy-and-patch compiler in their selected languages, and Python’s copy-and-patch compiler has so far made a negligible change to performance, there’s not a lot of justification for Julia to follow suit.

Wispy · January 11, 2024, 1:05pm

What would be the advantage of using something like this in Julia given that optimized Julia is already comparable to C? Would it help for unoptimized code?

stevengj · January 11, 2024, 1:25pm

As I understand it, the claim is not that they generate faster code, but that they generate code faster — that is, the compilation time is reduced, not the runtime of the resulting code. In fact, their resulting runtime is slightly worse, but they claim orders of magnitude improvement in compile time.

They do this by caching mostly compiled code for thousands of small snippets corresponding to fragments of ASTs (abstract syntax trees) that are frequently re-used:

At a high level, copy-and-patch works by having a pre-built library of composable and parametrizable binary code snippets that we call binary stencils. At runtime, optimization and code generation become the simple task of looking up a data table to select the appropriate stencil, and instantiate it to the desired position by copying it and patching in the missing values. […] The stencil library contains many stencil variants for each bytecode or AST node type that are specialized for different operand types, value locations, and more.
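To make the quoted description a bit more concrete, here is a minimal toy sketch of the copy-and-patch idea in Python. It is only an illustration under loud assumptions: the "machine code" is a made-up byte-coded stack machine rather than real native code, and every name in it (`Stencil`, `STENCILS`, `copy_and_patch`, `run`, the opcodes) is hypothetical and taken neither from the paper nor from CPython. The point is just the shape of the technique: look up a pre-built stencil in a table, copy its bytes, and patch operand values into its holes.

```python
import struct
from dataclasses import dataclass

# Hypothetical opcodes of a toy stack machine that stands in for native code.
OP_PUSH = 0x01  # followed by a 4-byte little-endian signed operand
OP_ADD = 0x02
OP_MUL = 0x03
OP_RET = 0xFF

HOLE = b"\x00\x00\x00\x00"  # placeholder bytes to be patched at "JIT time"


@dataclass(frozen=True)
class Stencil:
    code: bytes             # pre-built bytes, produced ahead of time
    holes: tuple[int, ...]  # byte offsets of 4-byte holes inside `code`


# The pre-built stencil library: one entry per operation (assumed names).
STENCILS = {
    "LOAD_CONST": Stencil(bytes([OP_PUSH]) + HOLE, (1,)),
    "ADD": Stencil(bytes([OP_ADD]), ()),
    "MUL": Stencil(bytes([OP_MUL]), ()),
    "RETURN": Stencil(bytes([OP_RET]), ()),
}


def copy_and_patch(trace):
    """Emit code for a trace by copying stencils and patching their holes."""
    out = bytearray()
    for name, *operands in trace:
        stencil = STENCILS[name]          # table lookup
        base = len(out)
        out += stencil.code               # copy
        for offset, value in zip(stencil.holes, operands):
            struct.pack_into("<i", out, base + offset, value)  # patch
    return bytes(out)


def run(code):
    """Tiny interpreter for the toy machine, used only to check the output."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == OP_PUSH:
            (value,) = struct.unpack_from("<i", code, pc)
            stack.append(value)
            pc += 4
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == OP_RET:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#x}")


# (2 + 3) * 4, expressed as a linear trace of operations.
trace = [
    ("LOAD_CONST", 2),
    ("LOAD_CONST", 3),
    ("ADD",),
    ("LOAD_CONST", 4),
    ("MUL",),
    ("RETURN",),
]
code = copy_and_patch(trace)
result = run(code)  # evaluates to 20
```

Note that code generation here involves no analysis at all, only a dictionary lookup, a byte copy, and a few `struct.pack_into` calls, which is roughly why the approach is claimed to compile so quickly. A real implementation would patch addresses and constants into native machine code and then mark the memory executable.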

It’s not exactly clear to me what the granularity of their stencils is. They cache about 100,000 of them, but only give a handful of examples, like:

Julia’s code generator also caches code for thousands of small snippets, at the granularity of function calls specialized for different argument types. For example, `a == b` is a function call in Julia with many different compiled variants, and the cached code is often inlined at the call site. However, the form in which Julia caches the code (typed SSA-form ASTs?) is much higher level than what the copy-and-patch algorithm uses, I think.
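As a loose analogy for caching at the granularity of calls specialized on argument types, here is a small Python sketch. It is not how Julia's compiler is implemented: `SpecializingFunction`, `make_equals`, and `compile_count` are hypothetical names invented for this illustration, and the "compile" step is just building a closure. It only shows the caching pattern, where the first call with a new combination of argument types pays a one-time cost and later calls reuse the cached variant.

```python
class SpecializingFunction:
    """Caches one 'compiled' variant per tuple of argument types (toy model)."""

    def __init__(self, name, make_specialization):
        self.name = name
        self._make = make_specialization
        self.cache = {}          # maps (type, type, ...) -> specialized callable
        self.compile_count = 0   # how many times the expensive step ran

    def __call__(self, *args):
        key = tuple(type(a) for a in args)
        impl = self.cache.get(key)
        if impl is None:
            impl = self._make(key)   # stand-in for the expensive compile step
            self.cache[key] = impl
            self.compile_count += 1
        return impl(*args)


def make_equals(arg_types):
    """Build a variant of `==` for the given argument types (assumed helper)."""
    def equals_variant(a, b):
        return a == b
    # Record which types this variant was built for, for inspection only.
    equals_variant.specialized_for = arg_types
    return equals_variant


equals = SpecializingFunction("==", make_equals)

equals(1, 2)      # first (int, int) call: builds and caches a variant
equals(3, 3)      # reuses the cached (int, int) variant
equals(1.0, 1.0)  # new type combination: builds a second variant
```

After these three calls, `equals.compile_count` is 2 and `equals.cache` holds entries for `(int, int)` and `(float, float)`. The contrast with copy-and-patch is in what gets cached: here a whole specialized function per type signature, there small binary fragments per operation.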

klwlevy · April 16, 2024, 9:45am

Now there is a PEP and some discussions about this here: PEP 744 – JIT Compilation | peps.python.org

Palli · April 16, 2024, 5:42pm

Welcome to Julia, Wispy. The advantage would be smaller compiled Julia programs and, indirectly in some cases, faster Julia programs.

Note, Julia is already the fastest dynamic language, after static (classic) Fortran:

Until recently (i.e. with the previous Julia version), Julia was compared to Java at the Benchmarks Game, but it is now compared to Fortran (and Chapel and C++), currently the next-fastest language(s).

We can fix that graph by working on the outlier(s): the top one (I think only one or two to beat Fortran), and the fixed startup cost highlighted by the lowest outlier.

Julia isn’t really slower than C, Rust or C++. That’s an illusion. They compile ahead of time (e.g. with the slow LLVM), but the rules there only allow Julia to be submitted as source code, so the cost of compiling on the fly (JIT, and Julia’s JIT is slower than it needs to be) is added to its measured runtime.

E.g. this one can be improved:

tp2750 · June 22, 2024, 8:07am

A recent report on the project: Adding a JIT compiler to CPython [LWN.net].

The linked blog post is very interesting: Building a baseline JIT for Lua automatically

tp2750 · June 22, 2024, 8:07am

I wish I had… It must be frustrating to see all these resources poured into Python, which is not made for this kind of work, while Julia clearly is.

fdekerme · July 24, 2025, 1:10am

For those interested, a nice article summarizing Bucher’s last talk about the current state of Python JIT:
