What materials should I read on extending the compiler?

So, off the bat, I rather doubt that messing with the compiler internals is a good fit for what you’re describing here, e.g. your example of

is much more easily avoided by using union splitting, or using something like SumTypes.jl to encapsulate the union and automate the splitting. Julia is a very powerful and flexible language, and most goals can be accomplished without compiler modifications. Reaching for compiler internals at this stage is a bit like trying to learn to do a backflip while still learning to walk.
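To make the union-splitting suggestion concrete, here is a minimal sketch of doing it by hand. The types Shape, Circle, and Square and the functions area and total_area are hypothetical names made up for illustration; they are not taken from the original question. The idea is to branch on isa so that each branch sees a concrete type and the call inside it can be resolved statically.

```julia
# Hypothetical types, for illustration only.
abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    s::Float64
end

area(c::Circle) = pi * c.r^2
area(s::Square) = s.s^2

# Manual union splitting: the element type is abstract, so a plain
# `area(sh)` call would be dispatched at runtime. Inside each `isa`
# branch the compiler knows the concrete type of `sh`.
function total_area(shapes::Vector{Shape})
    total = 0.0
    for sh in shapes
        if sh isa Circle
            total += area(sh)   # statically resolved to area(::Circle)
        elseif sh isa Square
            total += area(sh)   # statically resolved to area(::Square)
        else
            total += area(sh)   # fallback: ordinary dynamic dispatch
        end
    end
    return total
end

shapes = Shape[Circle(1.0), Square(2.0)]
result = total_area(shapes)
```

For a small, closed set of types the compiler often does this splitting on its own when the container is typed as a Union; writing the branches out is mainly useful when the set of types is fixed but the container is abstractly typed.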

With that said, let me link to a few resources on the topic in case this ends up being useful to people whose use-cases do involve working with compiler modifications.

The most user-friendly[1] option here is @generated functions, which let you specialize code generation based on the input types of a function. Generated functions can also return a CodeInfo object instead of an Expr, and this has far-reaching implications. The section on Cassette in this blog post has a great explanation: The Emergent Features of JuliaLang: Part I · Invenia Blog.
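As a rough illustration of the Expr-returning form, here is a small sketch of a generated function. The name unrolled_sum is hypothetical and chosen only for this example. It builds an unrolled sum over a homogeneous tuple, using the tuple length N that is known from the argument type. The CodeInfo-returning form is considerably more involved and is not shown here.

```julia
# Hypothetical example: sum a homogeneous tuple with the loop unrolled
# at code-generation time.
@generated function unrolled_sum(t::NTuple{N,T}) where {N,T}
    # Inside the generator body, `t` is bound to the *type* of the
    # argument, and `N` and `T` are known from that type.
    ex = :(zero($T))
    for i in 1:N
        # Build up the expression zero(T) + t[1] + t[2] + ... + t[N];
        # in the returned expression, `t` refers to the runtime value.
        ex = :($ex + t[$i])
    end
    return ex
end

s = unrolled_sum((1, 2, 3))   # 6
```

The generator runs once per concrete argument type, and the expression it returns becomes the method body for that type. Per the footnote, this is only worth it when ordinary dispatch cannot express the same thing.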

That leads naturally to the tools built on generated functions, namely Cassette.jl and IRTools.jl. Both of those have pretty good documentation pages I’d recommend reading. These tools are now kinda old-fashioned and frozen in time. They give some cool capabilities but have some fundamental limitations people are somewhat unhappy with.

Next we have the new generation of compiler plugins via the abstract interpreter mechanisms. This is the bleeding edge of compiler plugins, and there’s exciting stuff happening here, but it’s also constantly changing and very unstable. Here be dragons. There are two big, stable-ish, and modern packages that take advantage of this mechanism for different purposes: Enzyme.jl and JET.jl. Most other newer or more experimental packages that use the abstract interpreter are developed by copy-pasting chunks of code from Enzyme.jl’s compiler passes and tweaking them until they work for the new purpose. For example, StaticCompiler.jl and AllocCheck.jl were developed via liberal use of reverse-engineered Enzyme code.

For these compiler plugins, there’s not really any pedagogical material, just existing code bases you can try to understand. One nice exception, though, is a series of Discourse posts by @Tomas_Pevny. I’d strongly recommend checking out this thread: Materials about AbstractInterpreter and this thread: Manual type inference of hand-written IRCode

This information has a very short half-life though and will probably all be out of date in a matter of months, so there’s not going to be any useful pedagogical materials to learn from until it stabilizes.



  1. note: I’m using “user friendly” in relative terms. Generated functions should probably be avoided if there’s a simpler way of accomplishing the same goal ↩︎
