Compiler tools and workflow for compiler optimization

Hi, I’m trying to teach myself how to work with the native Julia compiler, and I’m trying to get the basics of how one would set up a workflow for writing a compiler optimization pass.

Here’s what I’ve found out there so far.

I guess a good entry point to start testing things out is Base.code_ircode. So for example if I try something like this:

ir = Base.code_ircode(sum, Tuple{Vector{Float64}})[1]

that gives me the IR after inference and optimizations, but before any LLVM optimizations.
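
For reference, here’s roughly what I get when I poke at the result (a sketch; these are unexported internals, the field names may change between Julia versions, and each element returned by code_ircode seems to be an IRCode => return-type pair):

(ir, rettype) = Base.code_ircode(sum, Tuple{Vector{Float64}})[1]
ir isa Core.Compiler.IRCode   # true
ir.stmts                      # the instruction stream a pass would rewrite
ir.cfg                        # the control-flow graph (basic blocks and edges)
ir.argtypes                   # inferred argument types, including the function itself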

IRViz is a very cool next step then.

I have two questions:

  1. Which tools should I use to modify the IR?

I’m pretty lost here. IRTools sounds like the right package, but it’s from FluxML, and its docs say the package “provides an IR” of its own, whereas I would like to modify the native Julia IR.

Is CodeInfoTools the right option then?

  2. How do I plug a modified IR back into the compilation pipeline?

That would be nice for testing the results. I’m a little more lost here. I guess this is normally handled by Julia’s C code, so I thought maybe the way would be to ccall jl_create_native and jl_dump_native? Is anybody aware of tools or packages that do this for me? Is it CompilerPluginTools, perhaps? I found that it has no docs.

Thanks!

5 Likes

Congrats on taking the plunge! The good news is that many people have gone through the same journey. The bad news is that most of us are stuck at roughly the same place you are :sweat_smile:. I know you replied on the thread, but there’s a reason “Why are compiler devs in such high demand for Julia? - #10 by jpsamaroo” is so important!

Until then, the answer depends on what you want to do. Working with untyped IR (specifically CodeInfo) is better documented. IRTools lets you go from that to a custom IR, which is arguably easier to learn, but I wouldn’t recommend it for anything other than learning, for various reasons. Here I’ll also plug @Tomas_Pevny’s great course materials covering both; they’re a rare bit of polished documentation on the subject. Other ways to glean information include reading through the source/docs of libraries such as Cassette.jl.
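
For orientation, untyped IR is just what lowering produces, before any inference has run; a minimal sketch of how to grab it (code_lowered is public, the CodeInfo fields are internal):

ci = code_lowered(sum, Tuple{Vector{Float64}})[1]   # a CodeInfo, no type information yet
ci.code                                             # the statement array a Cassette-style pass rewrites
ci.slotnames                                        # names of the local variable slots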

Working with typed IR (CodeInfo or IRCode) is a lot tougher. The current approach is to use an AbstractInterpreter plus opaque closures: the former to capture function IR, the latter to feed transformed IR back into the compiler. A few libraries have tried this approach, but because the APIs are so new they have either bitrotted or are not a great intro to the topic. One set of keywords which may help your search is “compiler plugins”. These were proposed a couple of years back as a more user-friendly and stable interface for working with the compiler, so there are some design docs and draft PRs floating around. Unfortunately that work stalled a while back, so I would treat these more as documentation than as ready-made solutions for you.
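
To make the “capture IR, feed it back” idea concrete, the smallest round trip I know of looks roughly like this on recent Julia versions (1.10-ish); note that constructing a Core.OpaqueClosure from IRCode is undocumented internals and may change:

ir, rettype = only(Base.code_ircode(+, Tuple{Int, Int}))
# ... a pass would transform `ir` here ...
oc = Core.OpaqueClosure(ir)   # hand the (possibly modified) IR back to the compiler
oc(40, 2)                     # executes through the IR we just passed in; returns 42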

10 Likes

Yes, this is kind of a follow-up from that thread. I pretty much relate to what @mrufsvold said there.

So thanks for sharing the state of the art!

The compiler plugin project was exactly what I was hoping for… The design docs really summarize the situation (at least as of two years ago) and lay out a good plan looking forward.

I’ll take a look at opaque closures and maybe at how Diffractor.jl uses them.

As for AbstractInterpreter, what’s the story? I think Base.code_ircode already gives me the optimized IR, and is internally calling some interpreter.

I guess at some point I will stop looking for tools and just try to hack at whatever I want to do.

I’m interested in performance optimization. For example, I’m intrigued by how better escape analysis (and shape inference) could lead to more stack allocations of otherwise GC-ed variables.
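
For a toy example of what I mean (as far as I understand, the temporary array here currently ends up on the GC heap even though it never escapes, so better escape analysis could in principle stack-allocate or elide it):

function norm2d(x, y)
    v = [x, y]                      # small temporary, never escapes this function
    return sqrt(v[1]^2 + v[2]^2)    # only its contents are used
end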

I would also like to contribute to static compilation, and I found StaticCompiler.jl to be too much of a hack, as it uses the GPUCompiler pipeline, and I don’t understand why you can’t just use the native pipeline; I guess it would be better supported in the long run in that case. In that regard, I found Compiler3’s static.jl very interesting: it ccalls jl_create_native to generate a .o file for a single function.

In any case, if I manage to do any of this, I’ll try to make sure to document it properly.

Just a nitpick here: Mixtape.jl didn’t actually feed the transformed IR back into the compiler; instead it fed it into GPUCompiler.jl, forking out of the compilation pipeline.

There’s actually a more or less working implementation of Mixtape in StaticCompiler.jl (https://github.com/tshort/StaticCompiler.jl/blob/master/src/mixtape.jl), by the way.

3 Likes

Yes, that was me being lazy and not wanting to separate out the clauses properly :sweat_smile:. I opted to link the original Mixtape because it’s better documented, and the GPUCompiler part seemed less relevant given that @martin.d.maas wants to pass IR back into the default compilation pipeline. That said, you reminded me that StaticCompiler is a pretty good example of how to use a custom AbstractInterpreter and interact with other compiler internals!

An AbstractInterpreter allows one to intercept, analyze, and possibly override function behaviour in call chains. You can see an example of how this might be applied for escape analysis in the compiler tests.

1 Like

To learn the available tools, I would look at the compiler passes Julia already runs, like SROA or DCE, and see how they deal with the IR and the CFG.
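
For instance, those passes live in Core.Compiler and you can jump straight to their source from the REPL (unexported internals, so the names may move between versions):

functionloc(first(methods(Core.Compiler.sroa_pass!)))   # points into base/compiler/ssair/passes.jl
functionloc(first(methods(Core.Compiler.adce_pass!)))   # dead-code elimination lives nearby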

While not a bad idea in theory, I think that runs headlong into points 1 and 2 on @jpsamaroo’s list. This is based on personal experience reading through some of the passes; perhaps others will have a better time of it.

2 Likes

Yeah, trying to understand the passes within Julia’s compiler is not trivial even for an experienced Julia developer, because all sorts of algorithms are intertwined with manual performance optimizations and support for all kinds of weird (or even illegal) IR patterns. The lack of an equivalently sophisticated level of documentation for this code makes it quite inscrutable to anyone except the original author.

6 Likes

I would also like to contribute to static compilation, and I found StaticCompiler.jl to be too much of a hack, as it uses the GPUCompiler pipeline, and I don’t understand why you can’t just use the native pipeline.

One thing to note is that GPUCompiler is, for the most part, using the native pipeline. These days the name is a bit of a misnomer and there has been some debate around it, but GPUCompiler.jl also “just” calls jl_create_native, while providing infrastructure to modify the native pipeline for cross-compilation.

Both Enzyme.jl and StaticCompiler.jl are examples of using GPUCompiler for native/“host” compilation. The former assumes execution in the same session, while the latter intentionally tries to avoid that.

One fundamental challenge has been the question of “how do I feed this back to the Julia compiler after having done arbitrary transformations on it?”, and in some ways the stance has been “you don’t”, except under limited circumstances.

The existing mechanisms are:

  1. @generated functions accepting un-inferred IR (Cassette.jl style)
  2. llvmcall injecting arbitrary “unoptimized IR” (see the sketch below this list)
  3. Using ccall to call a function pointer generated through either a shared library (StaticCompiler) or a side-loaded JIT compiler (Enzyme)
  4. OpaqueClosures (I haven’t played with them enough yet to determine what I could do with them; a challenge I have encountered is dynamic function calls out of the OC escaping the abstract interpreter)
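
To illustrate item 2, llvmcall lets you hand the compiler a string of LLVM IR to use as the body of a function; a minimal sketch (arguments arrive as %0, %1, …, and the first free SSA name here is %2 because the entry block also takes a number):

function add_one(x::Int64)
    Base.llvmcall("""
        %2 = add i64 %0, 1
        ret i64 %2
        """, Int64, Tuple{Int64}, x)
end

add_one(41)  # 42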

@gbaraldi has been exploring a different way to do Enzyme without side-loading a JIT, but that is mostly to improve things like backtraces.

8 Likes

Ahh, thanks for making this clear… So GPUCompiler has a sort of native (CPU) backend as well.

I was going through GPUCompiler and StaticCompiler and understood the following:

jl_create_native gets called by compile_method_instance (in jlgen.jl).

Now, what is a “method instance”? Judging by method_instance_generator, a method instance seems to be a CodeInfo object with certain “world age” parameters set to deal with method invalidations.

As for StaticCompiler, the very first thing generate_obj_for_compile does is create a CompilerConfig and a CompilerJob, which looks very similar to creating a native_job in GPUCompiler’s test suite.

So… maybe a minor modification of generate_obj_for_compile could be made to compile a modified CodeInfo (stored within the CompilerJob) and, with the proper parameters and configuration, link against Julia’s runtime and generate a function that is available within a Julia session.

Now, if this works, that could provide a nice interactive workflow to develop compiler optimization passes. One could easily run different passes and perform benchmarks with and without them.

The rest would actually be the toughest part, which is to manipulate Julia’s IR, load things like escape info, and actually optimize code at that level…

A method instance is simply a method applied to types. For a specific example:
+ is a function.
@which +(1,1) returns the Method: +(x::T, y::T) where T<:Union{Int128, Int16, Int32, Int64, Int8, UInt128, UInt16, UInt32, UInt64, UInt8}. A Method corresponds to the code as it is written.
A method instance is the result of applying types to a Method; method instances are what typed Julia IR and native code are associated with.
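
If you want to hold one in your hands, something like this works on the versions I have tried (method_instances is unexported, so its exact signature may differ across Julia versions):

m  = @which 1 + 1                                   # a Method: the code as written
mi = Base.method_instances(+, Tuple{Int, Int})[1]   # a MethodInstance: that Method specialized for (Int64, Int64)
mi.def === m                                        # a MethodInstance points back at its Method
mi.specTypes                                        # Tuple{typeof(+), Int64, Int64}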

1 Like

Depending on the code changes you want to do, CassetteOverlay.jl may be of interest. It looks like a version of Cassette made using an AbstractInterpreter.

5 Likes

CassetteOverlay uses the Cassette transform in lieu of an AbstractInterpreter, hence the name. That said, it’s a relatively legible and self-contained example of how to write an IR-based source code transform (SCT) and how to use overlay method tables.
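
From memory of the README, the overlay-table side looks roughly like this (treat it as a sketch; the macro names and calling convention may have shifted since I last used it):

using CassetteOverlay

@MethodTable SinTable                         # an overlay method table
@overlay SinTable sin(x::Float64) = cos(x)    # this definition is only visible inside the pass
pass = @overlaypass SinTable                  # build the Cassette-style transform for that table

pass() do
    sin(1.0)                                  # runs the overlayed method, i.e. cos(1.0)
end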

1 Like

Yeah. Probably KernelCompiler.jl (GitHub - vchuravy/KernelCompiler.jl: Experimental compiler infrastructure for KernelAbstractions) is the simplest example of that (albeit a bit outdated), since I never got around to adding the custom “complicated” stuff.
Enzyme.jl uses the same notion.

I don’t think I solved the generic dispatch issue there, i.e. on a dynamic dispatch, do you stay in the “changed pipeline” or do you go back to Julia proper?

The benefit of that approach is that you control the entire pipeline; the downside is that you control the entire pipeline…

The other idea that has been mentioned is to not take responsibility for the LLVM part and instead put the result into an OpaqueClosure (“V and S scratched something” on GitHub, cc: @aviatesk).

That still suffers from the generic dispatch issue, but the GPUCompiler verifier won’t yell at you. The way Cassette/CassetteOverlay fix this issue, by the way, is by rewriting every call to be a call of the overdub function. Enzyme solves it at the LLVM level by rewriting jl_apply_generic.

2 Likes

Oh :frowning:

Btw, I went through Julia’s C code, and I found out that jl_create_native calls jl_ci_cache_lookup, which seems to be running type inference (it calls jl_type_infer).

I had the vague idea that more of the Julia compiler pipeline was implemented in Julia itself, but it seems I was wrong, and those Julia functions that run type inference, etc. for given code are probably just wrapping the C internals. As I wouldn’t like to modify the C code, jl_create_native seems like a dead end to me.

I’ll take a look at all the other options that were mentioned.

A lot of the logic is in Julia itself. See base/compiler/abstractinterpretation.jl.
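
For example, you can convince yourself of this from the REPL; these are unexported internals, shown only to illustrate where the code lives:

interp = Core.Compiler.NativeInterpreter()                    # the stock abstract interpreter, plain Julia
m = first(methods(Core.Compiler.abstract_call_gf_by_type))    # a generic-call inference entry point
m.file                                                        # points into compiler/abstractinterpretation.jl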

1 Like

Hey! @Tomas_Pevny and I are thinking about updating our course materials, which outline e.g. a “petite Zygote” using the latest compiler plugin tools. Are there any even newer resources we could take a look at before diving into the things in this discussion?

Apart from what is listed here, we are aware of @oxinabox’s gist “Running the Julia Compiler pipeline manually” on GitHub.

Thanks a lot!

1 Like

I recently did some work touching on this.

https://github.com/vchuravy/Loops.jl and “Manual LICM in Julia”

3 Likes