How to get Julia SSA-form IR (with Phi nodes)?

Given a function call, how can I get the corresponding SSA-form IR described here?

Alternatively, what is the easiest way - a function, package or algorithm - to convert the normal lowered representation to SSA form? I'm specifically interested in phi nodes.


For context, I'm trying to detect loops and conditions in lowered code. I have a prototype for loop detection using heuristics over goto statements, but the complexity grows quickly, so control-flow analysis using well-known algorithms looks like a more robust approach.

4 Likes

Funnily enough I’m working on a similar thing. Adding a LoopInfo pass.
code_typed gives you the SSA IR; also check out https://github.com/JuliaLang/julia/pull/36832, which is what I'm currently rebasing.
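To make this concrete, here is a minimal sketch (the function `f` is just an illustration, not from the thread): a branch in the source produces a `Core.PhiNode` in the typed SSA IR that `code_typed` returns.

```julia
# A tiny function with a branch, so its typed SSA IR merges two values.
f(x) = x > 0 ? x + 1 : x - 1

# code_typed returns pairs of (CodeInfo, return type); the CodeInfo is in SSA form.
ci, rettype = only(code_typed(f, (Int,)))

# Phi nodes show up as Core.PhiNode statements at the merge point of the branches.
println(any(stmt -> stmt isa Core.PhiNode, ci.code))
```

Printing the statements of `ci` directly also shows the `φ` nodes in the familiar textual form.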

2 Likes

Well, I think SSA-form IR is produced by Julia's type inferencer, so you can look through the inferencer's code to get what you want. I believe it's located in the compiler/ssair folder, where you can find the top-level entry point of the conversion algorithm. But this only produces untyped SSA; you still need the subsequent abstract interpretation and optimization to get the final typed IR. If you need typed IR, you'd better invoke code_typed directly.

It looks really interesting! Do you plan to publish it as a separate package or merge it into Julia itself? I think we discussed loop detection in a compiler pass here and there, so it could be pretty valuable work for the community.

Regarding @code_typed: indeed, I see the phi nodes, but unfortunately it's terribly slow compared to @code_lowered and performs many more transformations than I'd like to see in the IR.

The untyped IR is exactly what I need. I will go through the code you linked, thanks!


I also found out that IRTools - now unmaintained, but still a powerful package - re-implements many of the relevant transformations in a more readable way by introducing an abstraction layer on top of the compiler primitives. I will now try to combine all these sources of knowledge to better understand how to handle loops and conditions.

1 Like

I hope to make it a compiler pass for Julia. You could just stop the compiler pipeline in the middle, similar to how it's done here: Inference · The Julia Language. Revise is nice for this because you can just comment out compiler passes.

2 Likes

FYI, I wrapped some tools during the development of YaoCompiler so that you don't have to touch a lot of internals. Personally, I think IRCode is much, much easier to work with than typed code if you want to write code transformations in Julia: CompilerPluginTools.jl/ircode.jl at master · JuliaCompilerPlugins/CompilerPluginTools.jl · GitHub

2 Likes

I should have mentioned that I'm not writing a compiler pass :slight_smile: I found out that surprisingly many things can be done in "user space" by manually traversing CodeInfo (example), while debugging is way simpler. Even more surprisingly, the performance of this approach is on par with the normal compilation-execution pipeline, at least with the current generation of the Julia compiler. But of course that's just my current use case, so I'm really thankful for all the answers!
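For readers curious what such a traversal looks like, here is a minimal sketch (not the poster's actual tracer): walk a lowered CodeInfo and pick out the control-flow statements a hand-rolled interpreter would follow.

```julia
# A small function with a loop, lowered to CodeInfo.
function sumto(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

ci = only(code_lowered(sumto, (Int,)))

# Lowered code has no phi nodes, only gotos; an interpreter walks ci.code
# sequentially and jumps on GotoNode / GotoIfNot statements.
for (i, stmt) in enumerate(ci.code)
    if stmt isa Core.GotoNode
        println("$i: goto $(stmt.label)")
    elseif stmt isa Core.GotoIfNot
        println("$i: goto $(stmt.dest) unless $(stmt.cond)")
    end
end
```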

@Roger-luo Regarding IRCode, I played around with it a bit and can confirm it has many more useful details such as phi nodes, the CFG, etc. But code_ircode() also optimizes away many details that I want to control. For instance, in this simple case:

foo(x) = 2x + 1
bar(x) = foo(x ^ 2)
baz(x, y) = bar(x) * foo(y)

code_lowered(baz, (Float64, Int)) returns:

1-element Vector{CodeInfo}:
 CodeInfo(
1 ─ %1 = Main.bar(x)
│   %2 = Main.foo(y)
│   %3 = %1 * %2
└──      return %3
)

while code_ircode(baz, (Float64, Int)) produces:

1-element Vector{Pair{IRCode, DataType}}:
1 1 ─ %1 = Base.mul_float(_2, _2)::Float64                    │╻╷╷   bar
  │   %2 = Base.mul_float(2.0, %1)::Float64                   ││╻╷    foo
  │   %3 = Base.add_float(%2, 1.0)::Float64                   │││╻     +
  │   %4 = Base.mul_int(2, _3)::Int64                         ││╻     *
  │   %5 = Base.add_int(%4, 1)::Int64                         ││╻     +
  │   %6 = Base.sitofp(Float64, %5)::Float64                  ││╻╷╷╷  promote
  │   %7 = Base.mul_float(%3, %6)::Float64                    ││╻     *
  └──      return %7                                          │     
   => Float64

At the very least, bar() and foo() have been inlined, preventing me from treating them in some special way, e.g. replacing them with a different call. Assuming I don't want to dive into custom interpreters yet, is there a way to turn off optimization as we can do in code_typed()?

1 Like

Yes, I think you can replace the optimization pass with an empty one.

The default Julia pass does the inlining etc. CompilerPluginTools.jl/passes.jl at master · JuliaCompilerPlugins/CompilerPluginTools.jl · GitHub

but you can also just replace it with an empty function CompilerPluginTools.jl/passes.jl at master · JuliaCompilerPlugins/CompilerPluginTools.jl · GitHub

the code_ircode function just uses the default Julia pass by default: CompilerPluginTools.jl/ircode.jl at 0dce551c0c1786776f6df1923d8d9da666d5a90d · JuliaCompilerPlugins/CompilerPluginTools.jl · GitHub

you can replace it with no_pass for sure; this is quite useful when writing IR transforms, since you need to check whether the transform happens as expected

PS. I haven’t checked if things are broken on 1.8 and 1.9 btw, so just in case if you hit anything on 1.8 & 1.9 there might be things broken due to API changes

2 Likes

It looks like you are implementing some kind of abstract interpretation though? This code looks very similar to the IR interpreter code to me…

Yes! The difference, though, is that I never delegate execution to the Julia compiler, but instead keep full control. The task is to trace a function call's execution, recording so-called primitives onto a "tape" and recursing into non-primitives, and this is the n-th incarnation of the tracer. I used to do it by adding special instructions to the IR using Cassette, IRTools and, more recently, Mixtape. Yet it has always felt like playing volleyball on a carousel: you throw the ball in one place and hope to catch it in another, without really controlling it while it's in the air. In the current approach, I just extract the CodeInfo and iterate over it, invoking function calls and following gotos. And this simple approach works surprisingly well!

Thanks for the detailed answers and links! I think my first approach will be to look into IRTools once again, since I'm already familiar with its code base and it seems to implement most of the things I need (not just phi nodes). If that doesn't work, I will dive into IRCode using CompilerPluginTools, which is pretty readable too!

2 Likes

As nice as it would be to leave IRTools behind altogether, I've recently been looking into some optimizations and testing for it. This is mostly out of necessity, since IRTools constitutes a large chunk of Zygote's TTFG overhead, and hedging against us being unable to move off Zygote in the short-to-medium term is probably wise…

Yeah, I think it's gonna be harder and harder to keep IRTools updated with Julia internals; we should just move on to IRCode, which already has a lot of tooling around it inside Julia internals. I believe the design is just not stable yet, and the lack of docstrings is preventing people from using it, which is why I tried to create some higher-level wrappers around it in CompilerPluginTools.

But how stable is IRCode itself? I mean, CodeInfo is at least exposed via the official @code_lowered; I thought IRCode was a very, very deep internal thing that could be changed significantly or dropped altogether at any time, no?

1 Like

I think the plan is to make IRCode stable eventually and build a toolchain around it; at least, that is what was discussed last time about compiler plugins as far as I know, since the core developers have now decided to support this officially. I'm not sure when that will happen, though, since I haven't been working on this for a while. @aviatesk is the best person to ask for sure.

3 Likes

In case this is helpful, I implemented some loop-detection utilities on top of IRTools while working on Genify.jl. The relevant code is here: https://github.com/probcomp/Genify.jl/blob/main/src/utils.jl

Note that this only detects natural loops, and works under the expectation that people aren’t using @goto statements in their code.

Would be cool if someone does something similar for IRCode as well!
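As a starting point for the typed-IR side, here is a hedged sketch using `Core.Compiler.compute_basic_blocks` (a compiler internal, so it may need adjusting across Julia versions). For the reducible control flow that ordinary loops produce, an edge jumping to an earlier-or-equal block index is a back edge closing a natural loop; a robust implementation would additionally confirm this with a dominance check.

```julia
# Hedged sketch: find back edges in the CFG built from a typed CodeInfo.
# compute_basic_blocks is an internal API; a real loop detector would also
# verify that the edge target dominates the source.
function back_edges(ci::Core.CodeInfo)
    cfg = Core.Compiler.compute_basic_blocks(ci.code)
    edges = Tuple{Int,Int}[]
    for (i, bb) in enumerate(cfg.blocks)
        for succ in bb.succs
            # In structured (reducible) code, a jump to an earlier or
            # equal block index closes a loop.
            succ <= i && push!(edges, (i, succ))
        end
    end
    return edges
end

function sumto(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

ci, _ = only(code_typed(sumto, (Int,)))
println(back_edges(ci))  # one entry per loop in the optimized IR
```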

2 Likes

There’s also the built-in https://github.com/FluxML/IRTools.jl/blob/master/src/passes/relooper.jl, but I don’t know who actually uses it in the wild…

The timelines for this really are the biggest thing. If a stable, 1.6-compatible IRCode were available today, we could jettison IRTools completely from libraries like Zygote. As-is I would imagine that is unlikely to materialize :frowning:

1 Like

I just realized you might be asking for custom intrinsic support, rather than inlining, in the context of AD. I think it's better to let DCE etc. run first before doing AD, because we get less and simpler code to optimize and transform for AD. Currently, the way to do custom intrinsics is not very stable, but you can use a custom method table to achieve it (check what GPUCompiler does for GPUs via an abstract interpreter). Basically, I think methods with handwritten AD rules (primals) can be treated as intrinsics in the transformation passes: you extract a method table from CR, implement your own abstract interpreter to produce an IRCode that treats these methods as intrinsics, and then synthesize a gradient IRCode from it.
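A minimal sketch of the custom-method-table piece of this (the names `AD_TABLE` and `myprimal` are ours, not from the thread): `Base.Experimental.@overlay` records an alternative definition in a separate table that only a custom abstract interpreter would consult, which is one way to mark primals as opaque intrinsics without touching normal dispatch.

```julia
# Hypothetical overlay table for an AD transform; a custom abstract
# interpreter would resolve calls against this table first.
Base.Experimental.@MethodTable AD_TABLE

myprimal(x) = sin(x) * x      # imagine this primal has a handwritten gradient

# In the overlay table the primal is shadowed by a marker definition;
# only an interpreter using AD_TABLE sees it.
Base.Experimental.@overlay AD_TABLE myprimal(x::Float64) = error("AD intrinsic")

# Ordinary dispatch is unaffected by the overlay:
println(myprimal(0.0))  # prints 0.0
```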

I like your idea on linearized IR btw, I think this might lead to better-optimized and more stable AD in Julia.

Thanks for the tip! I think the AD case is actually simpler than that: treating a program as a single computation graph opens the door to many optimizations that are simply not possible in a general-purpose compiler. But of course there are other use cases for tracing beyond AD, and I'm enjoying watching the progress of the compiler plugins infrastructure.

Then synthesize a gradient IRCode from it.

Is it actually possible to compile IRCode back to a normal function? I think there was a discussion about this on Slack recently, and the resolution was that conversion to IRCode is a one-way ticket. Or did I just not read it to the end?

1 Like

It is one-way in the sense that you can't transform IRCode back to lowered CodeInfo, but you can transform IRCode to typed CodeInfo and then go through Julia codegen, which will give you the executable.
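One way to see an IRCode become executable again on recent Julia (roughly 1.10+; this leans on unstable internals and may break across versions) is to wrap it in an `OpaqueClosure`:

```julia
# Hedged sketch, relying on unstable internals: wrap a (possibly
# transformed) IRCode in an OpaqueClosure to execute it directly.
ir, rettype = only(Base.code_ircode(+, (Int, Int)))
oc = Core.OpaqueClosure(ir)
println(oc(1, 2))
```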

I didn’t mean a general-purpose compiler, it’s just about reducing the general SSA to a single basic block so you get the simple computational graph. And it’s gonna be simpler if you get the representation after compiler optimization.

1 Like