What is the difference between CodeInfo and Core.Compiler.IRCode?

Dear All,

I wanted to update my lecture notes about IR modifications. Two years ago, when I wrote them, I wanted to have some introductory material on the mechanisms Zygote relies on. Now I have the feeling that it's time to expand the notes with CompilerPlugins, since this seems to be the future.

I have therefore started to look around, which has provoked some questions.

My first question is, what is the difference between Core.CodeInfo and Core.Compiler.IRCode?

Second question: it used to be the case that a generated function could return CodeInfo instead of Julia code. Can a generated function return IRCode, or is there a different mechanism to hook into the compiler? (I think there is, and I expect Diffractor might contain an example.)

Third question: IRTools.jl is great for modifying CodeInfo. I have found two tools for modifying IRCode: CompilerPluginTools and CodeInfoTools.jl. I wonder if they have been abandoned because they were both dead ends, or because of a lack of human labor. I am also aware of TKF’s CodeInfo.jl, but I do not think it is designed for manipulation.

I do not have big aspirations at the moment, but I believe that something like a Petite Zygote built on IRCode might be useful educational material.


Once upon a time Julia only had CodeInfo, which is produced by lowering and then updated/modified by abstract interpretation (née inference). Then codegen takes it and translates it to LLVM IR.

When the iteration protocol got changed it became clear that we needed a layer upon which we can do optimizations a bit easier. I think everything in base/compiler/ssair operates over IRCode.

The translation routines are here: https://github.com/JuliaLang/julia/blob/master/base/compiler/ssair/legacy.jl

Besides some internal data-structure differences, IRCode uses block indices for jump targets and PhiNodes, instead of the statement indices used in CodeInfo.

This makes working with the control-flow-graph a bit easier.
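For illustration, here is one way to poke at those block indices (a sketch assuming Julia 1.9+, where the unexported Base.code_ircode is available; the field names below are internals and may change):

```julia
f(x) = x > 0 ? x : -x

# IRCode carries an explicit control-flow graph
(ir, rt) = only(Base.code_ircode(f, (Int,)))

ir.cfg                  # basic blocks with their statement ranges
ir.cfg.blocks[1].succs  # successors given as block indices, not statement indices
```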

Codegen still uses the old format, so currently we translate from CodeInfo → IRCode, perform optimizations (SROA, inlining, simplification, and a few more that are being worked on), and then translate back into a hybrid CodeInfo form to do codegen.
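The round trip can be sketched roughly like this (hedged: inflate_ir and replace_code_newstyle! live in base/compiler/ssair/legacy.jl, are internal, and their signatures have changed between Julia versions):

```julia
import Core.Compiler as CC

# Start from an inferred CodeInfo
ci = code_typed(sin, (Float64,))[1][1]

# CodeInfo → IRCode (the direction inference/optimization uses)
ir = CC.inflate_ir(ci)

# IRCode → CodeInfo (the direction codegen needs); mutates ci in place
CC.replace_code_newstyle!(ci, ir)
```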

It used to be the case that a generated function could return CodeInfo instead of Julia code.

You can still do this. Note that @generated functions are required to return un-inferred IR.

is there a different mechanism to hook into the compiler

If you have an inferred CodeInfo or IRCode, you can use Core.OpaqueClosure to construct a callable object.
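A minimal sketch of that route (assuming a recent Julia, roughly 1.10, where Base.code_ircode and the IRCode constructor of Core.OpaqueClosure are available; both are internals):

```julia
import Core.Compiler as CC

double(x) = 2x

# Obtain inferred IRCode for double(::Float64)
(ir, rt) = only(Base.code_ircode(double, (Float64,)))

# Wrap it into a callable without defining a new method
oc = Core.OpaqueClosure(ir)
oc(3.0)  # behaves like double(3.0)
```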

IRTools has the issue that it goes to yet another different internal IR. This is why we never used it in Cassette.jl but rather worked with the tools provided by Core.Compiler directly.

A good example recently is https://github.com/JuliaLang/julia/pull/51120 which adds a global-value-numbering pass to Julia or my own re-invigorated experiments to write a loop-invariant-code-motion pass for Julia (GitHub - vchuravy/Loops.jl & Manual LICM in Julia)

It would be interesting to move ShowCode.jl to an organization and make it work over the core base datastructures.

While I understand why tools for working with Julia IR live in external repositories (and I do it myself often enough), I would encourage contributions to Julia's Core.Compiler, so that we can improve the common tooling and have less risk of things bit-rotting.


More the latter than the former, I think. But certainly the way Core.Compiler is structured and developed does not make these kinds of libraries easy to maintain. @Roger-luo probably has an opinion on this.

Would it be feasible to define a somewhat stable subset of the Core.Compiler API and create a Compat.jl style polyfill library which allows users to target it across different Julia versions? At present, I think having to ride the nightly train to do any kind of non-trivial IRCode manipulation limits the appetite for doing compiler stuff outside of JuliaLang/julia (and by extension the number of people who would be comfortable helping with Julia-side compiler stuff in Core). IMO it’s also one reason IRTools still exists at all: despite its many flaws, one gets a stable and version-agnostic interface to work with.


Yeah, I later decided not to depend on Julia internals but to write my own IR instead, so I don’t need that package anymore, which is why I’m not updating it. Keeping it up to date with the internals is also a lot of work for me.


Big thanks to all for the clarification. This is really useful. I will keep poking around.

One more question.

Is there a function which will reassign SSAValues? I have modified IRCode (again, through a silly example) and obtained something like this

2 1 ─ %1 = (_2 * _3)::Float64                                                                                     │
3 │  %5 =  Main.cos(_3)::Float64                                                                                  │
  │   %2 = Main.sin(_2)::Float64
  │   %3 = (%1 + %5)::Float64                                                                                     │
  │   %6 = (%4 + %2)::Float64                                                                                     │
  │   %4 = return %6

which has the problem that the SSAValues are not ordered. The IRCode now contains two elements in ir.new_nodes. I would like a function which now “retires” the new_nodes to stmts and assigns correct numbers. Do I have to write it myself, or is it readily available?

Thanks a lot in advance for help.
Best wishes,
Tomas

You want to run compact! over the IR.
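For example (a sketch against Julia 1.9/1.10 internals; field layouts may differ across versions):

```julia
import Core.Compiler as CC

g(x) = x + 1.0
(ir, rt) = only(Base.code_ircode(g, (Float64,)))

# Queue a new statement; it lands in ir.new_nodes, not ir.stmts
CC.insert_node!(ir, 1,
    CC.NewInstruction(Expr(:call, GlobalRef(Main, :sin), Core.Argument(2)), Float64))

# compact! retires new_nodes into stmts and renumbers the SSAValues.
# It returns the fresh IRCode, so the result must be reassigned:
ir = CC.compact!(ir)
```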

Thanks for the swift answer. I have tried that and I have not seen the effect. I will try again tonight and post an MWE in case of success or failure.

Below is an MWE that tries to modify the IRCode of a function and execute it. The goal is to change the function foo into fooled by manipulating IRCode.

import Core.Compiler as CC
using Core: SSAValue, GlobalRef, ReturnNode

function foo(x,y) 
  z = x * y 
  z + sin(x)
end

function fooled(x,y) 
  z = x * y 
  z + sin(x) + cos(y)
end

(ir, rt) = only(Base.code_ircode(foo, (Float64, Float64), optimize_until = "compact 1"));
nr = CC.insert_node!(ir, 2, CC.NewInstruction(Expr(:call, Core.GlobalRef(Main, :cos), Core.Argument(3)), Float64))
nr2 = CC.insert_node!(ir, 4, CC.NewInstruction(Expr(:call, GlobalRef(Main, :+), SSAValue(3), nr), Float64))
CC.setindex!(ir.stmts[4], ReturnNode(nr2), :inst)
ir = CC.compact!(ir)
irfooled = Core.OpaqueClosure(ir)
irfooled(1.0, 2.0) == fooled(1.0, 2.0)

So what did we do?

  1. (ir, rt) = only(Base.code_ircode(foo, (Float64, Float64), optimize_until = "compact 1")) obtains the IRCode of the function foo when called with both arguments being Float64. rt contains the return type of the function.
  2. A new instruction, cos, is inserted into the ir by Core.Compiler.insert_node!, which takes as arguments an IRCode, a position (2 in our case), and the new instruction. The new instruction is created by NewInstruction, accepting as input an expression (Expr) and a return type. Here we force it to be Float64, but ideally it should be inferred (that would be the next stage). Or, maybe, we can run it through type inference? The new instruction is added to the ir.new_nodes instruction stream, and we obtain a new SSAValue, returned in nr, which can then be used further.
  3. We add one more instruction, +, that uses the output of the instruction we added in step 2 (nr) and the SSAValue of statement 3 of the original IR. (At this moment, the IR is still numbered with respect to the old IR; the renumbering happens later.) The output of this second instruction is returned in nr2.
  4. Then we rewrite the return statement to return nr2 instead of SSAValue(3).
  5. ir = CC.compact!(ir) is super important, since it moves the newly added statements from ir.new_nodes to ir.stmts and, importantly, renumbers the SSAValues. Even though the function is mutating, the mutation here means that the argument is changed; the new, correct IRCode is returned and therefore has to be reassigned.
  6. The function is created through OpaqueClosure.
  7. The last line verifies that the function does what it should do.

I have marked this as an answer, but of course many interesting questions remain unanswered. For a person without a proper compiler class (but who did some x86 assembly in high school), this is a very interesting avenue.
