Julia IRs, and how to have Julia compile user-generated IR code

In a recently registered tiny package, I have a function whose body is basically just an llvmcall that emits the LLVM unreachable instruction: Core.Intrinsics.llvmcall("unreachable", Cvoid, Tuple{}). The goal is to convince the optimizer that some piece of code is not reachable, so that it gets eliminated as dead code. More context is here and here.
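
For readers who want to see the shape of this, here is a minimal sketch built around the llvmcall quoted above. The names `unsafe_unreachable` and `unsafe_assume` are hypothetical and chosen only for illustration; they are not necessarily the package's actual API. Executing the `unreachable` is undefined behavior, so the sketch only ever calls the wrapper with a condition that is true.

```julia
# Hypothetical wrapper around the llvmcall from the post above.
# Actually executing it is undefined behavior.
@inline unsafe_unreachable() =
    Core.Intrinsics.llvmcall("unreachable", Cvoid, Tuple{})

# Hypothetical helper: promises the optimizer that `cond` is true.
@inline function unsafe_assume(cond::Bool)
    cond || unsafe_unreachable()
    return nothing
end

# Safe to run: the condition holds, so the unreachable branch is never taken.
result = unsafe_assume(1 + 1 == 2)
```

The intent is that, after inlining, the optimizer may drop the branch guarded by `cond` along with any code that only runs when `cond` is false.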

I’ve come to realize that it may be possible to implement the same functionality at a higher level, because some of Julia’s intermediate representations (IRs) seem to have explicit support for something like an unreachable instruction:

  1. Here in the docs, it seems that the “lowered form AST” representation considers a ReturnNode element to be unreachable under some conditions. This PR also seems relevant.

  2. Here in the docs, I see $(Expr(:unreachable))::Union{} in some IR display. I don’t understand much of that page, but the :unreachable part surely refers to something like an unreachable instruction.

The benefits of using a Julia IR instead of llvmcall would presumably be better effect inference (llvmcall seems to make Julia give up on effect inference) and better inlining heuristics. I’m not sure, however, how stable the IRs are across Julia releases, so I’d probably still have to use the llvmcall as a fallback?
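
One way to check the effect-inference concern is to ask the compiler directly. The sketch below assumes Julia 1.9 or later and uses `Base.infer_effects`, which is an internal, unexported function whose behavior and printed output may change between releases. The function names `plain_add` and `opaque_unreachable` are hypothetical.

```julia
# Hypothetical functions to compare; nothing here is executed except inference.
plain_add(x::Int) = x + 1
opaque_unreachable() = Core.Intrinsics.llvmcall("unreachable", Cvoid, Tuple{})

# Base.infer_effects is internal (assumption: Julia 1.9+).
effects_add  = Base.infer_effects(plain_add, (Int,))
effects_llvm = Base.infer_effects(opaque_unreachable, ())

# The printed form differs across Julia versions, so no output is shown here.
println("plain_add:          ", effects_add)
println("opaque_unreachable: ", effects_llvm)
```

Comparing the two printed values should show whether the llvmcall version gets weaker inferred effects on a given Julia version.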

So:

  1. What are all the Julia IRs? How stable is each one from the perspective of an end-user?

  2. Which ones among the IRs support unreachable instructions?

  3. How can an end user use IR code within Julia code? (The PR linked above has some relevant code in the test it adds, but I don’t feel like I understand everything well enough to be able to modify the example.)

I think LLVM.jl’s assume sets alwaysinline, so you might be able to copy that. I don’t know about the rest.

The compiler pipeline is roughly:

  1. Abstract syntax tree (AST): Macro level – produced by parsing
  2. Linearized untyped IR: Produced by lowering – the level at which @generated functions and Cassette-like tools work
  3. Typed IR: Produced by inference (abstract-interpretation-based IPO + type inference)
  4. Julia SSAIR: High-level optimizer, used by Julia’s own optimization passes
  5. LLVM IR: Produced by codegen – the level targeted by llvmcall and LLVM.jl
  6. “Assembly”
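
As a rough illustration of the stages listed above, the standard reflection functions can show each one for a small function. This is only a sketch: `g` is a hypothetical example function, and the optimized `code_typed` result is a `CodeInfo` view of the optimizer's output rather than the internal SSA IR data structure itself.

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

g(x) = x + 1  # hypothetical example function

ast       = Meta.parse("g(x) = x + 1")                              # 1. AST
lowered   = only(code_lowered(g, (Int,)))                           # 2. untyped IR
typed     = first(only(code_typed(g, (Int,); optimize=false)))      # 3. typed IR
optimized = first(only(code_typed(g, (Int,); optimize=true)))       # 4. after Julia-level optimization
llvm_ir   = sprint(io -> code_llvm(io, g, (Int,)))                  # 5. LLVM IR
native    = sprint(io -> code_native(io, g, (Int,)))                # 6. native assembly

println(lowered)
println(optimized)
```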

The only level guaranteed to be stable-ish is the AST level, though we may change or add AST nodes.

I don’t think we have an unreachable at the AST level. The closest is throw(nothing).

julia> @eval f() = $(Expr(:unreachable))
ERROR: syntax: invalid syntax (unreachable)
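
A small sketch of the throw-based alternative mentioned above: inference gives a function that always throws the return type Union{}, so the compiler knows control never continues past the call. Unlike LLVM's unreachable, the throw remains a real, well-defined operation if the branch is ever taken. The names `never_returns` and `checked_div` are hypothetical.

```julia
# Hypothetical helper that never returns normally.
never_returns() = throw(nothing)

# Hypothetical caller: the branch is well-defined even if it is taken.
function checked_div(a::Int, b::Int)
    b == 0 && never_returns()
    return div(a, b)
end

# Inference reports the bottom type for a function that always throws.
rt = only(Base.return_types(never_returns, ()))
println(rt)
```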

The function Base.uncompressed_ir returns a Core.CodeInfo value:

julia> Base.uncompressed_ir(which((() -> nothing), Tuple{}))
CodeInfo(
    @ REPL[1]:1 within `#1`
1 ─     return Main.nothing
)

julia> typeof(ans)
Core.CodeInfo

I assume Core.CodeInfo represents one of the IRs, either (2) or (3)? Do you know which one? I guess it’s (2), or maybe it’s shared for both (2) and (3)?
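
A quick way to probe this, offered as a sketch rather than a definitive answer: both `code_lowered` and `code_typed` hand back `Core.CodeInfo` objects, which suggests the type is shared between the untyped and typed levels. The function `h` is a hypothetical example, and the `ssavaluetypes` field is an internal detail that may change between Julia versions.

```julia
h() = nothing  # hypothetical example function

lowered = only(code_lowered(h, ()))
typed, rettype = only(code_typed(h, (); optimize=false))

println(typeof(lowered))
println(typeof(typed))

# Internal detail: in lowered code this field holds only a count of SSA values,
# while inferred code stores one type per SSA value.
println(lowered.ssavaluetypes isa Int)
println(typed.ssavaluetypes isa Int)
```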

Furthermore, when I do something like this, I’m able to construct an IR function with an “unreachable” body, seemingly:

julia> function f()
         src = Base.uncompressed_ir(which((() -> nothing), Tuple{}))
         src.code = Any[Core.ReturnNode()]
         src
       end
f (generic function with 1 method)

julia> f()
CodeInfo(
    @ REPL[1]:2 within `#1`
1 ─     unreachable
)

However, I think the modified CodeInfo may be malformed somehow, because playing with it further causes Julia to print lines like

Internal error: encountered unexpected error in runtime:
UndefRefError()
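
For reference, the `unreachable` shown in the display above corresponds to a `Core.ReturnNode` whose `val` field is left undefined. The sketch below only constructs and inspects the nodes; it does not splice them into any method body, and the variable names are hypothetical.

```julia
# An ordinary return node carries the returned value in `val`.
normal_return = Core.ReturnNode(42)

# The zero-argument constructor leaves `val` undefined, which is how
# an unreachable statement is represented.
unreachable_node = Core.ReturnNode()

println(isdefined(normal_return, :val))
println(isdefined(unreachable_node, :val))
```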

unreachable (or ReturnNode()) is a block terminator which indicates that the instruction is unreachable, but the instruction actually needs to be provably unreachable (preceded by an always-throwing instruction, or located in an unreachable block).

The PR “Allow using `ReturnNode()` in `@generated` code” (Pull Request #51715 in JuliaLang/julia, by Pangoraw) allows unreachable to be inserted in untyped IR (for generated functions) in provably unreachable code. But it still needs to be preceded by a throwing instruction when the code is actually reachable (see “Handle unreachable blocks in the adjoint CFG”, Pull Request #1465 in FluxML/Zygote.jl, by Pangoraw, for such a case).

I see it as similar to the LLVM unreachable, which also needs to be preceded by a no-return function call. But in Julia’s case it also prevents the unreachable block from appearing in the list of predecessors of the following block due to implicit control flow (not sure if that happens in LLVM too).
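
To see where the compiler itself places such a terminator, one can inspect the typed IR of a function that always throws. This is a sketch with a hypothetical function name; whether an unreachable node shows up in the printed code may depend on the Julia version, so the check is printed rather than assumed.

```julia
# Hypothetical function that can never return normally.
always_throws() = throw(ArgumentError("never returns"))

ci, rt = only(code_typed(always_throws, ()))

# Look for a ReturnNode with an undefined `val`, i.e. an unreachable terminator.
has_unreachable = any(ci.code) do stmt
    stmt isa Core.ReturnNode && !isdefined(stmt, :val)
end

println(ci)
println("inferred return type: ", rt)
println("contains unreachable: ", has_unreachable)
```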

Thank you, so ReturnNode() isn’t useful for replacing LLVM’s unreachable instruction after all.

I don’t think that LLVM’s unreachable needs to be preceded by anything? The doc entry says (emphasis mine):

This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.

So the no-return call is just given as an example, I suppose?

You are right: when the unreachable sits in an otherwise empty block, LLVM optimizes the conditional branch into an @llvm.assume() call, which is what is used in your package.
