Julia IRs, and how to have Julia compile user-generated IR code

nsajko · October 17, 2023, 2:33pm

In a recently registered tiny package, I have a function whose body is basically just an llvmcall that calls the LLVM unreachable instruction: Core.Intrinsics.llvmcall("unreachable", Cvoid, Tuple{}). The goal is to convince the compiler optimizer that some piece of code is not reachable, and thus eliminate it as dead code. More context is here and here.
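For readers who have not seen the trick, here is a minimal sketch of how such a function can be used. The names `unreachable`, `assume` and `unchecked_sqrt` are hypothetical and chosen for illustration; they are not necessarily the package's API. Only the `llvmcall` expression itself comes from the post above. Whether the optimizer actually removes the dead branch depends on the Julia and LLVM versions.

```julia
# Hypothetical helper names; only the llvmcall expression is from the post.
@inline unreachable() = Core.Intrinsics.llvmcall("unreachable", Cvoid, Tuple{})

# Promise the optimizer that `cond` holds.
# Calling this with `false` is undefined behavior, not an error.
@inline function assume(cond::Bool)
    cond || unreachable()
    return nothing
end

function unchecked_sqrt(x::Float64)
    assume(x >= 0)   # may let LLVM drop the negative-argument error branch
    return sqrt(x)
end

unchecked_sqrt(4.0)
```

Comparing `code_llvm` output for `unchecked_sqrt` and plain `sqrt` is one way to check whether the domain-error branch was really eliminated.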

I’ve come to realize that it may be possible to implement the same functionality on a higher level, because some of Julia’s intermediate representation (IR) codes seem to have explicit support for something like an unreachable instruction:

Here in the docs, the “lowered form AST” representation seems to treat a ReturnNode element as unreachable under some conditions. This PR also seems relevant.
Here in the docs, I see $(Expr(:unreachable))::Union{} in some IR display. I don’t understand much on that page, but the :unreachable part surely refers to something like an unreachable instruction.

The benefits of using a Julia IR instead of llvmcall would presumably be better effect inference (llvmcall seems to make Julia give up on effect inference) and better inlining heuristics. I’m not sure, however, how stable the IRs are across Julia releases, so I’d probably still have to use the llvmcall as a fallback?

So:

What are all the Julia IRs? How stable is each one from the perspective of an end-user?
Which ones among the IRs support unreachable instructions?
How to use IR code as an end-user within Julia code? (The PR linked above has some relevant code in the test it adds, however I don’t feel like I understand everything well enough to be able to modify the example.)

Zentrik · October 17, 2023, 3:00pm

I think LLVM.jl’s assume sets alwaysinline, so you might be able to copy that. I don’t know about the rest.

vchuravy · October 17, 2023, 3:46pm

The compiler pipeline is roughly:

1. Abstract syntax tree (AST): produced by parsing; the level at which macros operate
2. Linearized untyped IR: produced by lowering; the level of @generated functions and Cassette-like tools
3. Typed IR: produced by inference (abstract interpretation based IPO + type inference)
4. Julia SSAIR: the high-level optimizer's IR, used by Julia’s own optimization passes
5. LLVM IR: produced by codegen; the level of llvmcall and LLVM.jl
6. “Assembly”

The only level guaranteed to be stable-ish is the AST level, though we may change or add AST nodes.
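Most of the stages listed above can be inspected with Julia's standard reflection functions. The following is a minimal sketch; `g` is a hypothetical example function, and the printed form of each IR varies between Julia releases, which is part of why only the AST level is considered stable-ish.

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

g(x) = x + 1  # hypothetical example function

# AST: what the parser produces and what macros see.
ast = Meta.parse("g(x) = x + 1")

# Lowered, untyped IR.
lowered = only(code_lowered(g, (Int,)))

# Typed IR straight after inference, before Julia's optimization passes.
inferred, _ = only(code_typed(g, (Int,); optimize=false))

# Typed IR after Julia's high-level optimizer has run.
optimized, rettype = only(code_typed(g, (Int,)))

# LLVM IR and native assembly, captured as strings.
llvm_ir = sprint(code_llvm, g, (Int,))
native = sprint(code_native, g, (Int,))
```

In the REPL, the macro forms `@code_lowered g(1)`, `@code_typed g(1)`, `@code_llvm g(1)` and `@code_native g(1)` show the same information.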

I don’t think we have an unreachable on the AST level. The closest is throw(nothing).

julia> @eval f() = $(Expr(:unreachable))
ERROR: syntax: invalid syntax (unreachable)

nsajko · October 17, 2023, 6:22pm

The function Base.uncompressed_ir returns a Core.CodeInfo value:

julia> Base.uncompressed_ir(which((() -> nothing), Tuple{}))
CodeInfo(
    @ REPL[1]:1 within `#1`
1 ─     return Main.nothing
)

julia> typeof(ans)
Core.CodeInfo

I assume Core.CodeInfo represents one of the IRs, either (2) or (3)? Do you know which one? I guess it’s (2), or maybe it’s shared for both (2) and (3)?
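One way to probe this question: both `code_lowered` and `code_typed` hand back `Core.CodeInfo` objects, so the same container type appears at more than one level. A sketch of how to tell the two apart follows, assuming the documented behavior that `ssavaluetypes` is an integer count in lowered code and an array of types after inference. The function `k` is a hypothetical stand-in.

```julia
k() = nothing  # hypothetical example function

# Lowered (untyped) IR, obtained the same way as in the post.
lowered_k = Base.uncompressed_ir(which(k, Tuple{}))

# Typed IR for the same method; code_typed returns CodeInfo => return type pairs.
typed_k, _ = only(code_typed(k, Tuple{}))

# Both are CodeInfo, but only the typed one carries per-statement types.
same_container = lowered_k isa Core.CodeInfo && typed_k isa Core.CodeInfo
lowered_has_count = lowered_k.ssavaluetypes isa Int
typed_has_types = typed_k.ssavaluetypes isa Vector
```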

Furthermore, when I do something like this, I’m able to construct an IR function with an “unreachable” body, seemingly:

julia> function f()
         src = Base.uncompressed_ir(which((() -> nothing), Tuple{}))
         src.code = Any[Core.ReturnNode()]
         src
       end
f (generic function with 1 method)

julia> f()
CodeInfo(
    @ REPL[1]:2 within `#1`
1 ─     unreachable
)

However, I think the modified CodeInfo may be malformed somehow, because playing with it further causes Julia to print lines like

Internal error: encountered unexpected error in runtime:
UndefRefError()

Pangoraw · October 18, 2023, 8:00am

unreachable (or ReturnNode()) is a block terminator which indicates that the instruction is unreachable, but the instruction actually needs to be provably unreachable (preceded by an always-throwing instruction, or in an unreachable block).
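The “preceded by an always-throwing instruction” case can be observed in ordinary code. In this minimal sketch, `h` and `is_unreachable` are hypothetical names. On recent Julia versions one would expect the typed IR of `h` to contain an `unreachable` terminator right after the `throw` call, but the exact IR is version-dependent, so the code only searches for it rather than assuming it is there.

```julia
# Hypothetical example: the `throw` branch never returns.
h(x::Int) = x > 0 ? x : throw(nothing)

typed_h, rettype_h = only(code_typed(h, (Int,)))

# A ReturnNode whose `val` field is undefined is what prints as `unreachable`.
is_unreachable(stmt) = stmt isa Core.ReturnNode && !isdefined(stmt, :val)

# true if the typed IR contains at least one `unreachable` terminator
has_unreachable = any(is_unreachable, typed_h.code)
```

Here the compiler inserts the terminator itself because inference proves that `throw` never returns. That differs from a user asserting unreachability of code that the compiler cannot prove dead.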

The PR “Allow using `ReturnNode()` in `@generated` code” (Pull Request #51715, JuliaLang/julia) allows unreachable to be inserted in untyped IR (for generated functions) in provably unreachable code. But it still needs to be preceded by a throwing instruction when the code is actually reachable (see “Handle unreachable blocks in the adjoint CFG”, Pull Request #1465, FluxML/Zygote.jl, for such a case).

I see it as similar to the LLVM unreachable, which also needs to be preceded by a no-return function call. But in Julia’s case it also prevents the unreachable block from appearing in the list of predecessors of the following block, due to implicit control flow (not sure if that happens in LLVM too).

nsajko · October 18, 2023, 9:33am

Thank you, so ReturnNode() isn’t useful for replacing LLVM’s unreachable instruction after all.

I don’t think that unreachable needs to be preceded by anything? The doc entry says (emphasis mine):

This instruction is used to inform the optimizer that a particular portion of the code is not reachable. This can be used to indicate that the code after a no-return function cannot be reached, and other facts.

So the no-return call is just given as an example, I suppose?

Pangoraw · October 19, 2023, 12:45pm

So the no-return call is just given as an example, I suppose?

You are right: in an empty block, it will optimize the conditional branch into an @llvm.assume(), which is what your package uses.
