Is it time to make LLVM optional and how?

LLVM is a very heavy dependency, and it’s required (for fully compiled apps) just in case, e.g., eval is used. I would like to drop the LLVM requirement (while still getting rather fast code), and I know it’s possible; currently you just risk a runtime error.

Or you can use “min” here (but it’s very slow at runtime, since all your code is interpreted):

 --compile={yes*|no|all|min}
                          Enable or disable JIT compiler, or request exhaustive or minimal compilation
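
As an aside, you can check which --compile mode the current session was started with from inside Julia. This is a hedged illustration: Base.JLOptions is an internal, undocumented API, so the field name and its integer encoding may change between versions.

julia> Base.JLOptions().compile_enabled   # integer encoding of the --compile setting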

When I say all your code, I mean e.g. your script file; precompiled package code would still be fully compiled to native code (unless recompilation is needed, I guess).

Can someone clarify what is actually being interpreted? Is it the source code, i.e. re-interpreted on each iteration of a loop, or is it the LLVM bitcode (for such loops), which would still require LLVM, and at least a naive, non-optimized LLVM compilation?

LLVM is claimed to be very slow even when you ask very little of it, i.e. no optimization; see the link below.

Many languages/compilers do not use LLVM, and some, e.g. Roc, support both: it has a WebAssembly backend (the best supported), plus native x86 and native ARM backends, PLUS an LLVM backend as a further option (which also covers all the previously listed targets, and more architectures, as its backends).

I fully support that LLVM was used, historically, since it has the best optimizations (which are costly in compilation time and code size). But by now, when all your dependencies are precompiled, compiling your main script very cheaply with no optimization seems like a very plausible option. Even a VM could work, as for Python (and Java): even if the VM bytecode is interpreted for the main script, that is fine if it mostly calls dependencies that do the heavy lifting (in the Python model, those are in C).

I also thought people would like this video about the Roc language, which is very intriguing (e.g. its “platform” concept for targeting different platforms, such as the web, with no file system, or the CLI, with file-system support):

To fully compile, the phases are: you need to parse (JuliaSyntax.jl, a rather cheap dependency), and then, I believe, the next step is lowering:

julia> @code_lowered 1+1
CodeInfo(
1 ─ %1 = Base.add_int(x, y)
└──      return %1
)

Is that a form you could run directly when you use --compile=min (which works; “no” didn’t at some point, does it now, and what’s the difference?), or if not, a form that you could? And would interpreting or compiling it be the best option? I suppose the answer could differ between the two, and even depend on whether you expect to optimize or not.
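
For context, the lowered form above is an ordinary Core.CodeInfo object that you can also obtain programmatically (a hedged illustration; Base.add_int is an internal intrinsic):

julia> ci = code_lowered(+, (Int, Int))[1]   # same CodeInfo as @code_lowered 1+1
CodeInfo(
1 ─ %1 = Base.add_int(x, y)
└──      return %1
)

julia> typeof(ci)
Core.CodeInfo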

julia> @code_llvm 1+1
;  @ int.jl:87 within `+`
define i64 @"julia_+_959"(i64 signext %0, i64 signext %1) #0 {
top:
  %2 = add i64 %1, %0
  ret i64 %2
}
julia> @time code_typed(sqrt, (Float64,))
  0.000428 seconds (668 allocations: 46.891 KiB, 91.20% compilation time)
julia> @time code_typed(sqrt, (Float64,), optimize=false)
  0.000218 seconds (185 allocations: 10.906 KiB, 80.26% compilation time)
1-element Vector{Any}:
 CodeInfo(
1 ─      nothing::Core.Const(nothing)
│   %2 = Base.Math.zero(x)::Core.Const(0.0)
│   %3 = (x < %2)::Bool
└──      goto #3 if not %3
2 ─      Base.Math.throw_complex_domainerror(:sqrt, x)::Union{}
└──      Core.Const(:(goto %7))::Union{}
3 ┄ %7 = Base.Math.sqrt_llvm(x)::Float64
└──      return %7
) => Float64

[Much faster to generate, and slightly different code, which may be slower at runtime, though presumably not much slower.]

This is, I think, the last step before @code_llvm (and @code_native). I rarely use @code_typed (I mostly look at the native or LLVM code to see if things are well optimized), so where does it fit in the order? It seems it gives you Julia’s SSA IR (i.e. not LLVM bitcode; that comes next). Is that what gets interpreted, and could it be compiled directly (or would an earlier form be better for naive/fastest compilation)?

https://docs.julialang.org/en/v1.11-dev/devdocs/ssair/

Julia uses a static single assignment intermediate representation (SSA IR) to perform optimization. This IR is different from LLVM IR, and unique to Julia. It allows for Julia specific optimizations.

  1. Basic blocks (regions with no control flow) are explicitly annotated.
  2. if/else and loops are turned into goto statements.
  3. lines with multiple operations are split into multiple lines by introducing variables.

[…]
PiNodes encode statically proven information that may be implicitly assumed in basic blocks dominated by a given pi node. They are conceptually equivalent to the technique introduced in the paper ABCD: Eliminating Array Bounds Checks on Demand or the predicate info nodes in LLVM. To see how they work, consider, e.g.
[…]
The main SSAIR data structure is worthy of discussion. It draws inspiration from LLVM and Webkit’s B3 IR. The core of the data structure is a flat vector of statements. […] so the LLVM-style RAUW (replace-all-uses-with) operation is unavailable.
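
For reference, the stages line up with the introspection macros, and the post-optimization Julia SSA IR described above can also be inspected directly. A hedged sketch; Base.code_ircode is an internal API in recent Julia versions and may change:

# Pipeline order, from earliest to latest stage:
@code_lowered 1 + 1   # lowered IR (what the interpreter runs)
@code_typed   1 + 1   # type-inferred (and by default optimized) Julia SSA IR
@code_llvm    1 + 1   # LLVM IR
@code_native  1 + 1   # machine code

# The Julia SSA IR after the Julia-level optimization passes:
Base.code_ircode(sqrt, (Float64,))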

It’s maybe a bit strange that generating the non-optimized code takes more than double the allocations (though the runtime of the generation itself is still very comparable):

julia> @time code_llvm(sqrt, (Float64,), optimize=false)
[.. more code as expected, presumably slower; number of allocations only apply to generation, not the generated code.]
  0.006495 seconds (4.49 k allocations: 190.578 KiB, 5.69% compilation time)

Also, Julia has many optimization levels (all just handed to LLVM, I believe), but only true or false are allowed here; I guess true means whatever level Julia was invoked with (e.g. its default, 2), and false means none (0). Also, shouldn’t @code_native also have an optimize option, inherited from the LLVM phase it runs after?

2 Likes

It’s not just in case eval is used; it’s because Julia is a dynamic language where types aren’t necessarily known until runtime. LLVM is invoked whenever you call a function with argument types that weren’t seen before, which can happen without eval. For example:

t = (0,)
for i = 1:10^3
    t = (i, t...)   # the tuple grows, so its type changes on every iteration
    @show sum(t)
end

calls sum for a new type of argument on every loop iteration, necessitating recompilation. (And now suppose that instead of 10^3 you read the number of loop iterations from a file.)
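
You can see the per-type compilation cost directly with @time; a minimal illustration (the helper f is just for the example):

f(t) = sum(t)

@time f((1, 2))       # first call for Tuple{Int, Int}: includes compilation
@time f((1, 2))       # same argument type: runs the cached native code
@time f((1, 2, 3))    # new tuple type: triggers compilation again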

You can currently only omit LLVM if you use a completely statically typed subset of the language. This is what StaticCompiler.jl (https://github.com/tshort/StaticCompiler.jl, experimental) is for.
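
Roughly, and heavily hedged (the package is experimental, so check its README for the current interface), the workflow is to ahead-of-time compile a fully statically typed function:

using StaticCompiler

# All types are known statically here: no dynamic dispatch, no eval.
fib(n::Int) = n <= 1 ? n : fib(n - 1) + fib(n - 2)

# Compile to a standalone shared library; no LLVM needed at runtime.
compile_shlib(fib, (Int,), tempdir())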

6 Likes

Yes, you just strengthened my argument that a compiler (or interpreter) is needed at runtime, though it doesn’t need to be LLVM. I just took an example I knew to always be a theoretical possibility (and the halting problem prevents knowing in advance whether e.g. eval, or anything else requiring a compiler at runtime, will be needed). StaticCompiler is good when you know you don’t need more, but it can never be a full general solution for small binaries from arbitrary Julia code. [See the Roc-lang video at the time-point I provided, about a 6-megabyte web program, small by today’s standards… and the full video is very intriguing.]

I also mentioned “recompilation”. My point wasn’t that we need the (LLVM) compiler; it was how best to replace it with a simpler one?

1 Like

I personally doubt it will be worth the effort to support a second compiler in the next few years; it’s more likely that one could replace it with an interpreter under more circumstances (this has already been discussed as a way to reduce startup latency).

I generally don’t find these speculative discussions to be too productive. Proposing huge undertakings (e.g. replacing the compiler) never amounts to much, because you aren’t proposing to do the work yourself, nor to fund it. No one is going to volunteer to take on such a large project because of a mailing-list thread. If anything ever happens, it will be due to someone else’s priorities, not what is written here.

33 Likes

I didn’t say that I wouldn’t. :slight_smile: But yes, maybe an interpreter is enough, assuming most of the code is precompiled. So far the current interpreter has been disappointingly slow (its one good quality being very fast startup).

An optimizing compiler like LLVM is of course a huge undertaking, but a simple, naive compiler can be rather little work, especially when most of the work has already been done, as it has here. And I want to do that work. The first step is knowing where to plug it in. And I think knowing what the current interpreter actually interprets would help, since that would likely be the same point.

I only intend some solution to be (at least) as fast as Python (since most code is precompiled), i.e. compiling to a simple VM that is then interpreted (even compiling to Python’s VM, or some other VM known to work, might be an option, since we know it can call Julia code, though I DO have a plan for my own VM if I ever get to implementing it…), or just to machine code (which seems worse for portability). Also, global scope is a known performance killer (with a known solution); I would also like to guard the worst case there against being slower than Python (a bit off-topic here, perhaps, but how?). See the sketch below.
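
On the global-scope point, untyped globals force dynamic dispatch on every access, and the known solution is to wrap the work in a function (or use const or typed globals). A minimal sketch (the sumupto helper is just for illustration):

# Slow: x is an untyped global, so every += dispatches dynamically.
x = 0
@time for i in 1:10^7
    global x += i
end

# The standard fix: put the loop in a function, where types are inferred.
function sumupto(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end

@time sumupto(10^7)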

2 Likes

I would say that the first step would be to learn how to contribute to Julia compiler internals by taking on a less ambitious project.

Up to now, you’ve mainly contributed documentation and few-line code patches, which are welcome, but going directly from that to implementing an entirely new compiler is not a realistic plan.

2 Likes

There seem to be three somewhat-unrelated questions being asked here.

  1. Can / should Julia replace or augment LLVM for codegen?
  2. Should there be a Julia interpreter?
  3. What would it take to remove LLVM itself from binary libraries generated from Julia code?

I don’t see any value in 1, at all. Emscripten has existed longer than WASM, for turning LLVM IR into WASM bytecode. It’s interesting that other languages (Roc, Go) have decided to ship their own native compiler, but Julia made the wise decision to use LLVM and there’s no compelling reason to go back on that. It remains state-of-the-art, and Julia has put in a tonne of work to make compiling (relatively!) fast, and has enabled precompilation for packages. I would say it’s striking the right balance in the tradeoff between compilation time and speed of execution of the resulting code. Try a non-trivial Rust project if you want to see what slow compilation looks like!

For 2, there’s already JuliaInterpreter, which I believe is feature-complete. I’m sure they’d welcome some help with making it faster to start up and run, if that’s of interest to you.
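
For the curious, running code through it is a one-liner (assuming the package is installed):

using JuliaInterpreter

# Runs the call through the interpreter, which walks lowered code,
# instead of invoking compiled native code:
@interpret sum(rand(10))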

3, as I understand it, is being actively worked on. As @stevengj points out, it’s not an easy task, but writing Julia libraries which are roughly as small and efficient as, say, the Rust equivalent, is generally-agreed to be a valuable goal, and work on making that possible is active and ongoing.

6 Likes

In a previous thread, @tim.holy said that JuliaInterpreter.jl walks through the lowered code. The “low-hanging fruit” (…for a veteran of Julia IR and internals…) would be to shift it to using the Julia IR emitted after all the optimization passes that happen at the Julia level.

However, JuliaInterpreter.jl’s first job is to support Debugger.jl, so it is really important to know the provenance of the code it’s running (so you can check for breakpoints and such). And that is already very hard at the lowered stage, because the Femtolisp lowering machinery drops a lot of contextual information.

If/when @c42f completes a rewrite of the lowering stage, it will unlock the door to transparent tracing from source code to IR and back again, which would make a debugger running on optimized IR a bit easier.

All this to say, from my admittedly novice vantage point, anyone who wants to see improvements in the compiler, debugger, dev UX, startup latency, more powerful metaprogramming, etc. should probably start by figuring out how best to support @c42f with time or money.

8 Likes