LLVM is a very heavy dependency, and it is required even for fully compiled apps, e.g. just in case eval is used. I would like to be able to drop the LLVM requirement (while still getting rather fast code), and I know it’s possible, but currently you then risk a runtime error.
Or you can use “min” here (but it’s very slow at runtime, since all your code is interpreted):
--compile={yes*|no|all|min}
Enable or disable JIT compiler, or request exhaustive or minimal compilation
When I say “all your code”, I mean e.g. your script file; precompiled package code would still be fully compiled to native code (unless recompilation is needed, I guess).
Can someone clarify what is being interpreted? Is it the source code, i.e. re-interpreted in each iteration of a loop, or is it actually the LLVM bitcode (for such loops), which would still need LLVM and at least a naive, non-optimized LLVM compilation?
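To make the comparison concrete, here is a sketch of how one could feel the difference (`count.jl` is a hypothetical file name and I have not benchmarked this; the expectation is just that an interpreted loop body is much slower):

# count.jl (hypothetical) — run as `julia count.jl` vs. `julia --compile=min count.jl`
function count_to(n)
    s = 0
    for i in 1:n
        s += i
    end
    return s
end
@time count_to(10^7)   # expected to be much slower under --compile=min, where the loop is interpreted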
LLVM is claimed to be very slow even when you ask it to do very little, i.e. no optimization; see the link below.
Many languages/compilers do not use LLVM, and some, e.g. Roc, allow both: it has a WebAssembly backend (best supported), plus native x86 and native ARM backends, plus an LLVM backend as a further option (which also covers all of those targets, and more architectures).
I fully support that LLVM was used historically, since it has the best optimizations (which are costly, in compilation time and code size). But by now, when all your dependencies are compiled, compiling your main script very cheaply with no optimization seems like a very plausible option. Even a VM could work, as for Python (and Java), even if that VM bytecode is interpreted for the main script, as long as it mostly calls dependencies that do the heavy lifting (in the Python model, those are written in C).
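To illustrate that Python-style split (a sketch only; `script.jl` is a hypothetical name and I have not timed it): the top-level script below does almost nothing itself, so interpreting it should cost little, while the actual work happens in already-compiled library code:

# script.jl (hypothetical): cheap to interpret, since the heavy lifting happens in
# already-compiled native code (BLAS for `*`, Base for `sum`), like Python calling into C
A = rand(1000, 1000)
B = rand(1000, 1000)
@time C = A * B    # dense matrix multiply runs in precompiled native (BLAS) code
@time s = sum(C)   # also already-compiled library code
println(s)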
I also thought people might like this video about the Roc language, which is very intriguing (e.g. its “platform” concept, for targeting different platforms, such as the web with no file system, or the CLI with file system support):
To fully compile, the phases are: you need to parse (JuliaSyntax.jl, a rather cheap dependency), and then I believe you lower first:
julia> @code_lowered 1+1
CodeInfo(
1 ─ %1 = Base.add_int(x, y)
└── return %1
)
Is that a form you could run directly, e.g. when you use --compile=min (which works; “no” didn’t work at some point, does it now, and what’s the difference?)? If not, is there a form you could run directly, or that would be the best option to interpret or to compile? I suppose the answer could be different for compiling, and might even depend on whether you expect to optimize or not.
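Somewhat related, as an existence proof that the lowered form can be executed directly (just an illustration: this is the separate JuliaInterpreter.jl package, not necessarily what the built-in interpreter does internally):

julia> using JuliaInterpreter   # assumes the package is installed

julia> @interpret sum([1, 2, 3])   # evaluates lowered CodeInfo statement by statement
6

Anyway, continuing down the pipeline: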
julia> @code_llvm 1+1
; @ int.jl:87 within `+`
define i64 @"julia_+_959"(i64 signext %0, i64 signext %1) #0 {
top:
%2 = add i64 %1, %0
ret i64 %2
}
julia> @time code_typed(sqrt, (Float64,))
0.000428 seconds (668 allocations: 46.891 KiB, 91.20% compilation time)
julia> @time code_typed(sqrt, (Float64,), optimize=false)
0.000218 seconds (185 allocations: 10.906 KiB, 80.26% compilation time)
1-element Vector{Any}:
CodeInfo(
1 ─ nothing::Core.Const(nothing)
│ %2 = Base.Math.zero(x)::Core.Const(0.0)
│ %3 = (x < %2)::Bool
└── goto #3 if not %3
2 ─ Base.Math.throw_complex_domainerror(:sqrt, x)::Union{}
└── Core.Const(:(goto %7))::Union{}
3 ┄ %7 = Base.Math.sqrt_llvm(x)::Float64
└── return %7
) => Float64
[Much faster to generate, and slightly different code; maybe slower at runtime, though I’m not sure it would be much slower.]
This is, I think, the last step before @code_llvm (and @code_native). I rarely use @code_typed (I mostly look at the native or LLVM code to see if things are well optimized), so where does it fit in the order? It seems it gives you Julia’s SSA IR (i.e. not LLVM bitcode; that comes next). Is that what gets interpreted, and could it be compiled directly (or is an earlier form better for naive/fastest compilation)? (See also the pipeline sketch after the quoted excerpt below.) From the docs:
https://docs.julialang.org/en/v1.11-dev/devdocs/ssair/
Julia uses a static single assignment intermediate representation (SSA IR) to perform optimization. This IR is different from LLVM IR, and unique to Julia. It allows for Julia specific optimizations.
- Basic blocks (regions with no control flow) are explicitly annotated.
- if/else and loops are turned into goto statements.
- lines with multiple operations are split into multiple lines by introducing variables.
[…]
PiNodes encode statically proven information that may be implicitly assumed in basic blocks dominated by a given pi node. They are conceptually equivalent to the technique introduced in the paper ABCD: Eliminating Array Bounds Checks on Demand or the predicate info nodes in LLVM. To see how they work, consider, e.g.
[…]
The main SSAIR data structure is worthy of discussion. It draws inspiration from LLVM and Webkit’s B3 IR. The core of the data structure is a flat vector of statements. […] so the LLVM-style RAUW (replace-all-uses-with) operation is unavailable.
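For reference, here is the inspection order as I understand it, from lowered code through typed SSA IR to LLVM IR and native code (a sketch on a toy function; outputs omitted):

julia> f(x) = 2x + 1

julia> @code_lowered f(3)               # lowered CodeInfo, before type inference
julia> @code_typed optimize=false f(3)  # typed, but without Julia-level optimization
julia> @code_typed f(3)                 # typed and optimized Julia SSA IR
julia> @code_llvm f(3)                  # LLVM IR
julia> @code_native f(3)                # final machine code produced by LLVM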
It’s maybe a bit strange that generating the non-optimized LLVM IR takes more than double the allocations (though the generation time is still very comparable):
julia> @time code_llvm(sqrt, (Float64,), optimize=false)
[… more code, as expected, and presumably slower; the allocation count only applies to the generation, not to the generated code.]
0.006495 seconds (4.49 k allocations: 190.578 KiB, 5.69% compilation time)
Also, Julia has many optimization levels (all just handed to LLVM, I believe), but only true or false is allowed there; I guess true means whatever Julia was invoked with (e.g. its default of 2), and false means none (0). Also, shouldn’t @code_native have an optimize option too, or inherit it, since it runs after the LLVM phase?
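One way to check what level you actually got is via the JLOptions struct (an internal, so not a stable API):

julia> Base.JLOptions().opt_level   # the -O level Julia was started with; 2 when started with defaults
2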