So does Julia compile or interpret?

Hi, I’ve been searching for this but I’m not sure there is a definitive answer written somewhere. I understand it may be an “implementation detail” and that “julia compiles to efficient native code for multiple platforms via LLVM” is correct but this information is certainly not exhaustive. I especially don’t want to play with words like “interpretation doesn’t exclude compilation” etc. I would like to know the “compilation pipeline” of Julia.

Is there any link where it is explained at least roughly, without requiring me to dig deep into the code?

I’m comparing it to Python, where, as I understand it, when you run some file it first gets “pre-compiled” into bytecode (generating a “.pyc” file) that on subsequent executions gets interpreted by Python’s virtual machine.

In Julia, if I understand it correctly, when I run some code it gets precompiled at runtime, which happens every time I run a .jl file for the first time (and it generates natively executable files in the .julia folder?). Or does it precompile on a function-by-function basis? This precompilation generates native code (a set of assembly instructions) which does not need further interpretation. Those instructions are executed “as-is”.

Am I correct? What is the process like? Thanks!

I will try to explain how it works because this is a good opportunity to see how well I understand it myself.

Bottom line: Julia compiles a native version of a function the first time it is run with a certain set of argument types (without creating any build artifacts).

Now let’s get into more details…

The core devs like to call Julia a “just-ahead-of-time” compiler. This is in contrast to AOT (ahead-of-time) compilers (e.g. for C or C++), which compile a static binary upfront, and to classical JIT (just-in-time) compilers, which usually start by interpreting and tracing your program and then compile hot spots to native code behind the scenes.
Julia works more like an AOT compiler in that sense, because it does not do any tracing but compiles (almost) everything; it just does not do so in a separate compilation stage before runtime.

The following things happen when you pass your code to the Julia compiler, either by executing a script or typing it into the REPL (I will be glossing over details such as parsing and lowering because they are not interesting in the scope of this discussion, but see the short aside below):

  1. Julia runs type inference on your code to generate typed code.
  2. The typed code gets compiled to LLVM IR (Intermediate Representation).
  3. The IR gets handed over to LLVM which generates fast native code.
  4. The native code gets executed.

One of the beautiful things about Julia is that this is not a black box and you can observe all steps of the process if you wish to.
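
As an aside, even the parsing and lowering steps glossed over above can be observed. A quick sketch (the exact printed form may vary between Julia versions):

julia> ex = Meta.parse("a + b")  # parsing: source text -> Expr (the AST)
:(a + b)

julia> Meta.lower(Main, ex)      # lowering: AST -> lowered IR, the input to type inference
:($(Expr(:thunk, CodeInfo(
1 ─ %1 = a + b
└──      return %1
))))

For a function call, the @code_lowered macro shows the same lowered form.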

Let’s use the following simple function as an example and type it into the REPL:

julia> function add(a, b)
           return a + b
       end
add (generic function with 1 method)

If I want to see the results of type inference, I can use the @code_typed macro:

julia> @code_typed add(1, 1)
CodeInfo(
1 ─ %1 = Base.add_int(a, b)::Int64
└──      return %1
) => Int64

I have used two 64-bit integers as arguments and type inference has determined that the return type will be an Int64 as well.

julia> @code_typed add(1, 1.0)
CodeInfo(
1 ─ %1 = Base.sitofp(Float64, a)::Float64
│   %2 = Base.add_float(%1, b)::Float64
└──      return %2
) => Float64

If one argument is a Float64, the return type will be Float64 as well. (Note the sitofp call: the integer argument is converted from signed integer to floating point before the floating-point addition.)
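
If you only want the inferred return type, without the full typed IR, you can also query inference directly. A small sketch using Base.return_types (an internal but commonly used helper):

julia> Base.return_types(add, (Int64, Float64))
1-element Vector{Any}:
 Float64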

To see what happens when this gets compiled to LLVM IR we can use @code_llvm:

julia> @code_llvm add(1, 1)
;  @ REPL[2]:1 within `add'
define i64 @julia_add_303(i64 signext %0, i64 signext %1) {
top:
;  @ REPL[2]:2 within `add'
; ┌ @ int.jl:87 within `+'
   %2 = add i64 %1, %0
; └
  ret i64 %2
}

And the resulting native code can be shown with @code_native:

julia> @code_native add(1, 1)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ REPL[2]:2 within `add'
; │┌ @ int.jl:87 within `+'
	leaq	(%rdi,%rsi), %rax
; │└
	retq
	nopw	%cs:(%rax,%rax)
; └
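
Note that each set of argument types gets its own native code. With Float64 arguments the addition compiles to a floating-point instruction instead (abridged sketch; the exact output depends on your CPU and Julia version):

julia> @code_native add(1.0, 1.0)
	; (abridged)
	vaddsd	%xmm1, %xmm0, %xmm0
	retq

Here vaddsd is a scalar double-precision floating-point add, in contrast to the integer leaq above.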

We can also observe the compiler at work by timing the execution of the function with @time.

# `a` is a random vector of Float64
julia> a = randn(1000);

julia> typeof(a)
Vector{Float64} (alias for Array{Float64, 1})

julia> function mysum(v)
           return sum(v)
       end
mysum (generic function with 1 method)

julia> @time mysum(a)
  0.029869 seconds (80.40 k allocations: 4.757 MiB, 99.92% compilation time)
-8.915810948177993

The first time we run mysum with a as an argument, i.e. mysum(v::Vector{Float64}), the function gets compiled, and nearly all of the measured time (99.92%) is spent compiling.

The second time around, the cached result of the compilation is reused and there is no compilation overhead.

julia> @time mysum(a)
  0.000005 seconds (1 allocation: 16 bytes)
-8.915810948177993

Now we run mysum with an argument of a different type, i.e. mysum(v::UnitRange{Int64}), and the compiler needs to run again to generate a new specialization.

julia> b = 1:1000
1:1000

julia> typeof(b)
UnitRange{Int64}

julia> @time mysum(b)
  0.005921 seconds (8.62 k allocations: 444.457 KiB, 99.70% compilation time)
500500

But again no compilation overhead on the second run.

julia> @time mysum(b)
  0.000004 seconds (1 allocation: 16 bytes)
500500
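
Both compiled specializations now coexist in the running session. On Julia 1.10 and newer you can even list them with the internal helper Base.specializations (a sketch; the exact printing and order may vary):

julia> collect(Base.specializations(first(methods(mysum))))
2-element Vector{Core.MethodInstance}:
 MethodInstance for mysum(::Vector{Float64})
 MethodInstance for mysum(::UnitRange{Int64})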

The results of compilation are cached within the running Julia process, which means that no compilation artifacts are written to disk and everything needs to be recompiled once Julia is restarted.
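
If you want to pay the compilation cost eagerly within a session, Base’s precompile function compiles a given specialization without actually calling the function. A minimal sketch (not to be confused with the package precompilation mentioned below):

julia> precompile(mysum, (Vector{Int},))  # compile the Vector{Int} specialization now
true

After this, the first real call to mysum with a Vector{Int} argument no longer includes the compilation overhead shown above.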

You also mentioned precompiling, which is a different concept in the Julia world and is explained in detail in this tutorial: Tutorial on precompilation
