Questions about Compiler and Compiling Modules


#1

I’ve read a lot about JIT and interpreters so far, because I am very curious about how Julia works under the hood. However, being completely new to compiler technology, some things still remain a mystery to me.

To my knowledge, Julia is a mixture of an interpreter and a just-in-time compiler, is that true? When a new function is called, Julia knows the arguments et cetera and generates LLVM code. The LLVM library (which is shipped with Julia but not part of the project I guess? I hadn’t heard of LLVM before, but have read a little about it by now) then generates machine code depending on the architecture it runs on. This makes the first function call slow (a compile chain is triggered at runtime: Julia -> LLVM -> machine code) and all further function calls fast (executing the cached machine code). Is this true so far? I’ve read (spread across some discussions and issues) that the cached code can become very big, and therefore is not stored after exiting a Julia session. Where is it actually stored as long as the session runs? In RAM only, or also in some files?
On the other hand, in global scope the compiler knows nothing and everything is very dynamic. Therefore each instruction must be executed line by line and can’t be pre-compiled, just like in Python. Therefore stuff in global scope is slower. Did I understand this correctly? Would an interpreter actually be faster if you called a function only once (so that there is no reuse of the cached machine code)? That way you could get rid of the compile overhead. But somehow an interpreter also must generate machine code at some point. It is just not optimized, I guess, because it is not possible to optimize when looking only at a single line at a time. So why is this actually called an interpreter, and not an (unoptimizing) compiler? Or asking the question differently: what is the difference between an interpreter, a just-in-time compiler, and Julia, exactly?

My second question is: When Julia compiles just in time, why is there a need for compiling modules before using them? Is this due to external, non-Julia dependencies only? Or does a pure Julia module also do some valuable pre-compilation? Also (I asked this question already, but didn’t get much of a reaction so far), why does using XY take so long sometimes? Is it due to loading the source file into RAM, or is something else happening as well? For example, when I use ControlSystems, the first call takes 9 seconds, the second call takes 2 seconds, and only the third call takes 0.3 ms. What is happening in the second call? Why does it not return instantly as in the third call?

I know this is a lot to ask in one thread. But at the moment I am trying to wrap my mind around how Julia works, and I have found that things are not as easy as they look at first glance. But understanding this really helps me when talking about Julia to others (who don’t know Julia yet), and also in using Julia correctly. So if someone could enlighten me a bit more than I currently am, that would be very much appreciated!


#2

Yes, this is accurate.

It’s stored in RAM.

Code in global scope (or code that interacts with the global scope in a nontrivial way) can’t be optimized as aggressively as code local to a function. It’s easy to optimize code in a function because the compiler sees at once all the code that could interfere. This is not true in the global scope. It’s not necessarily so simple as “line by line” however.
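
A minimal sketch of that difference, assuming a hypothetical non-constant global `data` and two made-up functions. `Base.return_types` is used here only to peek at what type inference concludes; the exact results may vary between Julia versions.

```julia
# Hypothetical non-constant global: its value and type may change at any time.
data = collect(1:1000)

# Reads the global directly, so the compiler cannot assume the type of `data`.
function sum_global()
    s = 0
    for x in data
        s += x
    end
    return s
end

# Receives the data as an argument, so a specialized version is compiled
# for the concrete argument type (here Vector{Int}).
function sum_arg(v)
    s = 0
    for x in v
        s += x
    end
    return s
end

println(sum_global() == sum_arg(data))               # both compute the same sum
println(Base.return_types(sum_arg, (Vector{Int},)))  # inferred return type with a known argument type
println(Base.return_types(sum_global, ()))           # usually far less precise
```

Both versions end up as native code; the second one simply gives the compiler more to work with.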

Yes, that’s plausible, but it probably depends on the nature of the interpreter, the code and the stuff it interacts with.

An interpreter does not necessarily have to generate machine code. Most interpreters nowadays actually generate something similar to machine code called bytecode, which is executed one-by-one by a sort of virtual CPU, which is nothing except a glorified if/else (or switch) inside a loop. Python’s starts here and goes on for a few thousand lines.
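
To make the "if/else inside a loop" idea concrete, here is a toy stack-machine interpreter. The instruction set (`:push`, `:add`, `:mul`) and the `Instr` type are invented for this sketch and have nothing to do with Python's real bytecode.

```julia
# One hypothetical bytecode instruction: an opcode plus an integer operand.
struct Instr
    op::Symbol
    arg::Int
end

# The "virtual CPU": fetch an instruction, branch on its opcode, repeat.
function run_bytecode(program::Vector{Instr})
    stack = Int[]
    pc = 1                                  # program counter
    while pc <= length(program)
        instr = program[pc]
        if instr.op === :push
            push!(stack, instr.arg)
        elseif instr.op === :add
            b = pop!(stack); a = pop!(stack)
            push!(stack, a + b)
        elseif instr.op === :mul
            b = pop!(stack); a = pop!(stack)
            push!(stack, a * b)
        else
            error("unknown opcode: $(instr.op)")
        end
        pc += 1
    end
    return pop!(stack)
end

# Bytecode for (2 + 3) * 4
program = [Instr(:push, 2), Instr(:push, 3), Instr(:add, 0),
           Instr(:push, 4), Instr(:mul, 0)]
result = run_bytecode(program)
println(result)  # 20
```

No machine code is ever generated for `program` itself; only the loop (the interpreter) exists as native code.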

The terms have become less meaningful over time I should say. Almost every language today has some sort of compilation stage. The output is typically either bytecode (Python, Java), or native code that runs alongside a runtime (Julia, Haskell I think). Some languages produce fully independent native code (like C).

The runtime provides services like garbage collection and type checking. In Julia’s case it is sometimes possible to skip both, e.g. with functions that are fully type stable and only live on the stack.

For Julia, the important things to note are that the end result really is native code, and that it really is just-in-time compiled.

Someone more familiar than I am with Julia should probably answer the rest of your questions. Hope that helped.


#3

While it might become (relatively) big, that’s not the reason why it can’t be cached (that would imply it can’t fit on a disk!?). The problem is, how to cache it and then load it back, with all pointers + memory sections and internal caches staying valid (simplified).

That’s not true, it will just emit less optimized code, since anything can change in global scope (so there are no well defined input types like it’s the case for a function call). It’s still not interpreted in that case - there is a julia interpreter though, but that has still lots of bugs and isn’t very well tested.

As a matter of fact, global code in a package module will get precompiled! Which is why all objects generated from code in global scope need to be serializable, since the code won’t be executed a second time and instead all results will just get deserialized!

When Julia compiles just in time, why is there a need for compiling modules, before using them?

Julia needs to parse the files, build up method tables, do type intersection of the method signatures for multiple dispatch, do type inference etc - that all happens before any function is passed to the JIT.

For example, when I use ControlSystems, the first call takes 9 seconds, the second call takes 2 seconds, and only the third call takes 0.3 ms

That shouldn’t happen for normal code! Although, I did encounter similar cases with code with side effects. Usually that happens, if the code paths change on second call, so that new functions need to get compiled.
I ran into this when caching a file loading operation - the first call would load the file into a global and the second would just return it (a bit more complex than that), so different functions were compiled.


#4

It’s done for sysimg. Not a fundamental problem.

The code precompilation is unrelated to object serialization.


#5

One big issue about long-term caching of compiled code is dependencies.

Julia has a lot of implicit mutable global state: Especially type definitions, method tables, etc.

Now, each call implicitly depends on this global mutable state.
There are three ways of dealing with that: (1) Make it immutable; e.g. you cannot redefine types. (2) Be inconsistent / load-order dependent, like methods were previously; and (3) invalidate cached entries on changes.

(3) is achieved by adding backedges: If compiled method A depends on method B, then we have an edge A -> B and store a backedge B->A. When B gets invalidated (redefined), then A gets invalidated as well.
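
The effect of (3) can be observed from the REPL or a script. This is only a sketch with made-up functions `f` and `g`, run as top-level statements; the backedge bookkeeping itself is internal and not shown.

```julia
g() = 1
f() = g() + 1      # compiled code for f depends on g

before = f()       # compiles f (g will most likely be inlined)

g() = 10           # redefining g invalidates the compiled code for f

after = f()        # f is recompiled against the new g

println((before, after))  # (2, 11)
```

Without invalidation, the second call would keep using the stale compiled code and still return 2.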

There are various special cases like @pure and @generated that require some programmer care in order to avoid load-order dependent errors.

Now, if you wanted to cache code cross-session then you would need to build something like a giant Merkle tree (DAG) that serializes/hashes all global state dependencies, and is resistant to abuse of @pure and @generated (starting a new session should heal programmer mistakes from the previous session!). That has not been done yet.


#6

It shouldn’t be a bigger problem than caching inferred code which is already done now.


#7

Yeah, it’s not a fundamental problem - that’s also how PackageCompiler works after all :wink: I meant to say, it’s not as trivial as save(julia_data_in_ram, "cache.bin").

So how does this work if nothing gets serialized to disk and nothing gets re-run?

module TestPackage
function test()
    println("bla")
    rand(2)
end
variable = test()
end # module

julia> using TestPackage
[ Info: Recompiling stale cache file C:\Users\sdani\.julia\compiled\v1.0\TestPackage\BoZvv.ji for TestPackage [cc787f99-0094-596c-90da-9a0362a24f4c]
bla

julia> TestPackage.variable
2-element Array{Float64,1}:
 0.3836146119775292
 0.6864289757241291

julia> quit()
PS C:\Users\sdani> julia

julia> using TestPackage

julia> TestPackage.variable
2-element Array{Float64,1}:
 0.3836146119775292
 0.6864289757241291

#8

It works like this because the object (not the code) gets serialized and nothing gets re-run. That is to say, the claim that global code in a package module gets precompiled is not the case (well, I’m not sure how good a job we do of filtering that out, but it’s not going to be needed for sure). Global code, whether in a package or not, will only run once, so it doesn’t need to be precompiled and definitely doesn’t need to be saved.

Just to clarify, I didn’t say that the statement about objects generated in global scope needing to be serializable is wrong. I said it’s unrelated to code precompilation.

Edit: and depending on how you define precompilation, the bottom line is that the global code in a precompiled package is not being dealt with in any way different from other global code. The objects it constructs are treated differently for sure.
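
A small sketch of what this means for package authors. The module name `PrecompileSketch` and its contents are hypothetical; `__init__` is Julia's standard hook that runs each time a module is loaded. When run as a plain script (as here) the module is not precompiled, so the comments describe what would happen if the same code lived in a precompiled package.

```julia
module PrecompileSketch  # hypothetical package module

# Global code: evaluated once. In a precompiled package this runs during
# precompilation and the resulting object is stored in the cache file.
const TABLE = [i^2 for i in 1:5]

# Counter to show that __init__ runs at load time.
const INIT_RUNS = Ref(0)

# Runs every time the module is loaded, including from a cache file, so
# anything that must be set up per session belongs here.
function __init__()
    INIT_RUNS[] += 1
end

end # module

println(PrecompileSketch.TABLE)
println(PrecompileSketch.INIT_RUNS[])
```

This is also why `variable` in the TestPackage example above keeps the same random values across sessions: the array was serialized, and `test()` was not called again.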


#9

Ok fair enough, I didn’t phrase that very accurately.
I didn’t mean to imply that any code gets saved. But what I described happens during the process which is named “precompilation” - so for simplicity I referred to it as part of precompilation.


#10

Thanks for all your answers! It was very interesting to read, though I needed some time to understand some of them. The topic of cache storage is also very interesting. Actually I was searching and reading a bit about that as well, and the answer to this question seems to be: it is possible, but a lot of work. Nonetheless, it would be great if cache storage could be achieved some day. I think this would improve the user experience a lot.

So thanks again for all your explanations. But I still have some more questions.

I see one problem here. Let me quickly explain my experience with Matlab: every time I run a simulation there and store variables, my RAM gets filled up very quickly. The only way to clean my RAM in order to continue working without a swapping machine is to close Matlab completely. I see a similar issue in Julia here: when a lot of code is generated due to multiple dispatch, the code could (for big projects or libraries) add up quickly. And the only possible way out is to buy more RAM or to exit() Julia. Or is there some garbage collector for rarely used code, so it gets removed from RAM and recompiled on demand?
I did not experience such a problem with Julia yet, since I only did very small tasks with Julia. But in our big Matlab simulation this is a serious issue for me. Are there some experiences regarding RAM usage in big Julia projects? (Maybe this is already a new issue here^^)

Thanks for the link to the Python switch! Very interesting. But I don’t really understand the difference between machine code and bytecode. Isn’t the llvm assembler from Julia also some Bytecode for a virtual CPU (the llvm namely)?
And what is executing the commands from Python in the switch? Is there another virtual machine under the hood, or is for every operation a C routine called?
So is the actual difference between Julia and Python that Julia compiles (meaning: analyses the high level code and optimizes it), while Python runs the code line by line and calls some virtual machine functions / C code (and does some more inefficient stuff like allocating a lot of memory)?
So, why is the code from Python not native, while the code from Julia is (although there is LLVM in between), and what exactly makes Python so slow and inefficient? (I don’t know a lot about Python to be honest. I was just forced to use it and I didn’t like it very much for several reasons. So I felt very happy when discovering Julia, which is a lot more convenient to write than plain C++.)

Why is there actually an interpreter for Julia? What is its use? And what is the difference, functionality wise? Maybe when I could understand this, this would clarify a lot for me.

So is Julia doing all this already for all functions in the module and the functions of the modules it depends on (even indirectly)? And the JIT only compiles functions, when I call the API with different types? So when I write a module with completely fixed types in all functions, would it only precompile, and the JIT would have nothing to do (except garbage collection)?

But what is the difference between precompiling a module the first time EVER (like when it is showing the message: [Info: recompiling stale cache file…]), and simply using an already precompiled module the first time in a Julia session? Because every first time in a session - even when not modifying the packages and the Julia installation - the using of the ControlSystems library takes 9 seconds. What is Julia doing in this time? Precompiling stuff? Why didn’t this happen before, when the cache file has been precompiled?

I’ll continue trying to wrap my mind around this compiler stuff. Hopefully it will make ‘click’ at some point :smiley:


#11

No, LLVM is not an interpreter. Although the name originally stood for Low Level Virtual Machine, that is not what LLVM is. LLVM is a compiler that takes LLVM bitcode as input and compiles it to native code.

Julia compiles to LLVM bitcode which the LLVM then compiles directly to native code. Python compiles to python bytecode which is then interpreted by a virtual CPU.

The Julia interpreter can be faster if you only intend to call a function once. This can be useful in short-running scripts.

It only compiles when you call it (or force it to precompile), even if you give it all the types when you define it.

julia> f(x::Int64) = x
f (generic function with 1 method)

julia> @time f(3)
  0.003886 seconds (409 allocations: 27.681 KiB)
3

julia> @time f(3)
  0.000003 seconds (4 allocations: 160 bytes)
3

julia> g(x) = x
g (generic function with 1 method)

julia> precompile(g,(Int,)) #compile without calling
true

julia> @time g(3)
  0.000004 seconds (4 allocations: 160 bytes)
3

Garbage collection is a separate matter from JIT.


#12

It seems unlikely that it is code that is using your memory in Matlab. Normal usage of Julia will certainly not run into this problem. I’ve seen some extreme situations where someone was using Julia as a high level language for JITting database queries where this was an issue (which was solved by throwing away processes after a while), but never in normal usage.


#13

Hm, I don’t understand the difference between bitcode and bytecode yet (except that the one is represented in bits and the other one in bytes of course). I’ve googled and read a bit about it (this link had a good explanation for me), but it’s still difficult to understand what the difference really is. Probably I have to come back and dig into the topic when I have more time. For now, I just understand that Julia compiles via LLVM to bitcode, which can be optimized for several architectures, while Python has to execute basic operations step by step, which cannot be optimized or natively compiled for any specific compiler. I hope that I understood this right.

Ah, nice! That was exactly what I supposed in my first post :smiley: But what would be the difference functionally? I suppose the Julia Interpreter doesn’t use LLVM any more? Instead it uses some switch statement. Will this switch statement be compiled as a C library, and then Julia Interpreter would call this C-API for every statement?

(EDIT: For this question see also my post below. I think I understand now how this works. Thanks to you all for helping me in understanding this!!)
Ah, I see. And this is happening, when a module is precompiling. Thanks! But still, why does the first time using a module (when it has already been precompiled), take so long for loading? Example:

pkg> add ControlSystems
# Installing ...
julia> using ControlSystems
# Info: Precompiling Stale Cache File, takes quite a time but this is OK
julia> exit()
# Restarting Julia
julia> @time using ControlSystems # Takes 9 Seconds
julia> @time using ControlSystems # Takes 2 Seconds
julia> @time using ControlSystems # Only takes some ms
julia> exit() # Again Restarting Julia
julia> @time using ControlSystems # Takes 9 Seconds
julia> @time using ControlSystems # Takes 2 Seconds
julia> @time using ControlSystems # Only takes some ms

I experience this EVERY time I start a new Julia session. This baffles me.

Yes, you’re right. This is not because of code, it is (I suspect) because of a lot of variables which get stored in the workspace and will finally be saved to disk. (Although the disk representation of the MAT file is only some MB, not anywhere near the 16GB of RAM Matlab needs sometimes.) It could also be due to some memory leaks in mex files, but I didn’t analyze this.
The point was rather that I am afraid of experiencing the same BEHAVIOUR with Julia, although for an entirely different reason :wink:
I see, it is good to know that this probably won’t become a problem. May I ask how this database issue was solved? Was it a parallelized server with multiple processes, each of which compiled its own version of every function? And then after some time, when one process took a lot of RAM because it had been running and compiling new functions for quite a while, it was killed (which freed the RAM), and a fresh process was spawned instead?


#14

EDIT: After some thinking something occurred to me: Is this behaviour due to some functions being executed in global scope of the module ControlSystems? Maybe for initializing some stuff? I didn’t think about that before… When I have some time (again) I really need to dig into the code of this module and check.

Here a real code example for my question about the loading time for some modules, with real output:

maximilian@Orion:~$ julia
julia> @time using ControlSystems
[ Info: Recompiling stale cache file /home/maximilian/.julia/compiled/v1.0/ControlSystems/WTvAN.ji for ControlSystems [a6e380b2-a6ca-5380-bf3e-84a91bcd477e]
 68.040499 seconds (17.57 M allocations: 972.214 MiB, 0.81% gc time)
julia> @time using ControlSystems
  2.036492 seconds (5.06 M allocations: 239.614 MiB, 4.27% gc time)
julia> @time using ControlSystems
  0.000122 seconds (293 allocations: 15.281 KiB)
julia> exit() # (*)
maximilian@Orion:~$ julia
julia> @time using ControlSystems
  8.752765 seconds (16.10 M allocations: 900.016 MiB, 5.80% gc time)
julia> @time using ControlSystems
  2.487725 seconds (6.02 M allocations: 286.327 MiB, 3.83% gc time)
julia> @time using ControlSystems
  0.000262 seconds (293 allocations: 15.281 KiB)
# And repeat from (*) again and again, and get the exact same results


#15

That’s an interesting observation.
If you use @profiler in atom, you can see that type inference is happening on the second call, while on the first call the main call is include_from_serialized.
Could be the side effect of some lazy loading - so that the first using is not actually doing 100% of the work


#16

And yet another question: When all the modules are being precompiled already, why does the first call of a function take longer than the second? Should it not be precompiled already?

@time using ControlSystems #  0.000323 seconds (317 allocations: 17.125 KiB)
@time  lqr(A,B,I,I) #  2.988897 seconds (7.17 M allocations: 352.596 MiB, 11.64% gc time)
@time  lqr(A,B,I,I) #  0.000344 seconds (110 allocations: 19.703 KiB)

#17

Should it not be precompiled already?

Yes and no! Compilation in Julia consists of multiple stages:
parsing -> lowering -> type inference -> LLVM IR -> native code.

The first 2 (maybe partly 3 if all types are fixed) are done during what we call precompilation.
The last three only happen when you call a function.

using StaticArrays # precompiled
m = @SMatrix rand(3, 3)
@profiler inv(m) # lots of calls to type inference
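
The earlier stages can be inspected by hand with functions from `Base`. This is only an illustrative sketch using a made-up function `inc`; the exact printed form of the lowered and typed code differs between Julia versions.

```julia
# Stage 1, parsing: source text -> expression tree (AST)
ast = Meta.parse("inc(x) = x + 1")
println(ast.head)                        # head of the parsed definition

# Stage 2, lowering: AST -> simpler, linear intermediate form
lowered_definition = Meta.lower(Main, ast)

# Define the function for real so the remaining stages can be queried.
inc(x) = x + 1

lowered = code_lowered(inc, (Int,))      # lowered code of the method body
typed = code_typed(inc, (Int,))          # after type inference for x::Int
inferred_return = last(typed[1])         # each entry is a `code => return type` pair

println(lowered[1])
println(inferred_return)
```

Note that the typed code only exists per argument type signature: `code_typed(inc, (Float64,))` would trigger a separate inference run.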

#18

Note that I call it LLVM IR (Intermediate Representation), which I find less confusing than bytecode.

It’s just the representation that LLVM does its optimizations on and finally generates native code from.
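
Both representations can be printed for any function. A minimal sketch, assuming a made-up function `add_one`; `code_llvm` and `code_native` come from the `InteractiveUtils` standard library (loaded automatically in the REPL), and the output depends on your Julia version and CPU.

```julia
using InteractiveUtils  # provides code_llvm and code_native outside the REPL

add_one(x) = x + 1

# Capture the LLVM IR and the native assembly for the Int specialization.
llvm_ir = sprint(io -> code_llvm(io, add_one, (Int,)))
native = sprint(io -> code_native(io, add_one, (Int,)))

println(llvm_ir)   # the representation LLVM optimizes
println(native)    # what the CPU actually executes
```

In the REPL the shorter forms `@code_llvm add_one(1)` and `@code_native add_one(1)` do the same thing.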


#19

Thank you! You helped me a lot to understand this! :slight_smile:


#20

Compiling to LLVM IR isn’t the important part. It’s just yet another language. You are just changing the question from why is julia compiled while python is not to why is LLVM IR compiled while python bytecode is not. The difference is whether native code is generated in the end.

To be fair, there is a continuous spectrum here. Compilers are not all the same. On one extreme, there are the traditional compiled languages, which model the hardware quite well, so each basic operation can map directly to hardware instructions without much effort. On the other extreme, there’s Cython with unmodified Python code, which really isn’t compiling much even though it does generate native code from the Python code. We then have other JITs (JS JITs, PyPy, etc.) in between, which generate fairly efficient code but also carry a lot of overhead due to runtime checking. Well written Julia code is somewhere in between other JITs and traditional compiled languages, since there are usually far fewer, but still non-zero, runtime checks.

In the end, I don’t find whether things are compiled to be a very useful concept since, after all, the CPU is always executing native instructions and it’s what instructions it executes that’s important. In other words, the goal of compilation is to remove overhead in execution, and it’s how much of it you can remove that’s important.