I’m working on developing speculative compilation support in the LLVM ORC JIT infrastructure.
Because LLVM ORC supports compiling on multiple background threads, it would be effective to compile functions speculatively, before the executing function calls them. Then, when we request the JIT to compile a function, it can immediately return the address of the already-compiled executable code. This should greatly reduce JIT latency on modern multi-core machines. I’m also designing in-place dynamic profiling support for ORC. With it, the JIT will automatically identify hot functions and recompile them at a higher optimization level to achieve better performance.
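The post does not say how hot functions would be detected. One common technique is counter-based tiering: count calls per function and recompile once a threshold is crossed. Below is a minimal Python sketch of that idea. `TieredJIT`, `on_call`, `hot_threshold`, and the stubbed `compile` method are hypothetical names, and levels 0 and 3 loosely stand for -O0 and -O3. It models only the bookkeeping, not ORC's API.

```python
from collections import Counter

class TieredJIT:
    """Toy model of counter-based hot-function detection (hypothetical names)."""

    def __init__(self, hot_threshold=1000):
        self.hot_threshold = hot_threshold
        self.call_counts = Counter()   # function name -> number of calls seen
        self.opt_level = {}            # function name -> level it is compiled at

    def compile(self, name, level):
        # Stand-in for a real JIT compile; here we only record the level.
        self.opt_level[name] = level

    def on_call(self, name):
        """Called on each invocation: compile cheaply first, promote when hot."""
        if name not in self.opt_level:
            self.compile(name, level=0)       # first call: fast, unoptimized
        self.call_counts[name] += 1
        if self.call_counts[name] == self.hot_threshold and self.opt_level[name] < 3:
            self.compile(name, level=3)       # hot: recompile aggressively
```

Using `==` rather than `>=` means each function is promoted at most once. A real implementation would likely do the recompile on a background thread and then swap the function's entry point.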
I’m proposing this project for GSoC 2019 under LLVM. It would be helpful to know how useful these new features would be to the Julia community, so that I can include your comments in the “View from Clients” section of the proposal.
This is a very interesting idea! I’m wondering, how do you plan to determine what functions need to be compiled before you’re ready to compile the correct signature (namely, after type inference has gotten far enough through your current JIT’d function)?
Also, the deadline for submissions is rapidly approaching, so make sure you have your stuff together and submitted ASAP!
Also, have you found a mentor for your project? I feel like Valentin, Keno, Tim B., or Jameson would be good mentors, but I don’t know if they’re participating or are interested.
Thanks for replying. There are two ways to find the functions that are likely to execute next:
- Static program analysis
- Profiling
I’m proposing the profiling approach. It consists of two phases: 1) a recording phase and 2) a playback phase.
In the recording (training) phase, all functions are compiled on demand, that is, when they are executed at runtime. The idea is that when function F1 makes a compile request to the JIT for function F2, the JIT records F2 and associates it with F1’s likely-executable map. This map contains all the functions called by F1, in call order. It also holds other important profiling information, such as function call frequencies, which will help enable aggressive optimization of those functions later.
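A minimal Python sketch of the recording-phase data described above. `CallProfile` and `record_call` are hypothetical names, and counting frequencies assumes the JIT can observe every call, not just the first, compile-triggering one.

```python
from collections import Counter, defaultdict

class CallProfile:
    """Recording-phase profile (hypothetical): caller -> callees in first-call order."""

    def __init__(self):
        self.likely_callees = defaultdict(list)  # e.g. F1 -> [F2, F3, ...]
        self.call_frequency = Counter()          # total calls observed per callee

    def record_call(self, caller, callee):
        """Record that `caller` invoked `callee`."""
        self.call_frequency[callee] += 1
        callees = self.likely_callees[caller]
        if callee not in callees:                # keep each callee once, in first-seen order
            callees.append(callee)
```

Keeping first-seen order lets the playback phase submit the earliest-needed callees to the thread pool first.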
This data is written to disk when the recording phase completes.
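The on-disk format is not specified. As an assumption, here is a minimal Python sketch that stores the profile as JSON. `save_profile` and `load_profile` are hypothetical helpers.

```python
import json

def save_profile(path, likely_callees, call_frequency):
    """Write the recorded profile to disk as JSON (format is an assumption)."""
    with open(path, "w") as f:
        json.dump({"likely_callees": likely_callees,
                   "call_frequency": call_frequency}, f, indent=2)

def load_profile(path):
    """Read a profile written by save_profile; returns (likely_callees, call_frequency)."""
    with open(path) as f:
        data = json.load(f)
    return data["likely_callees"], data["call_frequency"]
```

A text format is easy to inspect and diff between training runs. A production JIT might prefer a compact binary format keyed by stable symbol names.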
In the playback phase, the JIT loads the profile data from disk. As soon as the client requests a symbol and invokes it at runtime, the JIT submits that function’s likely-executable functions to the thread pool for compilation. If a background thread has already finished compiling a function by the time it is requested, the JIT returns the address of its executable code immediately. If the function is still being compiled, the executing thread waits for compilation to finish.
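A minimal Python sketch of the playback phase, assuming the profile has already been loaded into a `caller -> [callees]` dict. `compile_function`, `SpeculativeJIT`, and `lookup` are hypothetical. The "address" is a placeholder string, and a `concurrent.futures` thread pool stands in for ORC's compile threads.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def compile_function(name):
    """Hypothetical stand-in for a real JIT compile; returns a fake 'address'."""
    return f"<code for {name}>"

class SpeculativeJIT:
    def __init__(self, likely_callees, workers=4):
        self.likely_callees = likely_callees      # loaded profile: caller -> [callees]
        self.pool = ThreadPoolExecutor(max_workers=workers)
        self.pending = {}                         # name -> Future of its compiled code
        self.lock = threading.Lock()

    def _submit(self, name):
        # Schedule a background compile unless one is already scheduled or done.
        with self.lock:
            if name not in self.pending:
                self.pending[name] = self.pool.submit(compile_function, name)
            return self.pending[name]

    def lookup(self, name):
        """Return compiled code for `name`, speculating on its likely callees."""
        for callee in self.likely_callees.get(name, []):
            self._submit(callee)                  # speculative, non-blocking
        # If `name` was speculated, .result() blocks only while it is still compiling.
        # Otherwise it is submitted now and we wait, i.e. ordinary lazy compilation.
        return self._submit(name).result()
```

Keying `pending` by function name ensures each function is compiled at most once, whether it was requested speculatively or on demand. Functions missing from the profile simply take the synchronous path, which is the lazy fallback.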
This should greatly reduce application start-up time, provided you use representative inputs while profiling. If you don’t, the JIT falls back to lazy compilation for the functions the profile failed to predict.
If you have any questions, please ask.
Ahh, that makes plenty of sense to me! Well, you definitely have some serious work on your hands, but with the newly landing multithreading functionality in Julia I think this is a feasible and extremely valuable project. I look forward to seeing how things progress.
I’m planning to propose this under the LLVM organization. I think mentors can also come from other organizations. If you’re interested, please let me know.
I described this in my previous reply. If you have any questions, please ask!
If anyone is interested in co-mentoring this project with LLVM, or mentoring it on the Julia side, please let me know. I’m interested to see to what extent this can improve Julia!
So what was the result of this? I saw you achieved some significant speedups.
You can find the detailed work here: https://preejackie.github.io/GSoC-2019-LLVM/
It would be very helpful to know what you are interested in.
I was just curious whether this new feature will make it into Julia’s compiler.
I’m not familiar with the details of how Julia uses LLVM, but I’d be very happy to help the Julia compiler developers take this forward.
someone ping @jeff.bezanson then, oh I just did…
Looks like a very cool project! Speculative compilation based on profile data is pretty fancy; I think we would be happy just to get some parallel speedups in our JIT. Our JIT is defined here: https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp
We already get some laziness naturally, since our runtime is set up to compile things on demand. But of course we sometimes identify several needed functions at once (by static analysis) and it would be nice to compile on multiple cores in that case. Any advice you have for how to achieve that would be really useful!
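One generic way to do this is to hand each independent function to a worker pool and collect the results. The sketch below uses Python with hypothetical names (`compile_function`, `compile_batch`) and is not tied to `jitlayers.cpp`. In a real LLVM-based JIT, modules compiled concurrently must not share an `LLVMContext`, which is what ORC's `ThreadSafeModule`/`ThreadSafeContext` wrappers are for. In Python the threads only model the scheduling.

```python
from concurrent.futures import ThreadPoolExecutor

def compile_function(name):
    """Hypothetical stand-in for compiling one function; returns a fake 'address'."""
    return f"<code for {name}>"

def compile_batch(names, workers=4):
    """Compile independent functions concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of `names`, regardless of finish order.
        return dict(zip(names, pool.map(compile_function, names)))
```

The `with` block waits for every compile to finish before returning, so the caller sees a complete batch. This matches the case where static analysis has identified several needed functions at once.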