This is a repost from Slack. Slack doesn’t work well for long articles, so I decided to repost it here.
Recently we (Tongyuan) have been investigating binary compilation using LLVM’s new JIT linking architecture. In short, we have built a prototype that generates binary code for each package individually by running its test files, then loads that code in a new session to remove latency. In contrast to PackageCompiler, our prototype compiles each package separately, and the compile time is largely determined by the runtime of the test file. So there’s no need to compile everything from scratch!
This prototype can be viewed as a simple implementation of the idea `pkgimage` proposed at JuliaCon 2022.
We achieve this by:
- Reusing and modifying Julia’s builtin LLVM codegen pipeline to generate relocatable symbols.
- Utilizing LLVM’s dynamic linking framework (mostly JITLink) to perform binary loading and symbol resolution, relying on Julia’s precompilation files to provide runtime information.
- Developing a Julia package named `BuildSystem` to drive compilation of each package from a makefile.
- Constructing a full Julia interpreter by bridging `ccall` using the `dyncall` C library. Julia’s builtin C interpreter doesn’t handle `ccall`, and this causes a problem in our compiler bootstrap.
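For readers unfamiliar with `ccall`, here is a minimal illustration of the kind of foreign call an interpreter has to support. It is plain stock Julia, not code from the prototype, and it assumes a platform where the C function `strlen` is visible in the running process (true on typical Linux systems).

```julia
# Call C's `strlen` directly from Julia.
# Form: ccall(function, return type, (argument types...), arguments...)
n = ccall(:strlen, Csize_t, (Cstring,), "hello")

println(Int(n))  # length in bytes of "hello"
```

In compiled code such a call becomes a native call with the right calling convention. An interpreter has no compiled call site, so it has to assemble the call at runtime, which is the kind of job a dynamic-call library like dyncall is designed for.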
A schematic of the design is shown in this figure:
A demo of using the compiled `JSON` library is shown in the video:
A comparison in Julia 1.6 without the binary cache:
Notice the latency when evaluating `JSON.parse`. It’s completely gone when using the binary cache.
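The latency in question is the cost of compiling a method the first time it is called. A minimal sketch of how to observe it in stock Julia follows; `sum_of_squares` is a hypothetical toy workload standing in for a real package function such as `JSON.parse`, and the snippet does not use the binary cache at all.

```julia
# Toy workload standing in for a real package function.
sum_of_squares(xs) = sum(x -> x * x, xs)

# Non-constant global, so each call below is dispatched at run time.
data = collect(1:1000)

t_first = @elapsed sum_of_squares(data)   # includes JIT compilation for Vector{Int}
t_second = @elapsed sum_of_squares(data)  # reuses the machine code compiled above

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

In a fresh session the first timing is normally much larger than the second. A binary cache aims to make the first call in a new session behave like the second one.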
In the `JSON` sample, the difference may be small. Here is another example using Debugger. In our first run, compilation is slow. We then use `saveWork` to cache the compiled result to disk (at timestamp 01:15), and in the next session the latency is gone!
A demo of caching REPL results using our static compiler:
A reference without the static compiler:
(asciicast 514149 on asciinema)
The compile cache is illustrated in the following picture:
Note that the compiled binary is relatively small (5 MB) for each library.
This approach also has shortcomings, such as using more memory and disk space. Some of them can be solved, while others are intrinsic. We still need more observations and tests to confirm this.
Future work
Currently our prototype is still at an early stage. We have also tested it on `Debugger` and `GR`. Although a lot of effort has gone into debugging it, we still encounter many technical problems that are beyond our ability to solve. For example, we cannot compile the `__init__` function for many jll libraries, because these functions refer to non-relocatable items generated by the Artifacts stdlib. But they contribute a large part of the latency in loading the GR library, which makes our work less effective. Another problem is that some packages’ tests are unsound, i.e. they fail even on an unmodified build.
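To illustrate why such `__init__` functions are hard to cache ahead of time, here is a toy sketch. `ToyJLL`, `artifact_dir`, and `session_pid` are hypothetical names chosen for illustration; real jll packages do different work, but the pattern of computing machine- or session-specific values at load time is the relevant part.

```julia
module ToyJLL  # hypothetical stand-in for a jll package, not a real one

# Filled in at load time; the values differ between sessions and machines.
const artifact_dir = Ref{String}("")
const session_pid = Ref{Int}(0)

function __init__()
    # Stands in for locating an artifact directory on the current machine.
    artifact_dir[] = mktempdir()
    # Stands in for session-specific state such as a library handle.
    session_pid[] = Int(getpid())
end

end # module

println(ToyJLL.artifact_dir[])
println(ToyJLL.session_pid[])
```

Values like these cannot simply be baked into a binary produced in another session, because they would point at paths or process state that no longer exist when the binary is loaded.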
One of our concerns is that we don’t want to touch Julia’s internals, so we limited our modifications to a few places. But as the project grows and we want to test more and more ideas, we find that some places inevitably need a redesign to work. So we decided to stop here and leave them as future work.
To reproduce the demo, follow the instructions in our `BuildSystem` package (https://github.com/ChenNingCong/BuildSystem). We use a relatively old build because we started this project around one year ago. It is currently only tested on Arch Linux, but other Linux systems should also work. And sorry to Mac and Windows users…