A prototype of a `pkgimage` binary cache system for reducing latency

This is a repost from Slack. Slack doesn’t work well for long articles, so I decided to repost it here.

Recently we (Tongyuan) have been investigating binary compilation using LLVM’s new JIT linking architecture. In short, we have written a prototype that generates binary code for each package individually from its test files and loads that code in a new session to remove latency. In contrast to PackageCompiler, our prototype compiles each package separately, and the compile time is largely determined by the runtime of the test file. So there’s no need to compile everything from scratch!

This prototype can be viewed as a simple implementation of the pkgimage idea proposed at JuliaCon 2022.

We achieve this by:

  1. Reusing and modifying Julia’s builtin LLVM codegen pipeline to generate relocatable symbols.
  2. Using LLVM’s dynamic linking framework (mostly JITLink) to perform binary loading and symbol resolution, relying on Julia’s precompilation files to provide runtime information.
  3. Developing a Julia package named BuildSystem to drive the compilation of each package from a makefile.
  4. Building a full Julia interpreter by bridging ccall through the dyncall C library. Julia’s builtin C interpreter doesn’t handle ccall, and this causes a problem in our compiler bootstrap.

A schematic of the design is shown in this figure:

A demo of using the compiled JSON library is shown in the video:


A comparison in Julia 1.6 without binary cache:


Notice the latency when evaluating JSON.parse. It is completely gone when the binary cache is used.
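For readers who want to see this kind of first-call latency in numbers rather than in a video, here is a minimal sketch. It is not part of our prototype: `work` is a made-up stand-in function so that the snippet runs without installing anything, and in the demo the call of interest would be JSON.parse instead.

```julia
# Stand-in workload (hypothetical); in the demo above the call of interest is JSON.parse.
work(xs) = sum(x -> sqrt(abs(x)) + 1, xs)

data = collect(1.0:1000.0)

# Timed at top level on purpose: wrapping both calls in a helper function
# could move the compilation of `work` outside the timed region.
t_first = @elapsed work(data)    # includes JIT compilation of `work` for this argument type
t_second = @elapsed work(data)   # reuses the native code compiled by the first call

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The gap between the two numbers is the cost that the JSON.parse demo makes visible at the start of a session, and it is the cost a binary cache is meant to avoid.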

In the JSON sample, the difference may be small. Here is another example using Debugger. In the first run, compilation is slow. We then use saveWork to cache the compiled result to disk (at timestamp 01:15), and in the next session the latency is gone!

A demo of caching REPL result using our static compiler:


A reference without static compiler:

[asciicast] (asciicast:514149 - asciinema)

The compile cache is illustrated in the following picture:

Note that the compiled binary for each library is relatively small (5 MB).

This approach also has many shortcomings, such as using more memory and disk space. Some of them can be solved, while others are intrinsic. We still need more observations and tests to tell which is which.

Future work

Currently our prototype is still at an early stage. We have also tested it on Debugger and GR. Though a lot of effort has been put into debugging it, we still encounter many technical problems that are beyond our ability to solve. For example, we cannot compile the __init__ function for many jll libraries, because these functions refer to non-relocatable items generated by the Artifacts stdlib. Yet they account for a large part of the latency in loading the GR library, which makes our work less effective there. Another problem is that some packages’ tests are unsound, i.e. they fail even with an unmodified build.
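To illustrate why such __init__ functions are awkward to cache, here is a simplified sketch of the general pattern. The module `HypotheticalLib_jll`, its fields, and the library name are all invented for illustration, and `tempdir()` is only a placeholder for a path that a real jll package would obtain from the Artifacts stdlib at load time.

```julia
# Hypothetical stand-in for a jll-style wrapper module; not a real package.
module HypotheticalLib_jll

# Filled in at load time; the values can differ between machines and sessions.
const artifact_dir = Ref{String}("")
const libpath = Ref{String}("")

function __init__()
    # Placeholder for "a directory that is only known at run time".
    artifact_dir[] = joinpath(tempdir(), "hypothetical_artifact")
    libpath[] = joinpath(artifact_dir[], "lib", "libhypothetical.so")
end

end # module

# __init__ has already run by the time the module definition finishes evaluating.
println(HypotheticalLib_jll.libpath[])
```

The point of the sketch is that the values written by __init__ depend on the machine and session that load the module, which is the sense in which code referring to them is hard to bake into a binary built elsewhere.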

One of our goals was to avoid touching Julia’s internals and to limit our modifications to a few places. But as the project grew and we wanted to test more and more ideas, we found that some places inevitably need a redesign to work. So we decided to stop here and leave them as future work.

To reproduce the demo, follow the instructions in our BuildSystem package (https://github.com/ChenNingCong/BuildSystem). We use a relatively old build because we started this project around one year ago. It is currently tested only on Arch Linux, but other Linux systems should also work. Sorry to Mac and Windows users…


This is one of the more super-awesome projects to appear on this forum recently!
In case anyone missed it, I recommend giving it a second look.

(I only saw this because it was linked from Slack. I find that I often don’t notice the things that most interest me on this forum.)

This should be cross-posted to all appropriate Julia fora.

It would be pretty cool to try to upstream some part of this to base Julia. There have been some large efforts to reduce latency, and this branch seems to have had a lot of work done on it.

Since the changes are quite large, I can’t imagine doing it all at once, but individual changes are likely to be interesting.