So around two years ago, I started a hobby project to use LLVM’s new JIT framework to produce small binaries for Julia (warning: it failed).
Crash course on Julia’s compiler in case you don’t know how it works: Julia source code is first lowered to an IR (much like bytecode in Java). The type inferencer (implemented in Julia) adds type annotations to the IR. Then LLVM compiles this typed IR to machine code.
The planned workflow was like this:
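To make these stages concrete, here is a minimal sketch that inspects each of them through Julia's standard reflection functions. The function `add_one` is a made-up example, not something from the project described here.

```julia
using InteractiveUtils  # provides code_lowered, code_typed, code_llvm outside the REPL

# Hypothetical toy function, used only to look at the compiler stages.
add_one(x) = x + 1

# Stage 1: lowered (untyped) IR, one CodeInfo per matching method.
lowered = code_lowered(add_one, (Int,))

# Stage 2: IR after type inference, as `CodeInfo => return type` pairs.
typed = code_typed(add_one, (Int,))
println(typed[1].second)   # the inferred return type

# Stage 3: the LLVM IR that is then compiled to machine code.
code_llvm(stdout, add_one, (Int,))
```

The same functions are available as the macros `@code_lowered`, `@code_typed` and `@code_llvm` when you have a concrete call at hand.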
- Compile the type inferencer by caching all of its IR and emitting LLVM binary code, then discard all IR and method tables.
- Build my own base/stdlib library with only selected functionality (by including only the relevant source files), so I can remove unused dependencies and further speed up compilation.
And it turned out… the type inferencer itself was type unstable in several places, including some common mistakes like missed union splitting (Any is not split against complicated method signatures), closures and so on. So I couldn’t discard the method tables for the type inferencer, because dynamic dispatch needs to look up methods in them.
I found these instabilities with a simple script, because at that time Cthulhu.jl was not automatic enough for this task:
    for each IR statement in code_typed(function call):
        if the statement is a dynamically dispatched function call:
            extract source code information from the function's debug table
            print a human-readable error
        else if the statement is a static function call:
            recurse into the callee if this signature combination has not been seen before
        end
    end
It generates a list of possible errors with their source locations. So in VS Code I can quickly jump to each location and fix as many errors as possible, then start a new round.
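For readers who want to try something similar, here is a minimal sketch of the first half of that loop. It is not the original script: it only looks at one function, does not recurse into static calls, and prints no source locations. It also assumes that a dynamic dispatch shows up in optimized IR as a `:call` expression whose callee is not a builtin, which is a compiler-internal detail that may differ between Julia versions. The names `callee_value`, `dynamic_calls` and `first_sin` are made up for the example.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

# Resolve the callee of a :call expression to a value where that is possible.
callee_value(x::GlobalRef) = isdefined(x.mod, x.name) ? getfield(x.mod, x.name) : nothing
callee_value(x::QuoteNode) = x.value
callee_value(x) = x  # a function embedded directly, or an SSA value / argument

# Collect statements that look like dynamic dispatch (assumption: a :call
# to a non-builtin in optimized IR; resolved calls appear as :invoke instead).
function dynamic_calls(f, types)
    found = Expr[]
    for (ci, _) in code_typed(f, types; optimize=true)
        for stmt in ci.code
            stmt isa Expr && stmt.head === :call || continue
            v = callee_value(stmt.args[1])
            (v isa Core.Builtin || v isa Core.IntrinsicFunction) && continue
            push!(found, stmt)
        end
    end
    return found
end

# Hypothetical unstable function: the element type is Any, so `sin` cannot
# be resolved at compile time.
first_sin(v::Vector{Any}) = sin(v[1])

foreach(println, dynamic_calls(first_sin, (Vector{Any},)))
```

A fuller version would follow each `:invoke` statement into its callee and keep a set of already visited signatures, as the pseudocode describes.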
I eventually gave up and decided to simply file an issue on GitHub, because I didn’t want to patch Julia’s type inferencer. It complicated my workflow, and honestly speaking, I would rather rewrite the type inferencer in C++ than patch it in Julia, because it’s so inconvenient to bootstrap the type inferencer and debug it.
Another experiment I did was to design an AST type inferencer for Julia (again, a simple 2000-line type checker that any PL undergraduate could implement). The lesson I learned from my previous experiment is that an IR type checker is essentially unusable, because IR is not designed for humans and the output is hard to interpret. So this time I set up some restricted syntax rules to systematically decide what should be considered “type stable”.
I used it to scan some small packages, and the result was surprising: many packages are not well typed and contain small, dumb mistakes. Some people have a hard time with parameterized types and forget to supply enough parameters. This is contrary to what people usually claim on Discourse. I handled them manually by opening issues on GitHub. But then I got tired. It’s just not my duty to fix errors for other people, and this way doesn’t scale. People should do this themselves, as they know more of the context than I do.
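As an illustration of the missing-parameter mistake, here is a small sketch. The struct names and the helper `abstract_fields` are made up for the example and are not taken from any of the scanned packages.

```julia
# `Vector` without its element type is an abstract (UnionAll) type, so every
# access to this field is only known to be "some kind of Vector".
struct LooseParticle
    position::Vector
end

# Supplying the parameter makes the field type concrete.
struct TightParticle
    position::Vector{Float64}
end

# Hypothetical helper: list the fields whose declared type is not concrete.
abstract_fields(T) = [n for n in fieldnames(T) if !isconcretetype(fieldtype(T, n))]

println(abstract_fields(LooseParticle))
println(abstract_fields(TightParticle))
```

A check like this only looks at struct declarations, so it catches one class of mistake and nothing inside function bodies.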
I also tested it on my legacy computer graphics code, a tiny ray tracer. I caught two errors (one caused by a type conversion and one by a misspelling) in the hot loop. An IR checker simply can’t detect this, because Julia silently does union splitting and the final result has no dynamic dispatch.
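A minimal sketch of how a conversion mistake can hide behind union splitting, assuming the error was of the common "integer accumulator in a floating-point loop" kind (the actual ray tracer code is not shown here, and both function names are made up):

```julia
# The accumulator starts as an Int and becomes a Float64 after the first
# iteration, so its type inside the loop is a small Union.
function sum_int_start(xs::Vector{Float64})
    acc = 0
    for x in xs
        acc += x
    end
    return acc
end

# Same loop with a Float64 accumulator: the variable has one concrete type.
function sum_float_start(xs::Vector{Float64})
    acc = 0.0
    for x in xs
        acc += x
    end
    return acc
end

# The first is inferred as a Union of Int and Float64, the second as Float64.
println(only(Base.return_types(sum_int_start, (Vector{Float64},))))
println(only(Base.return_types(sum_float_start, (Vector{Float64},))))
```

Because the union is small, the compiler branches on the two cases instead of emitting a dynamic call, which is why a checker that only looks for dispatch in the IR stays silent.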
Anyway, none of this requires nontrivial PL techniques. Basically all of my PL friends learned this as undergraduates, and one of them developed Julia’s JetBrains plugin… That’s why I said a niche community is a huge problem. You need to spend more time and more effort to achieve the same thing…