I wonder: are there any concrete medium- to long-term plans for opportunistic caching of generated binary code (stored separately per architecture, to support computing environments with mixed CPU generations, AVX versions, etc.)?
I know this has been discussed many times, and I’m fully aware that this is not exactly a low-hanging fruit. I’m just curious if it’s on the roadmap: when advocating for Julia, one of the first things people notice and express concern about is package loading and code-generation times (“Look, this is all very nice, but I can do that much faster in Python.”).
We do have some great mitigations in place (eternal thanks to @tim.holy for Revise!), and we make sure to pre-load our notebooks ahead of time before giving a presentation - but it’s still an issue, of course. And as packages are getting bigger and more complex over time, and as the total amount of code that people access in their applications grows, I expect that load and code-gen times will grow too, in the future.
I tell new people that Julia v1.0 is out now, that getting it out had to take priority, that load times aren’t all that bad, and that this will all be sorted out in the future. And I am convinced all of that is true. But sometimes I wish I could say something at least a bit more concrete about when and how it may be sorted out, especially when trying to convince people to adopt Julia for (or at least accept Julia in) long-term projects.
Take a look at Compiler work priorities for a recent update from Stefan; I think it answers your question (though maybe not the fine detail).
The high-level answer is that caching machine code is very appealing and has been considered many times, but it’s fairly tricky. As the compiler work priorities post that @dawbarton linked to indicates, compiler latency is one of the very top priorities. If you’ve got any bright ideas on how to tackle the problem, hop on the JuliaLang Slack #internals channel and discuss.
Thanks a lot for the quick update and the link to the compiler work priorities!
I’m not sure I have any bright solutions to offer - I do realize that it’s a tricky thing to get right (and stable), and I know people with much deeper insight into the compiler have been thinking about this in depth and for a long time …
The one thing I came up with that I would consider important (esp. for users in scientific computing environments, with home directories shared between machines of different CPU generations, or working on heterogeneous clusters) is that caching should happen separately for each CPU architecture (so that the cache won’t be invalidated when switching from an AVX2 to an AVX-512 machine). But of course that’s a trivial aspect compared to making caching work in the first place.
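To make the idea concrete, here’s a purely illustrative sketch of what I have in mind - the directory layout and names are my own invention, not anything Julia does today: key the cache directory by the host’s architecture and CPU model, so each machine on a shared filesystem keeps its own native-code cache.

```julia
# Illustrative only: derive a per-microarchitecture key for a hypothetical
# native-code cache, so an AVX2 host and an AVX-512 host sharing one home
# directory each get their own cache instead of clobbering the other's.
arch_key  = string(Sys.ARCH, "-", Sys.CPU_NAME)   # e.g. "x86_64-skylake"
cache_dir = joinpath(first(DEPOT_PATH), "compiled_native", arch_key)
mkpath(cache_dir)                                 # create it if missing
println(cache_dir)
```

`Sys.ARCH` and `Sys.CPU_NAME` are real Base queries; everything downstream of them here (the `compiled_native` directory, the key format) is just a sketch of the invalidation boundary I mean.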
I think having a mature and easy-to-use PackageCompiler as part of the stdlib would be a great substitute for full-blown caching in many use cases. The biggest problem with Julia right now is not just that we have to recompile when using the REPL, but that the methods for creating static (or dynamically linked) binaries for non-interactive use are fiddly and sometimes difficult. This is a particularly egregious problem in these days of “microservices”, where many programs are started from scratch in a newly initialized Docker container every time they run; in that case there is basically no way to mitigate the compile times at all.
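For anyone who hasn’t tried it, the basic sysimage workflow looks roughly like this (the keyword names match recent PackageCompiler.jl releases, but do check the docs for your version; the package and script names below are placeholders for your own project):

```julia
# Sketch of a PackageCompiler.jl sysimage build. "Example" and
# "precompile_script.jl" are placeholders, not part of any real project.
using PackageCompiler

create_sysimage([:Example];
    sysimage_path = "ExampleSysimage.so",
    # A script that exercises the code paths you want baked into the image:
    precompile_execution_file = "precompile_script.jl")

# Then start Julia with the cached machine code:
#   julia --sysimage ExampleSysimage.so
```

This amortizes compile times across container restarts, but it is still a manual, whole-image step rather than the transparent per-method caching discussed above.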
I’m very happy to see all the great work that’s been done on PackageCompiler, and I thank everyone involved for it. We still have a way to go, however: so far, using it has been tricky enough that I’ve been rather hesitant to reach for it by default everywhere it’s needed.
(Sorry to have a post largely about static compilation when the thread was originally on caching; I felt it was relevant.)
I wouldn’t see AOT compilation so much as a substitute for caching; I’d rather say it’s orthogonal, since (as you say) the targets are mainly non-interactive/non-development use cases. But it’s definitely something that people I introduce to Julia ask about a lot: “If we have the code ready and want to run massive data production, can we compile this to a binary?”. And: “If you implement this algorithm in Julia, can we use it from C++ as a standard dynamic library?”.
Quite right; AOT compilation and caching for better interactive latency are orthogonal issues. But caching is just one possible solution to the latency problem, and I suspect it may not be the best. Caches need to be invalidated, and when that happens you lose all the benefit. I continue to consider the priority to be “do whatever it takes to reduce latency”, not necessarily implementing a caching mechanism.
The seductive thing about caching is that it would supposedly solve the latency problem while still allowing all the fancy optimizations that everyone loves, all the time, as we do now. But the best approach to fixing most latency issues is still probably the tried-and-true tiered-JIT approach.