Right now precompilation runs on users’ computers after they add a package. Sometimes it can take a very long time (e.g. Makie.jl or DifferentialEquations.jl). Sure, you only have to do it once, but then you have to do it again every time there is an update. And whether you like it or not, it makes new users immediately think “Wow, it takes seconds to conda install something; Julia is comparatively so slow”. I’m also thinking about this recent post.
In my view, such frustrations are completely understandable and a clear barrier.
With all this in mind, are there any plans for the registry to store precompiled binaries for different architectures (Python wheels-esque)? I saw some attempts to resolve this (Jumbo), which is promising, but shouldn’t such tools be a “standard”?
I don’t think it has to be all or nothing. Just the LTS and latest stable version would satisfy most people. Even if I could do it myself locally in some Docker container (like the BinaryBuilder.jl tooling for C libraries) and then just push those binaries to the registry with my package in some standard way, that could significantly improve my users’ experience.
I don’t really get it myself, so someone should correct me, but I was under the impression that independently AOT-compiled units like shared libraries can’t be optimized together. That is very different from how a Julia session optimizes within a world of all loaded modules, a kind of whole-program optimization akin to an AOT compiler’s link-time optimization over intermediate code. Our locally precompiled package images are thus highly dependent on what else is in our environment; we see this effect when installing more packages triggers recompilation of prior package images.
Therefore, there is no such thing as an LTS or latest stable precompilation binary, because it depends not only on the package’s version but on the versions of all its dependencies, as allowed within one of countless environments. It is hypothetically possible to distribute version-wise intermediate code prior to any whole-program optimization to save some time, but we would still need to locally optimize and compile the rest of the way, and it might not be meaningfully different from just distributing the source.
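A self-contained sketch of that environment dependence, using only base Julia: a method definition added later (just as a newly loaded package might add one) invalidates previously compiled code and forces recompilation.

```julia
# Why compiled code depends on the whole set of loaded modules:
f(x) = g(x)
g(x) = 1
f(0)           # compiles f(::Int), possibly inlining g, and returns 1

# A definition added later (as loading another package might do)
# invalidates the cached specialization of f:
g(x::Int) = 2
f(0)           # returns 2; f(::Int) had to be recompiled
```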
I would suggest making separate environments for specific workflows; Julia won’t precompile without a change or update, so breaking up your main package environment into smaller ones can help with that. You can do that with `pkg> activate @Whatever` (see the sketch below) and keep, for example, LLVM-type packages there, and so on. I actually prefer the slow start and precompiling, it lets me know Julia is checking and tracking it all, then works really fast, because the time I least need anything is during startup lol.
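For example, a minimal sketch in the Pkg REPL (the environment names after `@` are just illustrative; any identifier works):

```julia
pkg> activate @plotting     # a shared environment under ~/.julia/environments/
pkg> add Makie

pkg> activate @diffeq       # a separate one for heavy solver stacks
pkg> add DifferentialEquations
```

Each shared environment only reprecompiles when its own packages change, so an update in one workflow doesn’t trigger work in the others.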
Precompiled binaries (like LTS/stable) are not feasible: distributing a “pre-built binary” for a Julia package is impractical because it would only work for one specific combination of all its dependencies; one update to an API invalidates it. Julia is like a rolling release. Something LTS-like can work for your environment with a sysimage, via PackageCompiler or juliac (see the sketch below). Julia is too dynamic otherwise; you want either static environments or dynamic ones.
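For instance, a minimal sketch with PackageCompiler.jl (the package list and output path are illustrative):

```julia
using PackageCompiler

# Freeze the current environment's packages into one sysimage:
create_sysimage([:Makie, :DifferentialEquations];
                sysimage_path = "my_sysimage.so")

# Then start Julia with it:  julia --sysimage my_sysimage.so
```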
It’d be nice if someone who has worked on or otherwise had to deal with Pkg on a deeper level than routine usage chimes in, yeah.
I presume that’s why Jumbo isn’t installed as a package, but as a project. I don’t know how the stdlib path works, but it sounds like a deliberate restriction of changes to the dependencies, something we wouldn’t expect if we had `add`ed the packages to an environment ourselves, even with strict versions.
Conclusion: Pre-built binaries via the registry, in theory, could be possible, but implementing this proposal would require significant effort.
For now, [ANN] Jumbo - a scientific Julia distribution is the way to go if you want to avoid long pre-compilation times using a pre-defined set of packages.
Shouldn’t we compile each package independently? Yes, less inlining across packages, i.e. no whole-program optimization, means longer runtimes. But it could be the default, and then you could opt into the current behavior. One problem is that packages in other languages are larger, say NumPy, and the same functionality might be split across more packages in Julia (in this case not a problem, since the equivalent is in Base). Maybe, maybe not, we could compete with “Python” packages, i.e. those implemented in C, this way. At least we’d be much better at installation…
Extreme separate compilation is basically useless, especially in a performance-minded language that’s mostly generics. We don’t just lose inlining, we lose specialization (approximately what other languages call monomorphization). For example, when package A precompiles a A.foo(::B.Bar) call for a method A.foo(x), where B is a package dependency, that precompile cache of A depends on the exact version of B. Nobody wants the much slower alternative of compiling an unspecialized A.foo(::Any); interpreting would be a more effective elimination of compilation latency. Even a language like Swift that leans toward extreme separate compilation to the point of stabilizing its own ABI still monomorphizes explicitly labeled generics across modules for performance-critical things like arrays and complex numbers.
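A toy version of that example, with inline modules standing in for packages (the names `A`, `B`, `foo`, and `Bar` mirror the ones above):

```julia
module B
    struct Bar
        x::Float64
    end
end

module A
    import ..B
    # a generic method; nothing here pins down the argument type
    foo(x) = 2 * x.x
end

# This call specializes A.foo for B.Bar. A precompile cache holding that
# specialization is only valid against this exact definition of B; change
# B.Bar's layout in a new version of B and the cache is stale.
A.foo(B.Bar(1.5))
```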
The “problem” of compiling larger, less composable monoliths (NumPy is actually relatively small) also increases the whole-program optimization within a compilation unit, so users do get great performance in shared libraries without touching a compiler. There’s no way around this tradeoff. If we want composability, full language features, and reasonable optimizations, the compiler needs to consider the whole program in the same environment. If we can settle for the relatively limited C ABI, then we can distribute and dynamically link/load binaries compiled in slightly different environments. Julia has both options, but lugging around a separate Julia process for each shared library wasn’t practical, hence the development of JuliaC’s conditional trimming.
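As a sketch of that C-ABI route (the function name is made up): this is the shape of entry point that PackageCompiler’s `create_library` or juliac compile into a relocatable shared library.

```julia
# Expose a Julia function through the C ABI; once compiled into a shared
# library, C code can call `increment` like any exported symbol:
Base.@ccallable function increment(x::Cint)::Cint
    return x + Cint(1)
end
```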
Extreme separate compilation would be slower, yes, as in inlining nothing. What I proposed was a middle ground: you would still have all the inlining opportunities within a package, just not into its dependencies (with Julia Base as an exception?). Also, I’m not saying this would be faster at runtime; it would be slower, but a better first-use experience?
A lot of code isn’t speed-critical at all and need not be inlined; see e.g. Jeff’s example.
Would that necessarily happen? And it would be OK in many cases, since the code isn’t speed-critical. If you precompile your dependencies in the best possible way, they would be compiled for common types like Float64 (see the sketch below).
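For instance, a minimal sketch with PrecompileTools.jl (the module and function names are hypothetical):

```julia
module MyPkg

using PrecompileTools

solve(x::Real) = sqrt(x) + one(x)

@setup_workload begin
    # setup code runs at precompile time but is not itself cached
    xs = [1.0, 2.0]
    @compile_workload begin
        # calls executed here are compiled into MyPkg's precompile cache
        solve(xs[1])   # caches solve(::Float64)
        solve(1)       # caches solve(::Int)
    end
end

end # module
```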
I’m just thinking that this likely wouldn’t be slower than Python or other interpreted languages… and we could improve on it later. Also, this would never rule out opting into whole-program compilation; it just wouldn’t be the default (some HPC environments might want that and opt into the current behavior by default).
It would be far slower. Despite what every clickbait demo of slow for-loops over arrays says, interpretation is mostly a red herring. Practical Python spends the vast majority of the runtime executing optimized compiled units from other languages, and each compiled unit composes enough things to be useful for most use cases. A typical Julia package does not compose, it’s a component; a parametric array type, higher order array functions, various scalar types, and scalar operations could easily be split across several completely independent packages to be compiled together just-in-time, whereas NumPy or its extensions (capabilities of which are improving) must compile multiple aspects together ahead of time.
A derivative Julia package that imports exact versions of packages and precompiles a composite workload would be more worth pre-“building”, but it’d be more limited than a typical Python package with binary compatibility. For example, a SciPy 1.10 binary compiled against NumPy 1.17 can work with NumPy 1.25 binaries (not SemVer despite how it looks), but such a derivative Julia package clashes with any other package, possibly another derivative package, that needs to update a shared dependency. Jumbo is a similar and recent approach to distributing an environment with locked versions and downloadable precompile caches, and it obviously doesn’t expect to mix with other similar environments. To mix Julia binaries from different environments, we’d need PackageCompiler or JuliaC for the C ABI, and this hypothetical Julia-in-Julia practice is questionable.
Separate compilation already optimizes as much as possible within each unit, and compilers already have heuristics for when optimizations like inlining are worth doing or not. This is not new and does not address the core issue of how units are split.