Tell that to packages that are not able to interact with CategoricalArrays without direct support from it.
But in what cases does loading CategoricalArrays take 4 seconds? Most of the dependencies used with Requires are interface packages and therefore fairly small.
I just tried it again, and I exaggerated with 4 seconds: it is 2.6 or so, which still strikes me as way too long, though. On my system I get that with @time_imports using MimiGIVE (a public package out of my lab group):
2641.1 ms CategoricalArrays 93.02% compilation time (92% recompilation)
If I just remove the entire __init__() function from CategoricalArrays, the load time for it drops to about 200ms:
197.3 ms CategoricalArrays
The only thing inside the __init__() function is the Requires stuff.
I’m not a real expert on this, but my conclusion from reading around a bit was that Requires probably causes more pain in terms of load times than it solves…
The solution to this is probably to try to slim down all the deps that are currently handled via Requires enough that they can become normal, unconditional deps? I also experimented with that a bit: with a version of CategoricalArrays that has JSON, RecipesBase, SentinelArrays and StructTypes as regular dependencies, I get:
50.5 ms JSON
121.5 ms RecipesBase
348.6 ms SentinelArrays 31.04% compilation time
20.2 ms StructTypes
168.1 ms CategoricalArrays
My best guess is that one could fix something about the load times for SentinelArrays (UPDATE: this helps a bit), and maybe JSON.lower should be moved into a tiny package JSONBase that just defines JSONBase.lower? And maybe at that point CategoricalArrays could just have regular dependencies on all of these and ditch Requires? I don’t know what to do about RecipesBase, which seems (surprisingly) slow to load for an interface package.
That’s not a good reason, because this Requires usage makes the functions in the glue part of the package not amenable to precompilation. Those small interface functions are exactly the kind of function that shows up in a lot of places downstream, and they will thus wait until runtime to compile (and invalidate any downstream usage of the interface functions that could possibly hit the dispatch — where “possibly hit the dispatch” also covers anything sufficiently uninferred). Since everything in DataFrames comes out of a Vector{Any} with function barriers, that’s exactly the kind of code that would lose precompilation with this. Requires.jl is really only safe for obscure features with large dependencies, not for interface functions.
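To make that concrete, here’s a hedged sketch of the pattern (hypothetical names, not DataFrames’ actual internals):

```julia
# Columns live in a Vector{Any}, so the loop dispatches dynamically and the
# kernel below is the "function barrier" that gets specialized per concrete
# column type.
function write_all(io::IO, cols::Vector{Any})
    for col in cols
        write_col(io, col)   # dynamic dispatch: the function barrier
    end
end

# Generic interface function; glue code adds methods for its own types.
# If those methods are only defined at runtime inside an @require block,
# the specializations of write_col that call lower cannot be cached in a
# precompile image and must (re)compile on first use.
lower(x) = x

write_col(io::IO, col::AbstractVector) = foreach(x -> println(io, lower(x)), col)
```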
The basic emerging strategy is to break up the package into subpackages along dependency lines.
A CategoricalArraysCore package that is minimalist. Just declare your types and some constructors here. Other packages that want to incorporate some compatibility with your types can depend on this. Ideally, the CategoricalArraysCore package has minimal dependencies.
A CategoricalArrays package that depends on CategoricalArraysCore and provides the full interface.
CategoricalArraysJSON package that loads both CategoricalArrays and JSON
CategoricalArraysRecipesBase package that loads both CategoricalArrays and RecipesBase
CategoricalArraysSentinelArrays package that loads both CategoricalArrays and SentinelArrays
CategoricalArraysStructTypes package that loads both CategoricalArrays and StructTypes
The emerging convention is to put these into subdirectory packages within a libs folder.
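For concreteness, a hypothetical repository layout under that convention (using the names proposed above) could look like:

```
CategoricalArrays.jl/              # a single repository
├── Project.toml                   # the main CategoricalArrays package
├── src/
└── libs/
    ├── CategoricalArraysCore/
    │   ├── Project.toml
    │   └── src/CategoricalArraysCore.jl
    ├── CategoricalArraysJSON/
    │   ├── Project.toml
    │   └── src/CategoricalArraysJSON.jl
    └── ...                        # one subdirectory per glue package
```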
As part of the transition, you could use Requires.jl to just load the glue packages, but eventually you would want to transition direct dependents to using a selection of the above. Library packages would mainly load CategoricalArraysCore. Terminal application packages may load CategoricalArrays. End-user projects may want to use CategoricalArraysFull for convenience.
This is all quite new and perhaps initially tedious, but eventually I think it will lead to a more robust ecosystem than we have today.
Having a separate package for each current use of Requires won’t work AFAICT. If a package loads CategoricalArraysCore/CategoricalArrays and returns a CategoricalArray, and then the user writes that to a .json file using JSON.jl, the result won’t be correct unless CategoricalArraysJSON is loaded. Same for plotting or sentinel arrays. Depending on the case, you’d get errors or plainly misleading behavior. So in the end CategoricalArrays would be useless.
What could work is having a CategoricalArraysCore interface package that JSON, StructTypes, RecipesBase and SentinelArrays depend on. But I’ll leave it to you to convince their maintainers to take an additional dependency… especially given that these are interface packages that are supposed to be lightweight.
Just one more observation here: if I remove the precompile statements from SentinelArrays (and use the PR I made that removes the package-specific RNGs), load times drop to something like 20ms. Maybe there is another model: 1) move everything from SentinelArrays into SentinelArraysCore except the precompile statements, and 2) have SentinelArrays depend on SentinelArraysCore and essentially only add the precompile statements. CategoricalArrays could then depend directly on SentinelArraysCore, and that would add so little overhead that all might be well? It does seem weird and hacky, but maybe it could get the job done? I generally feel that if a small interface package loads really fast (like StructTypes), then it is probably fine to just take a direct dep on it.
A year ago, I would have also recommended using Requires.jl. Given what we know now, I think it’s time to transition away from Requires.jl.
The transition is to move the glue code into separate glue packages, but still import them via Requires.jl while issuing warnings telling users to explicitly load the glue package. Eventually, we stop loading the glue packages automatically. Maybe we’ll have improved optional dependency support by then.
Ultimately, the goal is to make code loading intentional and explicit.
Thanks for the v0.10.7 update by the way. Loading OpenSpecFun_jll dropped from 487 ms to 78 ms on Julia 1.8.1 for me. There seems to be a regression on Julia master (1.9) at the moment though.
Julia 1.8.1 with CategoricalArrays v0.10.6 (486.9 ms to load OpenSpecFun_jll)
julia> @time_imports using CategoricalArrays, OpenSpecFun_jll
4.6 ms DataAPI
31.1 ms Missings
0.9 ms Requires
207.1 ms CategoricalArrays 14.75% compilation time (6% recompilation)
26.4 ms Preferences
0.5 ms JLLWrappers
0.4 ms CompilerSupportLibraries_jll
486.9 ms OpenSpecFun_jll 99.85% compilation time (99% recompilation)
Julia 1.8.1 with CategoricalArrays v0.10.7 (77.5 ms to load OpenSpecFun_jll)
julia> @time_imports using CategoricalArrays, OpenSpecFun_jll
4.7 ms DataAPI
31.0 ms Missings
0.9 ms Requires
205.3 ms CategoricalArrays 14.54% compilation time (6% recompilation)
28.5 ms Preferences
0.4 ms JLLWrappers
0.4 ms CompilerSupportLibraries_jll
77.5 ms OpenSpecFun_jll 99.06% compilation time (96% recompilation)
julia> using Pkg
julia> pkg"st CategoricalArrays"
Status `~/.julia/environments/ross/Project.toml`
[324d7699] CategoricalArrays v0.10.7
Julia 1.9 with CategoricalArrays v0.10.7 (388.0 ms to load OpenSpecFun_jll)
julia> @time_imports using CategoricalArrays, OpenSpecFun_jll
3.8 ms DataAPI
33.0 ms Missings
0.9 ms Requires
2.7 ms Statistics
207.6 ms CategoricalArrays 13.42% compilation time (10% recompilation)
38.7 ms Preferences
0.5 ms JLLWrappers
388.0 ms OpenSpecFun_jll 99.79% compilation time (99% recompilation)
julia> versioninfo()
Julia Version 1.9.0-DEV.1433
Commit e6d99792e6* (2022-09-24 14:32 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
I’m not sure if I completely understand the proposal and its merits.
My sense is that we probably should make the execution of precompile statements contingent on some condition. SnoopPrecompile already has implemented this mechanism:
jl_generating_output essentially indicates whether we are in the process of generating native code and caching it to disk. If we are not doing that, then there is no point in executing the precompilation statements.
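As a minimal sketch (the precompile statement is just a placeholder), the guard looks like this:

```julia
# Run the precompile workload only when Julia is generating native code and
# caching it to disk (i.e. during package precompilation), not on every load.
if ccall(:jl_generating_output, Cint, ()) == 1
    precompile(Tuple{typeof(sum), Vector{Int}})  # placeholder statement
end
```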
Fair enough. Here is a bit more of what I was thinking:
If I compare the load time of this branch with one where I simply remove these two lines, I get about 200ms vs 20ms. In both cases I’m timing a scenario where everything is already precompiled. The first scenario is one where a lot of precompile info is stored, the second one where none is stored. My sense is that SentinelArrays with precompile statements is already guarding properly via the jl_generating_output construction that you mention.
So that suggests that simply loading the precompiled data increases the load time of SentinelArrays by an order of magnitude relative to a scenario where no precompile data is stored. To me this seems to be about load time of precompile data that exists.
So now we are in a pickle: if I just want to use CategoricalArrays (and not SentinelArrays), then I clearly don’t want to pay these 180ms of extra load time for precompile information that I’m not going to need. But of course, if I actually want to utilize SentinelArrays, then that precompile data is great to have and I probably do want it to be loaded, even if that increases package load times.
So the idea would be that SentinelArraysCore has all the code but zero precompile statements. CategoricalArrays takes a dependency on that, and that will only increase the load time by 20ms. Great, that is not a big price to pay for CategoricalArrays, and if a user just wants to use CategoricalArrays and not SentinelArrays, all is good. If someone plans to actually utilize SentinelArrays, then they load that package, and that will now trigger loading of all the precompiled data because the precompile statements are all in SentinelArrays. So that will be slower than just loading SentinelArraysCore, but presumably that is OK if someone is loading SentinelArrays because they actually intend to use the package, not just integrate with the interface there.
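In code, this model could look roughly as follows (a sketch assuming Reexport-style forwarding; SentinelArraysCore is hypothetical):

```julia
# SentinelArrays becomes a thin wrapper: all types and methods would live in
# SentinelArraysCore, and this package only adds the precompile workload.
module SentinelArrays

using Reexport
@reexport using SentinelArraysCore

# Pay the precompile-data cost only here, so packages that merely integrate
# with the interface can depend on SentinelArraysCore instead.
if ccall(:jl_generating_output, Cint, ()) == 1
    # ... the existing precompile statements, now targeting
    # SentinelArraysCore methods, would go here ...
end

end # module
```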
So, this is not about invalidations at all. The problem we seem to have here (after the invalidation and recompile stuff is fixed) is that some packages that have a good set of precompile statements will load slower because of that extra precompile data, and there are scenarios where that extra precompile data is actually not needed at all because all we want to do is integrate with some interface, not necessarily actually run stuff from that package.
Re Requires.jl, the issue is not Requires itself (that mechanism is fine); the issue is that the way most of us use it causes the code not to be cached. But you could simply define a sub-package and make the @require block just @eval using MyRequiresSubPackage, and then it would be fully compatible with precompilation.
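Concretely, for the JSON integration that could look like this (a sketch; CategoricalArraysJSON is a hypothetical glue package):

```julia
module CategoricalArrays

using Requires

function __init__()
    # The @require block no longer contains glue code itself; it only loads
    # a normal, fully precompilable package holding the glue methods.
    @require JSON="682c06a0-de6a-54ab-a142-c8b1cf79cde6" @eval using CategoricalArraysJSON
end

end # module
```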
Measuring load time without measuring TTFX is only half the story. Sure, if load time increases from 20ms to 200ms, that’s bad; but if removing the precompiles causes TTFX to go from 2s to 20s, that’s much worse.
Oh, and one more random comment: I’m not a huge fan of this. I’ve come across this a couple of times now, and it was way, way more difficult to find than just following the standard convention of having one repo per package. There is also a fair bit of tooling that doesn’t know about this convention (e.g. the VS Code extension). And I also don’t understand how, for example, tagging works. In general, for choices like this I think there is huge value in just doing it the “normal” way that most folks are familiar with.
Not sure anyone will follow up on this, but if so, it might make sense to split it into its own conversation; it’s really not related to the main point here.
Yes, but the problem we are grappling with here is that we have packages where in one scenario the right trade-off is to pay the extra load time caused by a lot of precompiled data in order to get the lower TTFX, while in another (equally legitimate) scenario that becomes a real problem: for example, CategoricalArrays only loads SentinelArrays to make sure things are integrated, and in that scenario no one will ever use any functionality from SentinelArrays. One then pays the extra load time for the precompile data but never gets anything back in terms of TTFX, because the functionality that is being precompiled never actually runs.
So that sounds interesting! I guess in that case the idea would be that we could define four new packages (one for each of the @require integration points that CategoricalArrays currently has), put all the code that is currently in these @require blocks into these packages, and then only load these new integration packages in the @require clause? If that would solve the problem, it might be the least involved way to get going?
So if I’m understanding this correctly, does the proposed @require @eval using ... pattern essentially give us the lazy glue-code loading that Requires is currently used for, while not incurring most of the drawbacks of using Requires? Or is it providing something else/something far less powerful? If the former, that sounds like an amazing stopgap until we get proper conditional deps.
A pure “Core” module to me only has types and no methods. In practice, you probably need some constructors and methods for the “Core” module to have any utility.
In one sense I agree with you that there should be few precompile statements in a “Core” package, but that is mostly because there should not be much to precompile there to begin with.
Yes, that is essentially what I was proposing above.
It provides an opportunity to do precompilation caching of the glue code. One has to make sure that the interface package does indeed include precompile statements. As far as Julia is concerned the interface package is just a normal package.
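For the JSON glue discussed in this thread, such a package could be as small as this (a sketch; the precompile signature is only illustrative):

```julia
module CategoricalArraysJSON

using CategoricalArrays: CategoricalValue, unwrap
import JSON

# The one-line integration that currently lives in the @require block:
JSON.lower(x::CategoricalValue) = JSON.lower(unwrap(x))

# Because this is a normal package, the method above (and code inferred
# through it) can now be precompiled and cached like any other.
precompile(JSON.lower, (CategoricalValue{String, UInt32},))

end # module
```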
I guess as a short-term measure we could at least turn RecipesBase and StructTypes into plain dependencies.
For SentinelArrays, I wonder whether we could find a solution by adding an API to ArrayInterfaceCore. Indeed, the problem to solve is just that CategoricalArrays defines a copyto! method dispatching on the destination array, while SentinelArrays defines one dispatching on the source array. But all that’s needed is to call the general fallback defined in SentinelArrays; there’s no particular interaction between these two packages. If we could add an API to indicate that SentinelArrays should take precedence, all would work fine.
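To illustrate (signatures simplified, not the packages’ literal definitions; the invoke-based forward is just one way glue code could resolve it today):

```julia
# Each package extends Base.copyto! on "its" argument:
#   CategoricalArrays:  copyto!(dest::CategoricalArray, src::AbstractArray)
#   SentinelArrays:     copyto!(dest::AbstractArray,    src::SentinelArray)
# so a call with both types is ambiguous. A glue method can forward to
# SentinelArrays' general fallback explicitly:
using CategoricalArrays, SentinelArrays

Base.copyto!(dest::CategoricalArray, src::SentinelArray) =
    invoke(Base.copyto!, Tuple{AbstractArray, SentinelArray}, dest, src)
```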
Regarding JSON.jl, the only line behind @require is JSON.lower(x::CategoricalValue) = JSON.lower(unwrap(x)). I don’t understand how it can cause invalidations. Care to explain?
I’m not sure the issue is that it’s causing invalidations. Rather, there is no ability to cache the type-inferred code anywhere for that method, potentially lengthening load times.
How does this look in practice? Can packages keep their @require PkgName=UUID using GluePkg statements at the top level, or does this require changes to code using Requires?
The @require statements have to go in __init__, as usual. But with @eval I don’t think there’s any barrier to doing this today, without any need for changes to Requires.jl. I’ve generally not bothered simply because when I’ve used Requires it’s mostly been to load a small amount of code, but there are cases where the precompilation would be very desirable.