Regarding OSCAR specifically: we, the OSCAR team (including me, as one of the leads), have already invested quite some time looking into the overhead of loading it, and into how to reduce it. Unfortunately I am not sure any of us visits the Julia Discourse regularly; I certainly don’t, and I only stumbled over this thread by pure chance. Feel free to also talk to us about it on our Slack or in our GitHub Discussions…
That said, I can already tell you that PrecompileTools.jl or anything like it won’t help much here, or else we’d already be doing it (well, it may help some, and it is on our agenda to use it more extensively, but experiments suggest it’s not a panacea). It may eventually help, but there are many roadblocks.
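For readers who haven’t seen it: a typical PrecompileTools.jl setup looks roughly like the sketch below. This is a generic illustration, not OSCAR’s actual code; the module, type, and function names are made up.

```julia
# Minimal PrecompileTools.jl sketch (hypothetical package, not OSCAR).
module MyPackage

using PrecompileTools

struct Poly
    coeffs::Vector{Int}
end

# Hypothetical function whose compiled code we want cached.
degree(p::Poly) = length(p.coeffs) - 1

@setup_workload begin
    # Runs at precompile time; its own compilation is not cached.
    p = Poly([1, 2, 3])
    @compile_workload begin
        # Calls in here get compiled and cached into the package image.
        degree(p)
    end
end

end # module
```

The catch, and the reason it is no panacea for us: if loading another package later invalidates those cached methods, the native code is thrown away and recompiled anyway, which is exactly the “(xx% recompilation)” showing up in the timings below.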
For example, we suffer massively from method invalidations, e.g. those caused by CxxWrap.jl (and surely also some of our own making). We’ve improved a bunch of things, including sending out patches to several of our own dependencies; but it feels like an uphill battle, as we’ll see here:
ParameterEstimation has a bazillion dependencies, and some of these seem to exacerbate the problem as we’ll see below…
The whole thing unfortunately is complicated, and complicated to debug (any substantial help here is certainly welcome), and there are some factors involved that can’t easily be improved, if at all. For example, GAP.jl (the one I am most familiar with, being one of the authors and the lead developer of the GAP computer algebra system it wraps) loads and parses a ton of source files written in the GAP language, which means disk performance matters a lot. From your numbers I am guessing you are either on an HDD or on a rather slow SSD; that’s why it loads much faster for me. To be clear: I am not suggesting you should “just get a faster computer”; rather, I want to paint the broad picture of what causes what.
To quantify this, here are the timings I get for `@time_imports using ParameterEstimation`, which uses Oscar 0.11.3 (an outdated version, but it’s the one pulled in by ParameterEstimation, so I used it for all timings to have a fair comparison), on an M1 MacBook Pro with Julia 1.9.3:
```
1192.4 ms  CxxWrap 3.49% compilation time (37% recompilation)
4576.8 ms  Polymake 21.68% compilation time (89% recompilation)
2031.9 ms  GAP 14.59% compilation time (76% recompilation)
2821.9 ms  Hecke 48.95% compilation time (86% recompilation)
1736.3 ms  Oscar 59.09% compilation time (44% recompilation)
   2.2 ms  ParameterEstimation
```
So that’s quite a lot faster than what you report, although it’s still not great.
But let me now contrast this with `@time_imports using Oscar` (using the exact same Oscar version):
```
 380.3 ms  CxxWrap 6.01% compilation time
2334.8 ms  Polymake 3.34% compilation time (21% recompilation)
1471.7 ms  GAP 19.51% compilation time (81% recompilation)
1886.4 ms  Hecke 47.97% compilation time (80% recompilation)
1171.1 ms  Oscar 51.77% compilation time
```
So things are quite a bit faster, and there is a LOT less recompilation. That’s not a fluke, I can easily reproduce it (of course with some fluctuation, but the rough ballpark stays).
Let’s just focus on `@time_imports using GAP`, and we get

```
1180.0 ms  GAP 4.91% compilation time
```
and easily >95% of that time is spent in the GAP kernel parsing and executing GAP code, so there is nothing here really to improve from the Julia side (maybe from the GAP side, but that’s way out of scope here).
As you can see, loading GAP varies from 1180.0 to 1471.7 to 2031.9 milliseconds in the three examples!
So what happens? I don’t claim to have the full answer for this, but I think at least part of this is method invalidation. One clue is the recompilation time. We can also reproduce this by doing e.g.
`@time_imports using CxxWrap, GAP`:

```
 360.4 ms  CxxWrap 6.98% compilation time
1326.4 ms  GAP 15.61% compilation time (72% recompilation)
```
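If anyone wants to dig into this themselves, the usual tool is SnoopCompile; a rough sketch of the workflow (the package names are real, but the exact macro name depends on the SnoopCompile version, and of course the output depends on your environment):

```julia
# Sketch: recording and analyzing invalidations with SnoopCompile.
# Requires the SnoopCompileCore and SnoopCompile packages to be installed.
using SnoopCompileCore

# Record everything that gets invalidated while loading the package.
# (In older SnoopCompile versions this macro was called @snoopr.)
invs = @snoop_invalidations using CxxWrap

# Load the analysis code only *after* recording, so it does not
# pollute the measurement itself.
using SnoopCompile
trees = invalidation_trees(invs)  # group by the method that caused them
# `trees[end]` is the worst offender; printing it shows which new method
# definition invalidated which previously compiled call chains.
```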
I’ve submitted PRs to CxxWrap in the past to improve this (and it got better), but there is still more to be done, and I am afraid I can’t do it myself (I think it may require changing the CxxWrap API to turn a bunch of “automatic” / implicit conversions into explicit ones, but to a degree I am guessing here). Anyway, I opened an issue about it two years ago (Understanding and reducing invalidation caused by CxxWrap · Issue #278 · JuliaInterop/CxxWrap.jl on GitHub), if anyone would like to help… That would at least help anyone just loading CxxWrap-based packages.
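In case “method invalidation” sounds abstract: here is a toy, self-contained illustration of the mechanism, with made-up functions and nothing to do with CxxWrap specifically. Defining a new, more specific method forces Julia to discard compiled code that assumed the old method table:

```julia
# Toy illustration of method invalidation (stdlib only).
f(x) = 1
g(x) = f(x) + 1   # g's compiled code can assume f has only one method

g(0)              # compiles g for Int; returns 2

f(x::Int) = 2     # a new, more specific method of f: this *invalidates*
                  # the compiled code of g, just like loading a package
                  # that adds methods to widely used functions does

g(0)              # g is recompiled and now returns 3
```

This is exactly what happens at package scale: loading CxxWrap (or any package that adds methods to common generic functions) invalidates compiled code in packages loaded before it, and that recompilation is the extra time in the measurements above.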
But in your context there is of course much more slowdown, presumably due to the many, many additional packages triggering even more recompilation. But I have not studied this in detail, so there may well be other factors involved that I am not aware of.