Not really answering your question, but I recommend getting used to not having packages in your global environment (or only a few general dev ones if you need them, like Revise), and instead creating local environments, or even temporary ones with ]activate --temp. Having local/temporary environments might also help with pinning down the culprit (as you’d have fewer packages to look at).
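For instance, a throwaway session could look like this (the prompt names are just illustrative; the real prompt after activate --temp is a random temporary directory name):

(@v1.8) pkg> activate --temp
(jl_TmpEnv) pkg> add TailRec

julia> using TailRec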
I’ve heard of similar issues in shared storage systems. Is that the case? The package manager of 1.7 (which can be ported to 1.6, not sure exactly how) worked better in that case.
It’s quite clear what happened, no? The updated versions caused a lot of packages in the big environment to need recompiling. You can use a smaller environment or disable the auto-precompilation.
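If auto-precompilation is the annoyance, it can be switched off and run explicitly later with pkg> precompile. A sketch, assuming I remember the relevant environment variable correctly:

# e.g. in ~/.julia/config/startup.jl: skip the automatic precompile step after add/up
ENV["JULIA_PKG_PRECOMPILE_AUTO"] = 0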
Precompilation of TailRec.jl took 2 seconds for me.
Six minutes for 57 dependencies looks like a lot: it really depends on what packages you have there, but I have environments with ~100 packages (but many of them are probably very small) and precompilation takes no more than 2 minutes. Have you disabled parallel precompilation by any chance?
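For what it’s worth, if I recall correctly the number of parallel precompile jobs is controlled by the JULIA_NUM_PRECOMPILE_TASKS environment variable, so it may be worth checking that nothing in your setup has pinned it to 1:

julia> get(ENV, "JULIA_NUM_PRECOMPILE_TASKS", "not set")   # "1" would mean serial precompilation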
TailRec.jl has no dependencies and none of my currently installed packages depend on it. Why do other packages need to recompile?
When I work in python, the only interaction I have with the package manager is the command line pip3 install name, which typically executes in its entirety faster than Julia’s package manager updates the general registry. Having to segregate packages into environments takes more time and effort than could be spent using those packages, while having every package in the global environment means I’m only a few keystrokes away from every package feature I’ve ever installed. There really isn’t a namespace issue the way there is with globally scoped variables, and I have yet to run into a single incompatible dependencies issue in python.
Large environments are mostly a nice-to-have feature, not a need-to-have feature for me, but when I do work on projects with many dependencies (or that depend on a package with many dependencies, or on a package with slow precompilation) there’s no working around a large environment, and for some small Julia packages (like TailRec), spurious precompilation time is a substantial overhead.
We could mask this issue by precompiling in the background, but threading, hiding in the background, etc. can’t solve the underlying problem of taking 6 minutes of CPU time to do a task that should take 2 seconds.
You can use add --preserve=all to not update all other packages. Maybe one could argue that --preserve=all is a nicer default for add? Incidentally, this is pretty much equivalent to installing the package in a pristine environment, which is what I did above with activate --temp.
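For concreteness, both the REPL command and the API form take the preserve option (TailRec is just the package from this thread):

(@v1.8) pkg> add --preserve=all TailRec

julia> using Pkg
julia> Pkg.add("TailRec"; preserve = Pkg.PRESERVE_ALL)   # API equivalent of the REPL command above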
Precompiling TailRec does take 2 seconds. You probably have something else in your 391-package environment that is causing that large precompilation time. I’ve never spent 6 minutes watching my environments precompile since Julia v1.6, but I also don’t have such large environments.
Some packages (CUDA.jl at least) download stuff while “precompiling”. If the internet connection is bad, that can take a while. That was a problem for me some time ago, before a new package server was created closer to where I am.
Yes, I often forget it, and I have large environments (not best practice, I realize, but I guess a common beginner “mistake”), and often get large changes to the environment (or a wall of text, and can’t add), though I think I’ve never had to wait 6 minutes.
I would argue it’s actually a bug not to try the all option first. Could that change be made (and backported to 1.7)? It seems simple enough to just change the default; even better, if add then fails to work, offer some alternative:
(@v1.8) pkg> help add
[..]
The following table describes the command line arguments to --preserve (in order of strictness).
Argument   Description
––––––––   –––––––––––
all        Preserve the state of all existing dependencies (including recursive dependencies)
direct     Preserve the state of all existing direct dependencies
semver     Preserve semver-compatible versions of direct dependencies
none       Do not attempt to preserve any version information
tiered     Use the tier which will preserve the most version information (this is the default)
It seems to me that going down the list to none increases the chance of success. But tiered is clearly more strict than none, and I’m not sure where it fits in the order; maybe it should be all, direct, tiered, etc.
It’s not uncommon for me to wait several minutes to add a dependency to a package. Even if the package only depends on a handful of others, things seem to escalate rather quickly: 20 dependencies drag in two hundred. And most of them need updating and precompiling.
Quite an interesting exercise to plot the entire dependency graph.
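Here’s a minimal sketch of how one might do that, assuming Graphs.jl and GraphPlot.jl are installed (neither is part of the environment discussed here):

using Pkg, Graphs, GraphPlot

deps  = Pkg.dependencies()                        # UUID => PackageInfo for the active environment
uuids = collect(keys(deps))
index = Dict(u => i for (i, u) in enumerate(uuids))

g = SimpleDiGraph(length(uuids))
for (uuid, info) in deps
    for dep in values(info.dependencies)          # direct dependencies of this package (name => UUID)
        haskey(index, dep) && add_edge!(g, index[uuid], index[dep])
    end
end

gplot(g, nodelabel = [deps[u].name for u in uuids])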
--preserve=all is a good thing to learn about. Default would be even better.
Yes, that sounds bad… However, it wouldn’t happen with this package (nor any other without dependencies). It also happens to have no versions (it’s not even registered). I’m not sure how you would find the most recent version, but Julia must have that information (since tiered goes for it, or maybe Julia doesn’t actually provide such a guarantee?!). Let’s say the latest were version 1.1: couldn’t Julia implicitly try adding that version with --preserve=all first, and only fall back to the less strict tiers if that fails?
I got “25 dependencies successfully precompiled in 24 seconds. 4 already precompiled.” Not too bad, but I believe I didn’t actually update the package, I must have already been at master, and the package hasn’t changed in 8 months, so it seems like a bug (and it could have been way slower; it’s the same issue as when installing in the first place).
We could fix this bug in 1.7.x, unless going to --preserve=all would be considered a breaking change. Could we then do that right now, and make minor adjustments in 1.7.x?
Is it conceivable to have something like add -q which would ask the user what to do? Install a locally available version, install the latest version (warning about the upgrade of other packages), install the latest version compatible with the current state, etc?
The same goes for up.
I would like to have a much more “stable” environment. It is not that common to need the latest version of all packages, and in any case it would be nice not to be surprised by upgrades in the middle of a heavy workflow. (This is one of the many advantages of Linux over other OSs…)
Environments stack, so if BenchmarkTools is in your global environment and then you activate a temporary one on top of it, you still have access to BenchmarkTools (i.e., using BenchmarkTools will work). That’s what I do all the time.
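A small sketch of that stacking (assuming BenchmarkTools is already in the default environment):

(@v1.8) pkg> activate --temp
(jl_TmpEnv) pkg> add TailRec

julia> using TailRec          # comes from the temporary environment
julia> using BenchmarkTools   # still loads: found in the default environment lower in the stack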
Exactly. So anything I expect to use often, I install in my global environment. For me that’s things like CSV, DataFrames, DataFramesMeta, StatsPlots, Distributions, Turing, MCMCChains, GLM, Optim, and a few others. In fact I even compile a sysimage with those so using them is super fast.
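For reference, that kind of sysimage can be built with PackageCompiler.jl roughly like this (the package list and the output path are just examples):

using PackageCompiler

# bake the frequently used packages into a custom system image
create_sysimage([:CSV, :DataFrames, :StatsPlots, :Distributions];
                sysimage_path = "my-sysimage.so")

Starting Julia with julia --sysimage my-sysimage.so then makes loading those packages essentially instantaneous.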
Thanks for letting me know!
I’d still have the problem that using TailRec and @tailrec would not be available elsewhere until I add TailRec again, though. Ideally I’d like to always have everything accessible without having to deal with unexpected ]updates.
At first, having to install your packages in every environment where you need them may sound like an unnecessary burden, but it is not that much of a burden and it is beneficial. Particularly if you use your code for work and at some point you need to share it with someone, having a reproducible environment that isn’t unnecessarily cluttered is a total win.
I do think that the ] update package command could have better default behavior.
Note: This is still not a great idea, but to be fair an alternative isn’t easy until 1.7 is out.
In the future, you can have named projects, described here
So if you wanted a bunch of data packages easily accessible, didn’t want them in your global environment, and didn’t want to make a system image, you can do
julia --project=@data
where data is an environment in your ~/.julia/environments folder.
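A minimal sketch of setting one up (data and the packages are just example names):

(@v1.8) pkg> activate --shared data
(@data) pkg> add CSV DataFrames

After that, any session started with julia --project=@data picks up that environment.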