It takes several minutes to add a single package

It takes several (6+) minutes to add a single package to my default Julia environment. What’s going wrong?

(@v1.6) pkg> add https://github.com/TakekazuKATO/TailRec.jl
     Cloning git-repo `https://github.com/TakekazuKATO/TailRec.jl`
    Updating git-repo `https://github.com/TakekazuKATO/TailRec.jl`
   Resolving package versions...
   Installed libsndfile_jll ─ v1.0.31+1
   Installed GLFW_jll ─────── v3.3.5+1
  Downloaded artifact: libsndfile
  Downloaded artifact: libsndfile
  Downloaded artifact: GLFW
    Updating `~/.julia/environments/v1.6/Project.toml`
  [f6209947] + TailRec v0.2.0 `https://github.com/TakekazuKATO/TailRec.jl#master`
    Updating `~/.julia/environments/v1.6/Manifest.toml`
  [f6209947] + TailRec v0.2.0 `https://github.com/TakekazuKATO/TailRec.jl#master`
  [0656b61e] ↑ GLFW_jll v3.3.5+0 ⇒ v3.3.5+1
  [5bf562c0] ↑ libsndfile_jll v1.0.31+0 ⇒ v1.0.31+1
Precompiling project...
  ? DocSeeker
  ? TestImages
  ? Atom
  57 dependencies successfully precompiled in 373 seconds (334 already precompiled)
  3 dependencies failed but may be precompilable after restarting julia

also

(@v1.6) pkg> status
      Status `~/.julia/environments/v1.6/Project.toml`
  [c52e3926] Atom v0.12.35
  [6e4b80f9] BenchmarkTools v1.2.0
  [ad839575] Blink v0.12.5
  [35d6a980] ColorSchemes v3.15.0
  [3da002f7] ColorTypes v0.11.0 `~/.julia/dev/ColorTypes`
  [5ae59095] Colors v0.12.8
  [861a8166] Combinatorics v1.0.2
  [717857b8] DSP v0.6.10
  [a93c6f00] DataFrames v1.2.2
  [864edb3b] DataStructures v0.18.10
  [31c24e10] Distributions v0.25.17
  [a5dba43e] DynamicGrids v0.20.1
  [09f84164] HypothesisTests v0.10.4
  [916415d5] Images v0.24.1
  [c8e1da08] IterTools v1.3.0
  [e5e0dc1b] Juno v0.8.4
  [b964fa9f] LaTeXStrings v1.2.1
  [984bce1d] LambertW v0.4.5
  [093fc24a] LightGraphs v1.3.5
  [4f449596] MatrixNetworks v1.0.2
  [cc649173] MiniFB v0.1.1
  [5f95e7b2] NativeSVG v0.1.0 `https://github.com/BenLauwens/NativeSVG.jl.git#master`
  [6fe1bfb0] OffsetArrays v1.10.7
  [d96e819e] Parameters v0.12.3
  [14b8a8f1] PkgTemplates v0.7.20
  [f0f68f2c] PlotlyJS v0.18.8
  [91a5bcdd] Plots v1.22.3
  [80ea8bcb] PortAudio v1.1.2
  [d330b81b] PyPlot v2.10.0
  [e6cf234a] RandomNumbers v1.5.3
  [295af30f] Revise v3.1.20
  [bd7594eb] SampledSignals v2.1.2
  [a2af1166] SortingAlgorithms v1.0.1 `~/.julia/dev/SortingAlgorithms`
  [07e3d4f1] SortingNetworks v0.3.2
  [90137ffa] StaticArrays v1.2.13
  [2913bbd2] StatsBase v0.33.11
  [f3b207a7] StatsPlots v0.14.28
  [f6209947] TailRec v0.2.0 `https://github.com/TakekazuKATO/TailRec.jl#master`
  [5e47fb64] TestImages v1.6.1
  [50d962a5] TriplePendulums v0.1.0 `~/.julia/dev/TriplePendulums`
  [d6d074c3] VideoIO v0.9.4
  [8149f6b0] WAV v1.1.1
  [8bb1440f] DelimitedFiles
  [10745b16] Statistics
  [8dfed614] Test

Perhaps excessive precompile invalidation?

1 Like

Not really answering your question, but I recommend getting used to not having packages in your global environment (or only a few general development ones if you need them, like Revise), and instead creating local environments, or even temporary ones with ]activate --temp. Having local/temporary environments might also help with pinning down the culprit (as you’d have fewer packages to look at).
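For reference, the same workflow can be scripted through the Pkg API. This is a minimal sketch assuming Julia 1.6's Pkg, where `Pkg.activate` accepts `temp = true`; `SomePackage` in the comment is a placeholder, not a real package.

```julia
using Pkg

# Same as `]activate --temp`: create and activate a throwaway environment
# in a fresh temporary directory.
Pkg.activate(; temp = true)
temp_project = Base.active_project()
println("active project: ", temp_project)

# Anything added now, e.g. Pkg.add("SomePackage"), is recorded only in this
# temporary Project.toml/Manifest.toml; the global environment's manifest is untouched.

# Return to the home project (or the default global environment).
Pkg.activate()
```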

14 Likes

I’ve heard of similar issues on shared storage systems. Is that the case here? The package manager in 1.7 (which can be backported to 1.6, though I’m not sure exactly how) works better in that situation.

It’s quite clear what happened, no? The updated versions (GLFW_jll and libsndfile_jll in the log above) forced a lot of packages in the big environment to recompile. You can use a smaller environment or disable auto-precompilation.
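As far as I know, Pkg in Julia 1.6 checks the `JULIA_PKG_PRECOMPILE_AUTO` environment variable before its automatic precompilation step, so the second option can be sketched like this. Setting it from Julia only affects the current session; putting the line in `~/.julia/config/startup.jl` or exporting the variable in your shell would make it stick.

```julia
using Pkg

# Pkg checks this variable before the automatic "Precompiling project..." step
# that follows add/up/dev; "0" turns that step off for this Julia session.
ENV["JULIA_PKG_PRECOMPILE_AUTO"] = "0"

# Later, when convenient, precompile the active environment explicitly:
# Pkg.precompile()
```

The trade-off is that the precompilation cost is not avoided, only deferred to the first `using` or to an explicit `Pkg.precompile()`.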

10 Likes

BTW, I just tried to add that package:

(@v1.6) pkg> activate --temp
  Activating new environment at `/tmp/jl_qOxjOa/Project.toml`

(jl_qOxjOa) pkg> add https://github.com/TakekazuKATO/TailRec.jl
     Cloning git-repo `https://github.com/TakekazuKATO/TailRec.jl`
    Updating git-repo `https://github.com/TakekazuKATO/TailRec.jl`
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
    Updating `/tmp/jl_qOxjOa/Project.toml`
  [f6209947] + TailRec v0.2.0 `https://github.com/TakekazuKATO/TailRec.jl#master`
    Updating `/tmp/jl_qOxjOa/Manifest.toml`
  [f6209947] + TailRec v0.2.0 `https://github.com/TakekazuKATO/TailRec.jl#master`
Precompiling project...
  1 dependency successfully precompiled in 2 seconds

Precompilation of TailRec.jl took 2 seconds for me.

Six minutes for 57 dependencies looks like a lot. It really depends on which packages you have there, but I have environments with ~100 packages (many of them probably very small) and precompilation takes no more than 2 minutes. Have you disabled parallel precompilation by any chance?

1 Like

TailRec.jl has no dependencies and none of my currently installed packages depend on it. Why do other packages need to recompile?

When I work in Python, the only interaction I have with the package manager is the command line pip3 install name, which typically runs to completion faster than Julia’s package manager updates the General registry. Having to segregate packages into environments takes more time and effort than I would spend actually using those packages, while having every package in the global environment means I’m only a few keystrokes away from every package feature I’ve ever installed. There really isn’t a namespace issue the way there is with globally scoped variables, and I have yet to run into a single dependency-incompatibility issue in Python.

Large environments are mostly a nice-to-have feature, not a need-to-have feature for me, but when I do work on projects with many dependencies (or that depend on a package with many dependencies, or on a package with slow precompilation) there’s no working around a large environment, and for some small Julia packages (like TailRec), spurious precompilation time is a substantial overhead.

We could mask this issue by precompiling in the background, but threading, hiding in the background, etc. can’t solve the underlying problem of taking 6 minutes of CPU time to do a task that should take 2 seconds.

4 Likes

I believe you want to use

]add --preserve=all https://github.com/TakekazuKATO/TailRec.jl

to avoid updating all the other packages. Perhaps --preserve=all would be a nicer default for add? Incidentally, this is pretty much equivalent to installing the package in a pristine environment, which is what I did above with activate --temp.
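The same thing from a script, as a sketch assuming Pkg's `preserve` keyword and `PRESERVE_*` constants as they exist in Julia 1.6; `add_preserving` is a hypothetical helper name, not part of Pkg:

```julia
using Pkg

# Hypothetical helper: the Pkg API equivalent of `]add --preserve=all <url>`.
# PRESERVE_ALL keeps every existing dependency (direct and recursive) at its
# current version; if that is not possible, the resolver reports an error
# instead of falling back to a looser level.
add_preserving(url::AbstractString) =
    Pkg.add(Pkg.PackageSpec(url = url); preserve = Pkg.PRESERVE_ALL)

# The other levels listed by `]help add`:
# Pkg.PRESERVE_DIRECT, Pkg.PRESERVE_SEMVER, Pkg.PRESERVE_NONE,
# and Pkg.PRESERVE_TIERED (the default).

# Usage (needs network access):
# add_preserving("https://github.com/TakekazuKATO/TailRec.jl")
```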

Precompiling TailRec does take 2 seconds. You probably have something else in your 391-package environment that is causing that long precompilation time. I’ve never spent 6 minutes watching my environments precompile since Julia v1.6, but I also don’t have such large environments.

7 Likes

Definitely.

Some packages (CUDA.jl at least) download stuff while “precompiling”. If the internet connection is bad, that can take a while. That was a problem for me some time ago, before a new package server was set up closer to where I am.

3 Likes

Yes, I often forget it. I have large environments (I realize that’s not best practice, but I guess it’s a common beginner “mistake”), and I often get large changes to the environment (or a wall of text, and can’t add at all), though I don’t think I’ve ever had to wait 6 minutes.

I would argue it’s actually a bug not to try the all option first. Could that change be made (and backported to 1.7)? It seems simple enough to just change the default; even better, if add fails, it could offer some alternatives:

(@v1.8) pkg> help add
[..]
The following table describes the command line arguments to --preserve (in order of strictness).

  Argument Description                                                                        
  –––––––– –––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
  all      Preserve the state of all existing dependencies (including recursive dependencies) 
  direct   Preserve the state of all existing direct dependencies                             
  semver   Preserve semver-compatible versions of direct dependencies                         
  none     Do not attempt to preserve any version information                                 
  tiered   Use the tier which will preserve the most version information (this is the default)

It seems to me that going down the list towards none increases the chance of success. But tiered is clearly more strict than none, and I’m not sure where it fits in the order; maybe it should be all, direct, tiered, etc.

3 Likes

It’s not uncommon for me to wait several minutes to add a dependency to a package. Even if the package only depends on a handful of others, things seem to escalate rather quickly: 20 dependencies drag in two hundred. And most of them need updating and precompiling.

Quite an interesting exercise to plot the entire dependency graph.
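A rough way to see how quickly this escalates, without drawing the whole graph, is to count how many packages each direct dependency pulls in transitively. This sketch assumes the (experimental) `Pkg.dependencies()` and `Pkg.project()` APIs available since Julia 1.4; `transitive_deps` is a hypothetical helper name.

```julia
using Pkg

# Every package in the active manifest: UUID => PackageInfo.
# Each PackageInfo has a `dependencies::Dict{String,UUID}` field.
const ALL_DEPS = Pkg.dependencies()

# Hypothetical helper: collect the UUIDs of all packages reachable from `uuid`.
function transitive_deps(uuid::Base.UUID,
                         deps = ALL_DEPS,
                         seen = Set{Base.UUID}())
    info = get(deps, uuid, nothing)
    info === nothing && return seen   # not in the manifest: nothing to follow
    for dep in values(info.dependencies)
        if dep ∉ seen
            push!(seen, dep)
            transitive_deps(dep, deps, seen)
        end
    end
    return seen
end

# For each direct dependency of the active project, print how many
# packages (including stdlibs) it drags in.
for (name, uuid) in sort!(collect(Pkg.project().dependencies); by = first)
    println(rpad(name, 30), length(transitive_deps(uuid)))
end
```

Sharing the `seen` set within one call means each package is counted once even when several paths lead to it; the counts for different direct dependencies overlap, so they should not be summed.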

--preserve=all is a good thing to learn about. Default would be even better.

4 Likes

The problem with --preserve=all is then people add a package and get annoyed that it didn’t add the most recent version.

2 Likes

I think an improvement would be to actually display the alternatives, and also to warn the user that other packages will be upgraded if that is the case.

1 Like

Yes, that sounds bad… However, it wouldn’t happen with this package (nor with any other package without dependencies). It also happens to have no versions (it’s not even registered). I’m not sure how you would find the most recent version, but Julia must have that information, since tiered goes for it (or maybe Julia doesn’t actually provide such a guarantee?!). Say it were version 1.1; couldn’t Julia implicitly do:

(@v1.8) pkg> add --preserve=all https://github.com/TakekazuKATO/TailRec.jl#v1.1

and report an error if that’s not possible (I assume it already would), or even better, prompt to try tiered, or the next option, direct, etc.

Even though I had already added the package, I now tested with:

(@v1.8) pkg> add https://github.com/TakekazuKATO/TailRec.jl#master

I got “25 dependencies successfully precompiled in 24 seconds. 4 already precompiled.” Not too bad, but I believe I didn’t actually update the package: I must already have been at master, and the package hasn’t changed in 8 months. So it seems like a bug (and it could have been much slower; it’s the same bug as when installing in the first place).

We could fix this bug in 1.7.x, unless switching to --preserve=all would be considered a breaking change. Can we then do that right now, and make minor adjustments in 1.7.x?

Is it conceivable to have something like add -q which would ask the user what to do? Install a locally available version, install the latest version (warning about the upgrade of other packages), install the latest version compatible with the current state, etc?

The same goes for up.

I would like to have a much more “stable” environment. It is not that common to need the latest version of all packages, and in any case, not being surprised by upgrades in the middle of a heavy workflow would be nice. (This is one of the many advantages of Linux over other OSs…)

2 Likes

I proceeded to benchmark the @tailrec macro, and I already have BenchmarkTools in my global environment, but don’t have it in a temporary environment.

Environments stack, so if BenchmarkTools is in your global environment and you then activate a temporary one on top of it, you still have access to BenchmarkTools (i.e., using BenchmarkTools will work). That’s what I do all the time.
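To see the stack that makes this work, you can inspect the load path. A small sketch, assuming the default `LOAD_PATH` (i.e. `JULIA_LOAD_PATH` is not set); `Base.load_path()` is an internal function, so treat its exact output as an implementation detail:

```julia
# With no JULIA_LOAD_PATH set, LOAD_PATH is ["@", "@v#.#", "@stdlib"]:
#   "@"       -> the active project (e.g. a temporary environment)
#   "@v#.#"   -> the default global environment, e.g. ~/.julia/environments/v1.6
#   "@stdlib" -> the standard library
println(LOAD_PATH)

# Base.load_path() expands those entries into concrete paths;
# `using BenchmarkTools` searches them in this order.
foreach(println, Base.load_path())
```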

3 Likes

Exactly. So anything I expect to use often, I install in my global environment. For me this is things like CSV, DataFrames, DataFramesMeta, StatsPlots, Distributions, Turing, MCMCChains, GLM, Optim, and a few others. In fact, I even compile a sysimage with those, so loading them with using is super fast.

Thanks for letting me know!
I’d still have the problem that using TailRec and @tailrec would not be available in other environments until I add TailRec again, though. Ideally I’d like to always have everything accessible without having to deal with unexpected ]updates.

At first, having to install your packages in every environment where you need them may sound like an unnecessary burden, but it is not that much of a burden and it is beneficial. Particularly if you use your code for work and at some point need to share it with someone, having a reproducible environment that isn’t unnecessarily cluttered is a total win.

I do think that the ] update package command could have better default behavior.

1 Like

Note: This is still not a great idea, but to be fair an alternative isn’t easy until 1.7 is out.

In the future, you can have named projects, described here

So if you wanted a bunch of data packages easily accessible, didn’t want them in your global environment, and didn’t want to make a system image, you could do

julia --project=@data

where data is in your environments folder somewhere.
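Something close is already possible from inside a running session with a shared environment. A sketch assuming Julia 1.6's `Pkg.activate(name; shared = true)`, which corresponds to `]activate --shared data`; the name `data` is just the example from above, and the path assumes the default depot:

```julia
using Pkg

# Same as `]activate --shared data`: activate the named environment
# ~/.julia/environments/data (its Project.toml is written on the first `add`).
Pkg.activate("data"; shared = true)
data_project = Base.active_project()
println("active project: ", data_project)

# Install data packages here once, e.g. Pkg.add(["CSV", "DataFrames"]);
# they stay out of the default (@v#.#) environment.

# Return to the home project (or the default global environment).
Pkg.activate()
```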