It takes several minutes to add a single package

This doesn’t do what I want either though, because what I want is to have a project for my particular data analysis task, but within that project it has all the “infrastructure” one often needs directly available. Now if I could say something like --project=. --project=@data --project=@optimization --project=@database and have the current project be the . directory but the various other projects are “stacked” on that, it might make sense. (BTW: I really like having a built sysimage with all those things, so for now that’s my main solution)
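For what it's worth, something close to this stacking can be approximated with `LOAD_PATH`, which accepts named shared environments as `@name` entries. A minimal sketch, assuming shared environments called `data` and `optimization` (hypothetical names borrowed from the example above) already exist under `~/.julia/environments/`. Nothing reconciles versions across the stack, so the first environment that provides a package wins:

```julia
# Could go in ~/.julia/config/startup.jl, or be run at the top of a session
# started with `julia --project=.`.
# "@data" and "@optimization" are hypothetical shared environment names.
for env in ("@data", "@optimization")
    env in LOAD_PATH || push!(LOAD_PATH, env)
end
```

After this, `using Xxx` can find direct dependencies of those environments, while `]add` keeps modifying only the active project.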

Merging projects would indeed be cool.

Mods feel free to split off this digression.


Actually it is quite a burden, particularly now that temporary environments are becoming central to testing and experimental workflows. These temporary environments are useful to avoid cluttering the main one, but adding packages to them is not instantaneous, even for packages available locally. Starting to play in a temporary environment, adding a bunch of packages, and then having to restart everything for some reason is quite frustrating*. I cannot find an ideal solution (which I think should involve the possibility of adding, at no cost, packages that are already available and precompiled locally).

Also, while we keep advising people not to clutter the main environment, the truth is that it is special in the way it interacts with all other environments. It would be better if we could just have fine control over what happens with it, and allow it to be cluttered.

The whole environment stuff is a powerful and useful feature. But being obligated to use it is not.

*Btw: is it possible to name and then save a temporary environment?
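One way this could be done by hand, as a sketch rather than an official Pkg feature: an environment is just a `Project.toml` plus a `Manifest.toml`, so copying both into a named shared environment keeps it around. `save_environment` and the name `saved` are hypothetical, and the sketch assumes the project file uses the default `Project.toml` name:

```julia
# Copy the active environment's Project.toml (and Manifest.toml, if present)
# into a named shared environment inside the given depot.
# `save_environment` is a hypothetical helper, not part of Pkg.
function save_environment(name::AbstractString;
                          depot::AbstractString = first(DEPOT_PATH))
    src = dirname(Base.active_project())
    dest = joinpath(depot, "environments", name)
    mkpath(dest)
    for file in ("Project.toml", "Manifest.toml")
        path = joinpath(src, file)
        # A fresh environment may not have written either file yet.
        isfile(path) && cp(path, joinpath(dest, file); force = true)
    end
    return dest
end
```

After `save_environment("saved")`, the copy can be reactivated in a later session with `]activate @saved`.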


Good points.

I’ve experienced the ]test thing especially when someone offers a MWE for a problem that looks kind of easy. Then I try adding the package into a temp environment and I get turned off after the first few minutes of package installation and precompilation.


While I agree, I think maintaining minimum necessary environments for a project should be opt-in rather than mandatory. For example, most of the time I am developing on the latest versions, so the people I share code with can ]add anything they don’t have and it will all work. And if I want to ensure the best possible experience for them, I should be able to lazily and explicitly create a new, completely empty environment and manually populate it according to missing-package errors only when I need to.

I agree. I think this is a central point here.

Here’s a potential workflow:

Remove “@v#.#” from Base.LOAD_PATH in startup.jl

Typically always work in the global environment and add packages with ]add --preserve=all package. Update with ]up when I run into a package bug, want a new feature, or have a few minutes where it is okay to let Julia be unresponsive and am okay with a risk of having to deal with broken updates.

When I want to share a minimum working environment, including whenever I do package development, make a new temporary or package environment, run my code, and ]add anything that I need (possibly automate this process with the Pkg API).
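The first two steps of this workflow could look roughly like the following. This is a sketch of the steps as described, not a tested setup; `add_preserving` is a hypothetical helper name, while `Pkg.PRESERVE_ALL` is the API counterpart of `--preserve=all`:

```julia
import Pkg

# In ~/.julia/config/startup.jl: drop the default global environment from the
# load path, leaving the active project ("@") and the standard library.
filter!(entry -> entry != "@v#.#", LOAD_PATH)

# Hypothetical helper: add a package while keeping every already installed
# package at its current version (same idea as `]add --preserve=all Xxx`).
add_preserving(pkg::AbstractString) = Pkg.add(pkg; preserve = Pkg.PRESERVE_ALL)
```

With this in place, `add_preserving("Xxx")` either installs a version of `Xxx` compatible with what is already there or fails, instead of silently updating other packages.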

Advantages:

  1. I never have to worry about environments unless it is necessary.
  2. Julia shouldn’t spend several minutes precompiling when I don’t expect it to
  3. I typically only have to ]add anything when I start using a package for the first time ever (or change Julia versions?)
  4. When I make a temporary environment it is totally clean so I can share the Project.toml without worrying about accidentally depending on something in my global environment

Disadvantages:

  1. Large global environment (I don’t see why this is a problem)
  2. There is an additional step to create a reproducible environment when I want to share a MWE (note that in the classic workflow I would already have manually created this environment and manually ]added everything to it as I went along)
  3. Less explicitly aware of which packages I’m using where for my personal code

What do y’all think? What advantages and disadvantages am I missing? Why isn’t this the default behavior of LOAD_PATH and ]add?

EDIT to clarify recommended default behavior of ]add Xxx in non-trivial cases per @lmiq @dlakelan & @kristoffer.carlsson’s comments:

When possible, ]add Xxx should add the latest version of Xxx without updating anything else
Where this is not possible, but it is possible to install the latest version of Xxx with updates to other packages it should prompt:
“Installing package Xxx requires updating N packages. Expected runtime 15s. Type D for details. Continue? (Y/N)”
and D should yield a menu giving the user options to install…

  1. Xxx v0.21.6 which is 12 days out of date and the latest version compatible with currently installed dependency versions.
  2. latest version of Xxx v0.22.0 and update N packages. Expected runtime 15s
  3. latest version of Xxx v0.22.0 and update all M out-of-date packages. Expected runtime 373s

Where it is impossible to install the latest version of Xxx due to dependency conflict, the user should immediately be given detailed options:

The latest version (v0.22.0) of Xxx is incompatible with Yyy (last used 7 years ago) and Zzz (last used yesterday). You may install…

  1. Xxx v0.21.6 which is 12 days out of date and the latest version compatible with currently installed dependency versions.
  2. Uninstall Yyy & Zzz and install the latest version of Xxx v0.22.0, updating N packages. Expected runtime 15s.
  3. Uninstall Yyy & Zzz and install the latest version of Xxx v0.22.0, updating all M out-of-date packages. Expected runtime 373s.
  4. Xxx v0.21.8 which is 2 days out of date and the latest version compatible with Yyy and Zzz. This requires updating N packages. Expected runtime 15s.
  5. Xxx v0.21.8 and update all M out-of-date packages. Expected runtime 373s.
  6. Install Xxx in a new environment…
  7. Copy the current environment to a new location and replace Yyy and Zzz with Xxx here…

These menus, especially the second, might be best refactored into multiple independent choices (e.g. factor out whether to update everything, or remove that option altogether), though I think presenting every possible choice is nice to give a user a big-picture and results-oriented understanding of their choices, with some indication of why their preferred option (have the latest Xxx, Yyy, and Zzz installed now without a long precompilation time) is not available.


One point is that a new package, or a newer version of one, may require the update of a dependency, and that may trigger the update and compilation of a whole list of packages. I think that is acceptable, but it should be opt-in. “Installing package Xxx requires updating N packages. Type D for details. Continue? (Y/N)”.

Better yet if it was possible to suggest the latest version that could be installed without changing the current package state.


My impression is that the biggest problem I’ve encountered is that some package you used once a while ago to see if it would work but you actually didn’t find helpful can hold back basically everything. For example I had that happen with Turing, where I was on a full year old version because… I don’t know some package I totally didn’t care about was incompatible with newer versions of a package Turing needed.


In 1.8 (you can try the nightly) there is an --outdated flag that can be passed to status, which gives information about packages not being on the latest version and what is holding them back.
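From the Julia side, the same report can be requested through the Pkg API. A small sketch, assuming Julia 1.8 or later for the `outdated` keyword; `show_outdated` is a hypothetical wrapper name:

```julia
import Pkg

# Print the status of the active environment. On Julia 1.8+ this also reports
# which packages are not on their latest version and what holds them back.
function show_outdated()
    if VERSION >= v"1.8"
        Pkg.status(; outdated = true)
    else
        Pkg.status()  # older versions have no `outdated` keyword
    end
    return nothing
end
```

In the Pkg REPL the equivalent is `]status --outdated`.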


This takes the possibilities to another level. Having a tool to inspect the installed packages, report which ones are not used (list them by last use or precompilation, probably easier), could allow easier control of what is going on. In short, an interactive package manager with the features of something like Synaptic would be great.


This is, like, false. If you’d get the same version of the package, it makes no difference which environment the package is installed in: the default environment is simply the default; it isn’t special in a way that gives you a faster installation process.


I didn’t say that that is different from the main environment. The point I wanted to stress in that phrase was twofold: 1) even if the package is available locally, the package manager will check the registry (you may find this irrelevant, but I will record what that means next time I go to work at my parents’ house); 2) it may trigger the installation of a new version and possibly the update of many things not related to the package.

The difference from the main environment there is that if the package is installed in the main environment I can use it from other environments avoiding all that. And if the package is available in any other environment I can’t (AFAIK).

I really don’t see what the difference is between the global environment and a local environment here, but Pkg offline mode is a thing if you don’t want to spend time on updating the registry on a slow connection and you know the package is available in your depot (and I’m also on a slow connection occasionally, I know what that means).

Environments still stack

How is that? Actually that can be very useful if I can only activate the “online” mode when I need to.

I don’t really know what that means. How does this relate to the fact that I can’t “use” a package from an environment which is not the main one? Is there a way to do that?

FWIW, here is a benchmark of adding packages available locally. It takes 3.4 s in my cluttered global environment and 0.13 s in a fresh --temp environment.

julia> using BenchmarkTools, Pkg

julia> @btime Pkg.add("BenchmarkTools")
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
  ...
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
  3.399 s (5650138 allocations: 1008.12 MiB)

(@v1.6) pkg> activate --temp
  Activating new environment at `/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`

julia> @btime Pkg.add("BenchmarkTools")
   Resolving package versions...
    Updating `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`
  [6e4b80f9] + BenchmarkTools v1.2.0
    Updating `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Manifest.toml`
  [6e4b80f9] + BenchmarkTools v1.2.0
  [682c06a0] + JSON v0.21.2
  [69de0a69] + Parsers v2.1.1
  [ade2ca70] + Dates
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [a63ad114] + Mmap
  [de0858da] + Printf
  [9abbd945] + Profile
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
   Resolving package versions...
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Manifest.toml`
...
   Resolving package versions...
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Manifest.toml`
  129.347 ms (456395 allocations: 33.07 MiB)

As far as I’m aware both seem to be doing nothing and should be instant. I don’t think internet connection is an issue because I got similar results with wifi on and off.

@giordano, please let me know if I am missing something. Really, I find that either I am doing something wrong, or there is room for improvement here.

I am running this from our computer cluster, thus I cannot turn off the internet connection there. I have a package called PDBTools.jl, and its latest version is 0.12.11.

Its latest version is installed in the “main” environment, as I will show below. Next, I create a temporary environment and try to add that same package. I have measured the time there.

What you will see there is not my everyday experience, because for some reason the internet connection I have at home (and I have been working from home for a while…) copes better with access to the package registries. I am not completely sure what that means; I can only feel the side effects.

If what I experience in the cluster was my everyday experience, I would never think of using Julia for anything.

That said, given that improving anything concerning the internet connection on that computer cluster is out of scope, I could deal with that if I were able to finely control how the package manager installs things, taking into consideration what is already installed locally. It is conceivable to leave the installations running, get everything I might need locally, and then use all that in every possible environment I might want to use in the cluster. Without that fine control and connection-independent use, using Julia there turns out to be very unpleasant.

This is what I get:

(@v1.6) pkg> status PDBTools
      Status `~/.julia/environments/v1.6/Project.toml`
  [e29189f1] PDBTools v0.12.11

(@v1.6) pkg> activate --temp
  Activating new environment at `/tmp/jl_SK8KPf/Project.toml`

julia> import Pkg

julia> @time Pkg.add("PDBTools")
    Updating registry at `~/.julia/registries/General`
    Updating git-repo `https://github.com/JuliaRegistries/General.git`
   Resolving package versions...
   Installed Parameters ─ v0.12.3
    Updating `/tmp/jl_SK8KPf/Project.toml`
  [e29189f1] + PDBTools v0.12.11
    Updating `/tmp/jl_SK8KPf/Manifest.toml`
  [59287772] + Formatting v0.4.2
  [bac558e1] + OrderedCollections v1.4.1
  [e29189f1] + PDBTools v0.12.11
  [d96e819e] + Parameters v0.12.3
  [3a884ed6] + UnPack v1.0.2
  [2a0f44e3] + Base64
  [b77e0a4c] + InteractiveUtils
  [56ddb016] + Logging
  [d6f4376e] + Markdown
  [de0858da] + Printf
  [9a3f8284] + Random
  [9e88b42a] + Serialization
  [8dfed614] + Test
  [4ec0a83e] + Unicode
Precompiling project...
  2 dependencies successfully precompiled in 5 seconds (3 already precompiled)
623.901732 seconds (9.11 M allocations: 372.894 MiB, 0.06% gc time, 0.09% compilation time)

julia>

As you can see, we are talking about 10 minutes to do something that is, essentially, nothing.

julia> using Pkg

help?> Pkg.offline
  Pkg.offline(b::Bool=true)

  Enable (b=true) or disable (b=false) offline mode.

  In offline mode Pkg tries to do as much as possible without connecting to internet. For example, when adding a package Pkg only considers versions that are already downloaded in version resolution.

  To work in offline mode across Julia sessions you can set the environment variable JULIA_PKG_OFFLINE to "true".

  │ Julia 1.5
  │
  │  Pkg's offline mode requires Julia 1.5 or later.
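
In practice, offline mode can be toggled around just the calls that should not touch the network. A minimal sketch, where `with_offline` is a hypothetical helper (not part of Pkg) that always switches back to online mode afterwards, whatever the previous setting was:

```julia
import Pkg

# Run `f` with Pkg's offline mode enabled (Julia 1.5+), then switch it off
# again, even if `f` throws. Hypothetical helper, not part of Pkg.
function with_offline(f)
    Pkg.offline(true)
    try
        return f()
    finally
        Pkg.offline(false)
    end
end
```

For example, `with_offline(() -> Pkg.add("PDBTools"))` would resolve only against versions already downloaded to the depot.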

Yes, the difference is between a cluttered environment (where the resolver must keep more packages happy together) and a non-cluttered one. But it isn’t like the global environment is special in any way. To me this looks like one more reason not to have cluttered environments 🙂

How would the time differ if you had run the same command in the global environment?


The essential difference is that you only have to add a package once in the global environment, whereas you need to add it in each new local or temporary environment you create that needs it.

EDIT: Thank you @giordano for the correction. This used to say “…for each new temporary environment…”


Can you get more information about how the time is split between the different steps? Without more information I’m tempted to guess that your cluster has a file system which is slow at handling a large number of small files and that nearly all time is spent on updating the registry. If this is the case it would be interesting to see how Julia 1.7 fares when it’s not unpacking the registry.


What if you do this at the beginning of your Julia session?

julia> import Pkg

julia> Pkg.UPDATED_REGISTRY_THIS_SESSION[] = true