It takes several minutes to add a single package

Environments stack, so if BenchmarkTools is in your global environment and you then activate a temporary one on top of it, you still have access to BenchmarkTools (i.e., using BenchmarkTools will work). That’s what I do all the time.
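For anyone who wants to see the stacking for themselves, here is a minimal sketch that prints the load path. It assumes JULIA_LOAD_PATH has not been customized, in which case the entries should be the active project, the global versioned environment, and the standard library.

```julia
# LOAD_PATH lists the environments searched, in order, by `using` and `import`.
# Unless JULIA_LOAD_PATH is set, the default is ["@", "@v#.#", "@stdlib"]:
# the active project, the global versioned environment, and the standard library.
println(LOAD_PATH)

# Base.load_path() expands the symbolic entries to the paths that currently exist.
for path in Base.load_path()
    println(path)
end
```

A package that is missing from the active project is looked up in the next entry, which is why packages in the global environment stay loadable after activating a temporary one.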


Exactly. So anything I expect to use often, I install in my global environment. For me this is things like CSV, DataFrames, DataFramesMeta, StatsPlots, Distributions, Turing, MCMCChains, GLM, Optim, and a few others. In fact I even compile a sysimage with those so that loading them with using is super fast.

Thanks for letting me know!
I’d still have the problem that using tailrec and @tailrec would not be available elsewhere until I add tailrec again, though. Ideally I’d like to always have everything accessible without having to deal with unexpected ]updates.

At first, having to install your packages in every environment where you need them may sound like an unnecessary burden, but it is not that much of a burden and it is beneficial. Particularly if you use your code for work and at some point need to share it with someone, having a reproducible environment without unnecessary clutter is a total win.

I do think that the ] update package command could have better default behavior.


Note: This is still not a great idea, but to be fair an alternative isn’t easy until 1.7 is out.

In the future, you can have named projects, described here

So if you wanted a bunch of data packages easily accessible, didn’t want them in your global environment, and didn’t want to make a system image, you can do

julia --project=@data

where data is in your environments folder somewhere.
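A minimal sketch of how such a named environment could be set up from the Pkg API, assuming the default depot location and using "data" purely as an example name:

```julia
using Pkg

# Activate the shared environment named "data". With the default depot it
# lives under ~/.julia/environments/data; its files are written on the first Pkg.add.
Pkg.activate("data"; shared=true)

# Project file of the environment that is now active.
println(Base.active_project())

# From here, Pkg.add(["CSV", "DataFrames"]) would populate it
# (the package names are only examples).
```

The same environment can be reached from the Pkg REPL with activate @data.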

This doesn’t do what I want either though, because what I want is to have a project for my particular data analysis task, but within that project it has all the “infrastructure” one often needs directly available. Now if I could say something like --project=. --project=@data --project=@optimization --project=@database and have the current project be the . directory but the various other projects are “stacked” on that, it might make sense. (BTW: I really like having a built sysimage with all those things, so for now that’s my main solution)
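Something close to this stacking can be approximated today by extending LOAD_PATH. The sketch below assumes a shared environment named "data" already exists in the depot; the name is only an example.

```julia
# Assumption: a shared environment called "data" exists, e.g. in
# ~/.julia/environments/data. Appending it to LOAD_PATH makes the packages
# listed in its Project.toml loadable from whatever project is active.
push!(LOAD_PATH, "@data")

# The active project ("@") is still searched first, so its own dependencies win.
println(LOAD_PATH)
```

One caveat: stacked environments are not resolved together, so a package loaded from "@data" may have been resolved against different dependency versions than the ones the active project loads.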

Merging projects would indeed be cool.

Mods feel free to split off this digression.


Actually it is quite a burden, particularly given how central temporary environments are becoming to testing and experimental workflows. These temporary environments are useful to avoid cluttering the main one, but adding packages to them is not instantaneous, even for packages available locally. Starting to play in a temporary environment to which I added a bunch of packages, only to have to restart everything for some reason, is quite frustrating*. I cannot find an ideal solution (which I think should involve the possibility of "free add"ing packages that are already available and precompiled locally).

Also, while we keep advising people not to clutter the main environment, the truth is that it is special in the way that it interacts with all other environments. It would be better if we could just have fine control of what happens with it, and allow it to be cluttered.

The whole environment stuff is a powerful and useful feature. But being obligated to use it is not.

*Btw: is it possible to name and then save a temporary environment?


Good points.

I’ve experienced the ]test thing especially when someone offers a MWE for a problem that looks kind of easy. Then I try adding the package into a temp environment and I get turned off after the first few minutes of package installation and precompilation.


While I agree, I think maintaining minimum necessary environments for a project should be opt-in rather than mandatory. For example, most of the time I am developing on the latest versions, so anyone I share code with can ]add anything they don’t have and it will all work. And if I want to ensure the best possible experience for them, I should be able to lazily and explicitly create a new, completely empty environment and manually populate it according to missing-package errors, only when I need to.

I agree. I think this is a central point here.

Here’s a potential workflow:

Remove “@v#.#” from Base.LOAD_PATH in startup.jl

Typically always work in the global environment and add packages with ]add --preserve=all package. Update with ]up when I run into a package bug, want a new feature, or have a few minutes where it is okay to let Julia be unresponsive and am okay with a risk of having to deal with broken updates.

When I want to share a minimum working environment, including whenever I do package development, make a new temporary or package environment, run my code, and ]add anything that I need (possibly automate this process with the Pkg API).
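The first step of this workflow could look like the following in startup.jl. This is a sketch that assumes the default load path entries; it only drops the global versioned environment from the stack.

```julia
# ~/.julia/config/startup.jl
# Remove the global (versioned) environment from the load path so that it is
# only visible when it is itself the active project ("@").
filter!(!=("@v#.#"), LOAD_PATH)
```

With this in place, a temporary or project environment can only load what is in its own Project.toml plus the standard library, which is what makes it safe to share.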

Advantages:

  1. I never have to worry about environments unless it is necessary.
  2. Julia shouldn’t spend several minutes precompiling when I don’t expect it to
  3. I typically only have to ]add anything when I start using a package for the first time ever (or change Julia versions?)
  4. When I make a temporary environment it is totally clean so I can share the Project.toml without worrying about accidentally depending on something in my global environment

Disadvantages:

  1. Large global environment (I don’t see why this is a problem)
  2. There is an additional step to create a reproducible environment when I want to share a MWE (note that in the classic workflow I would already have manually created this environment and manually ]added everything to it as I went along)
  3. less explicitly aware of which packages I’m using where for my personal code

What do y’all think? What advantages and disadvantages am I missing? Why isn’t this the default behavior of LOAD_PATH and ]add?

EDIT to clarify recommended default behavior of ]add Xxx in non-trivial cases per @lmiq @dlakelan & @kristoffer.carlsson’s comments:

When possible, ]add Xxx should add the latest version of Xxx without updating anything else
Where this is not possible, but it is possible to install the latest version of Xxx with updates to other packages it should prompt:
“Installing package Xxx requires updating N packages. Expected runtime 15s. Type D for details. Continue? (Y/N)”
and D should yield a menu giving the user options to install…

  1. Xxx v0.21.6 which is 12 days out of date and the latest version compatible with currently installed dependency versions.
  2. latest version of Xxx v0.22.0 and update N packages. Expected runtime 15s
  3. latest version of Xxx v0.22.0 and update all M out-of-date packages. Expected runtime 373s

Where it is impossible to install the latest version of Xxx due to dependency conflict, the user should immediately be given detailed options:

The latest version (v0.22.0) of Xxx is incompatible with Yyy (last used 7 years ago) and Zzz (last used yesterday). You may install…

  1. Xxx v0.21.6 which is 12 days out of date and the latest version compatible with currently installed dependency versions.
  2. Uninstall Xxx & Yyy and install the latest version of Xxx v0.22.0, updating N packages. Expected runtime 15s.
  3. Uninstall Xxx & Yyy and install the latest version of Xxx v0.22.0, updating all M out-of-date packages. Expected runtime 373s.
  4. Xxx v0.21.8 which is 2 days out of date and the latest version compatible with Yyy and Zzz. This requires updating N packages. Expected runtime 15s.
  5. Xxx v0.21.8 and update all M out-of-date packages. Expected runtime 373s.
  6. Install Xxx in a new environment…
  7. Copy the current environment to a new location and replace Yyy and Zzz with Xxx here…

These menus, especially the second, might be best refactored into multiple independent choices (e.g. factor out whether to update everything, or remove that option altogether), though I think presenting every possible choice is a nice way to give users a big-picture, results-oriented understanding of their choices, with some indication of why their preferred option (have the latest Xxx, Yyy, and Zzz installed now without a long precompilation time) is not available.


One point is that a new package, or a newer version of one, may require the update of a dependency, and that may trigger the update and compilation of a whole list of packages. I think that is acceptable, but it should be opt-in. “Installing package Xxx requires updating N packages. Type D for details. Continue? (Y/N)”.

Better yet if it was possible to suggest the latest version that could be installed without changing the current package state.
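Part of this is already reachable through the preserve option of Pkg.add. A minimal sketch, where add_without_updates is a hypothetical helper name and not something Pkg provides:

```julia
using Pkg

# Hypothetical helper (not part of Pkg): add a package while keeping every
# already-installed dependency, direct or indirect, at its current version.
add_without_updates(name::AbstractString) = Pkg.add(name; preserve=Pkg.PRESERVE_ALL)

# Usage (placeholder package name):
# add_without_updates("Xxx")
```

With PRESERVE_ALL the resolver should pick the newest version of the package that fits the current state, and error out rather than update anything if no version fits. It does not offer the interactive prompt suggested above.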


My impression is that the biggest problem I’ve encountered is that some package you used once a while ago to see if it would work but you actually didn’t find helpful can hold back basically everything. For example I had that happen with Turing, where I was on a full year old version because… I don’t know some package I totally didn’t care about was incompatible with newer versions of a package Turing needed.


In 1.8 (you can try the nightly) there is an --outdated flag that can be passed to status, which gives information about packages that are not on their latest version and what is holding them back.
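For reference, a small sketch of the same query through the Pkg API, guarded by a version check because the keyword is only expected to exist on 1.8 and later:

```julia
using Pkg

if VERSION >= v"1.8"
    # API counterpart of `pkg> status --outdated`: marks packages that are not
    # on their latest version and reports what is holding them back.
    Pkg.status(outdated=true)
else
    @info "Pkg.status(outdated=true) needs Julia 1.8 or later" VERSION
end
```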


This takes the possibilities to another level. Having a tool to inspect the installed packages, report which ones are not used (list them by last use or precompilation, probably easier), could allow easier control of what is going on. In short, an interactive package manager with the features of something like Synaptic would be great.


This is, like, false. If you’d get the same version of the package, it makes no difference which environment the package is installed in: the default environment is simply the default; it isn’t special in a way that gives you faster installation.


I didn’t say that that is different from the main environment. The point I wanted to stress in that phrase was twofold: 1) even if the package is available locally, the package manager will check the registry (you may find this irrelevant, but I will recall what that means next time I go to work at my parents’ house); 2) it may trigger the installation of a new version and possibly the update of many things not related to the package.

The difference from the main environment there is that if the package is installed in the main environment I can use it from other environments avoiding all that. And if the package is available in any other environment I can’t (AFAIK).

I really don’t see what the difference is between the global environment and a local environment here, but Pkg offline mode is a thing if you don’t want to spend time on updating the registry on a slow connection and you know the package is available in your depot (and I’m also on a slow connection occasionally, I know what that means).
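A minimal sketch of using offline mode for a single add in a temporary environment. It assumes BenchmarkTools (just an example) is already in the depot; otherwise the add is expected to fail, which the sketch reports as a warning.

```julia
using Pkg

# In offline mode Pkg avoids the network and only considers package
# versions that are already downloaded.
Pkg.offline(true)
Pkg.activate(temp=true)

try
    # Assumption: some version of BenchmarkTools is already in the depot.
    Pkg.add("BenchmarkTools")
catch err
    @warn "Offline add failed; is the package already in the depot?" exception = err
finally
    # Switch back so that later operations can reach the registry again.
    Pkg.offline(false)
end
```

Setting the environment variable JULIA_PKG_OFFLINE to "true" keeps offline mode on across sessions.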

Environments still stack

How is that? Actually that can be very useful if I can only activate the “online” mode when I need to.

I don’t really know what that means. How does this relate to the fact that I can’t “use” a package from an environment which is not the main one? Is there a way to do that?

FWIW, here is a benchmark of adding packages that are available locally. It takes 3.4 s in my cluttered global environment and 0.13 s in a fresh --temp environment.

julia> using BenchmarkTools

julia> @btime Pkg.add("BenchmarkTools")
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
  ...
   Resolving package versions...
  No Changes to `~/.julia/environments/v1.6/Project.toml`
  No Changes to `~/.julia/environments/v1.6/Manifest.toml`
  3.399 s (5650138 allocations: 1008.12 MiB)

(@v1.6) pkg> activate --temp
  Activating new environment at `/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`

julia> @btime Pkg.add("BenchmarkTools")
   Resolving package versions...
    Updating `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`
  [6e4b80f9] + BenchmarkTools v1.2.0
    Updating `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Manifest.toml`
  [6e4b80f9] + BenchmarkTools v1.2.0
  [682c06a0] + JSON v0.21.2
  [69de0a69] + Parsers v2.1.1
  [ade2ca70] + Dates
  [8f399da3] + Libdl
  [37e2e46d] + LinearAlgebra
  [56ddb016] + Logging
  [a63ad114] + Mmap
  [de0858da] + Printf
  [9abbd945] + Profile
  [9a3f8284] + Random
  [ea8e919c] + SHA
  [9e88b42a] + Serialization
  [2f01184e] + SparseArrays
  [10745b16] + Statistics
  [cf7118a7] + UUIDs
  [4ec0a83e] + Unicode
   Resolving package versions...
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Manifest.toml`
...
   Resolving package versions...
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Project.toml`
  No Changes to `/private/var/folders/hc/fn82kz1j5vl8w7lwd4l079y80000gn/T/jl_okeivJ/Manifest.toml`
  129.347 ms (456395 allocations: 33.07 MiB)

As far as I’m aware both seem to be doing nothing and should be instant. I don’t think internet connection is an issue because I got similar results with wifi on and off.