Speculations about the default environment (or a new draft environment)

The way the default environment works is somewhat confusing and, I think, unsatisfactory. Most users, except package developers well versed in development workflows, will always use packages by installing them in the default environment. That causes progressive bloat of the environment and dependency issues, as we know.

One idea that came to mind recently is a “draft”, or “scratch”, environment. This environment would be temporary, with the following properties:

  1. using Package would mean something slightly different: if the package is already available locally because it was installed in any other environment, using would install it in the “draft” environment at the most recent locally available version (without fetching anything from the web!). Ideally, using would report that:
julia> using StaticArrays
StaticArrays installed from local install at version v1.3.5

(is there any reason for this to be much slower than using the package in an environment where it is already installed?)

  2. If the package is not available, just do what is done now (ask if one wants to install it).

  3. At any moment one could “save” the draft environment, giving it a name, and from then on it would behave like any other named environment we currently create.

  4. If the environment is not saved, nothing changes for other environments, except that any newly installed package becomes available through the quick-install route in a future draft environment.
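The “available locally” check in the first property could be sketched roughly as below. This is only an illustration, not how Pkg resolves versions: local_versions and newest_local_version are hypothetical names, and the sketch assumes that installed copies live under <depot>/packages/<Name>/<slug>/ and that each copy's Project.toml carries a version field.

```julia
using TOML

# Hypothetical helper: list the versions of `name` that already exist in the
# local depots, by reading the Project.toml of each installed copy.
function local_versions(name::AbstractString; depots=DEPOT_PATH)
    versions = VersionNumber[]
    for depot in depots
        pkgdir = joinpath(depot, "packages", name)
        isdir(pkgdir) || continue
        for slug in readdir(pkgdir)
            project = joinpath(pkgdir, slug, "Project.toml")
            isfile(project) || continue
            v = get(TOML.parsefile(project), "version", nothing)
            v === nothing || push!(versions, VersionNumber(v))
        end
    end
    return sort!(unique!(versions))
end

# Newest local version, or `nothing` if no installed copy was found.
function newest_local_version(name::AbstractString; depots=DEPOT_PATH)
    versions = local_versions(name; depots=depots)
    return isempty(versions) ? nothing : last(versions)
end
```

A draft-style using could call newest_local_version("StaticArrays") first and fall back to the usual install prompt only when it returns nothing.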

The reason for this would be:

By starting Julia in a draft environment, a user can use and add packages without bloating the “default global environment”. The experience would be similar to just adding stuff to the current default environment, but without some of the downsides.

Someone could worry about the fact that packages can get outdated, but it is more common and natural for people to update things when required or when things seem to go wrong (with the expected download, compilation, and install cost).

Something like that could isolate the default global environment. In particular, and this is what I have in mind, Julia would start by default in a new draft environment. The natural workflow of just opening julia, using or installing the packages one wants, and doing the job would then not degrade the general experience in the long term.

Everything else, meaning all other environments, could continue to be exactly as it is, including the default environment, with the sole exception that one would need to explicitly switch to it if one wanted to add a package to it for some reason. Using environments would then be a nice developer feature instead of an imposition that comes to the table sooner or later for every user.

As usual, I guess there are many good technical reasons why things like this are not simple, and good reasons why they may not be desirable at all. But who knows, maybe it is an idea worth exploring.


There are two versions of the draft workflow, which are actually backed by environments.

version 1: temporary environment. The contents will be deleted after Julia exits or after a system reboot.

(@v1.7) pkg> activate --temp
  Activating new project at `/tmp/jl_aX0eNK`

(jl_aX0eNK) pkg>

version 2: shared environment. Quite similar to the @v1.x environment, and the contents (Project.toml and Manifest.toml) are saved to $JULIA_DEPOT_PATH/environments/alpha.

(jl_aX0eNK) pkg> activate --shared alpha
  Activating new project at `~/.julia/environments/alpha`

(@alpha) pkg>
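For scripts, the same two variants can be reached through the Pkg API instead of the Pkg REPL. A minimal sketch, where the environment name alpha is just the example name used above:

```julia
import Pkg

# Temporary environment, the API form of `pkg> activate --temp`.
Pkg.activate(temp=true)
temp_project = Base.active_project()    # path of the temporary Project.toml

# Named shared environment, the API form of `pkg> activate --shared alpha`.
Pkg.activate("alpha"; shared=true)
shared_project = Base.active_project()  # path ending in environments/alpha/Project.toml
```

Neither call writes a Project.toml by itself; the file only appears once a package is added.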

Most users, except package developers well versed in development workflows, will always use packages by installing them in the default environment.

I believe users need to learn about environments; it’s not complicated, just two files Project.toml and Manifest.toml. My own preference is to install only widely-shared packages (e.g., Revise, OhMyREPL, BenchmarkTools) in the root environment.


This is the recommended way to work with it, but it is a symptom of a problem. Julia should not start by default in a state (an environment) where it is not recommended to work.

That may suggest that Julia should start in a temporary environment. The problem there (or in any other new environment) is that adding packages that were already installed before is too slow, or can even fail if one has no internet access at the moment. This is an annoyance with Julia that I have not seen in any other production environment (and I understand, to a point, the reasons for it, given the particularities of Julia). In most production environments one installs a package and that makes it locally available to be used.

Let us not mix in the TTFP (time to first plot) issue here; that is not what I’m referring to. I am talking about the fact that adding a package has an inherent lag, because it tries to fetch the registry from the server and, if it finds a newer version of the package, the whole process of updating etc. is triggered. There is a workaround for that. But it does not solve the problem that, in the end, one is obliged to repeatedly add packages to temporary environments in order to experiment.

Given that adding a package is (generally) fast if one sets Pkg.UPDATED_REGISTRY_THIS_SESSION[] = true, it would be reasonable (IMO) for Julia to start, by default, in a temporary environment with that option turned on and, additionally, in which using simply did add + using with a more concise output. The experience would then be similar to using in an environment where the package is already installed (or where it was installed in the default environment).

That would allow the general user to just start Julia, add packages, eventually using packages, without worries about bloating the default environment, which could still be accessible but only for specific and unusual operations, like adding the few packages we want to install there.

If that temporary environment could then be saved, that would be a nice way to introduce the use of environments to new users. (this is a feature which, if not available already, would be nice to have for temporary environments in general).

Thus, to summarize, I think it would be nice to have a kind of temporary environment in which adding a package is incorporated into using, such that the experience is as responsive as working in an environment where the packages are already installed, like the default one. That would imply assuming an up-to-date registry by default and just adding the local copy of the package when it is available.

I also think that the regular user experience would be improved if that was the default way Julia was launched.

In the current state of things, it seems to me that almost every Julia script should start with

import Pkg
Pkg.activate(temp=true)
Pkg.UPDATED_REGISTRY_THIS_SESSION[] = true
Pkg.add("Package1")
Pkg.add("Package2")

# Here is where it should really start
using Package1
using Package2

Thus, my idea would be that a draft environment would be something that incorporated the first 3 lines of that header, and merged add and using whenever there is a local copy of the package available.
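As a rough sketch, the header above could be wrapped in a small helper. The name draft_session is hypothetical, and the sketch swaps the internal Pkg.UPDATED_REGISTRY_THIS_SESSION[] flag for the documented Pkg.offline(true). The two are not equivalent: offline mode makes Pkg consider only versions that are already downloaded, so a package that was never installed cannot be added this way.

```julia
import Pkg

# Hypothetical helper: start a temporary environment, stay off the network,
# and add whatever packages are requested from the local depot.
function draft_session(packages::Vector{String}=String[])
    Pkg.activate(temp=true)
    Pkg.offline(true)   # resolve only against versions already on disk
    isempty(packages) || Pkg.add(packages)
    return Base.active_project()
end

# Usage at the top of a script (Package1 and Package2 are placeholders):
#   draft_session(["Package1", "Package2"])
#   using Package1, Package2
```

The using statements still have to be written at top level, which is the part a built-in draft environment would fold into using itself.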


I’m a bit lost here: I thought most of your argument related to interactive usage, but here you mention scripts…

As far as scripts go, I tend to think that every script should be accompanied by a Project.toml (& possibly the relevant Manifest), and begin with something like this:

import Pkg
Pkg.activate(@__DIR__)
Pkg.instantiate() # replaces all Pkg.add calls. This is fast since IIRC at least version 1.5

This means that users of such scripts never have to explicitly activate any environment, nor do they have to ever explicitly add any dependency (and if they do, this will automatically be recorded in the project, which I guess is what you’d want in any case?). This is at the cost of some boilerplate, to be put there at the top of the script by a developer who initially distributed the script and may thus be more adept at environment management.

I do agree however that there are packages which are meant to be used more interactively (and possibly in combination with other packages that the initial author did not foresee). For these, users currently have to know at least a few things about environments, and I agree that this may raise the bar a bit too high for a number of people. I have to say that my initial reaction to your proposal involving temporary environments was very skeptical… but the more I think about it, the more I see the merits it could have for certain workflows.

So I guess my TLDR here is that I think your proposal is a fresh perspective on environments. A perspective of which the benefits may not be immediately obvious to people (including me) who are used to other ways, but a perspective which might very well change the experience of a whole class of Julia users. So please keep thinking along these lines.


I would not differentiate scripts and the REPL in the sense I’m thinking here. I do think that what I’m saying here does not apply for more involved package development (which requires then a deeper understanding of the language anyway).

In this other thread (should better be here, actually), @feanor12 suggested an implementation of almost everything I was thinking about.

I agree @lmiq. I think Julia’s default behaviour of activating the shared environment @v1.7 (for example) is causing widespread misuse of Julia.

  • Naturally, undisciplined/new users are going to add to this shared environment… because it requires the least amount of work/thought/learning.
  • It therefore follows that users will eventually encounter issues with version incompatibilities if they stick to Julia long enough.
  • We likely lose potential adopters to bad “first” experience if they don’t feel like searching/posting on discourse.

What I would have expected

Personally, I think launching julia should actually execute what is now (see command-line-options):

julia --project=@.

Quoting the docs:

The default @. option will search through parent directories until a Project.toml or JuliaProject.toml file is found.

This makes much more sense to me as a default behaviour, though I’d change one small thing:

  • If a Project.toml file is not found in any of the parent directories, the current working directory becomes the “active project/environment”.
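The lookup rule proposed here, including the fallback to the current directory, could look roughly like this. It is a sketch of the proposal, not of what julia --project=@. does today; find_project and default_project are hypothetical names.

```julia
# Project file names, in the order they are tried in each directory.
const PROJECT_NAMES = ("JuliaProject.toml", "Project.toml")

# Walk from `dir` up to the filesystem root and return the first project
# file found, or `nothing` if there is none.
function find_project(dir::AbstractString)
    dir = abspath(dir)
    while true
        for name in PROJECT_NAMES
            file = joinpath(dir, name)
            isfile(file) && return file
        end
        parent = dirname(dir)
        parent == dir && return nothing   # reached the root
        dir = parent
    end
end

# Proposed default: nearest project file, else a project in `dir` itself.
function default_project(dir::AbstractString=pwd())
    found = find_project(dir)
    return found === nothing ? joinpath(abspath(dir), "Project.toml") : found
end
```

Calling Pkg.activate(dirname(default_project())) at startup would then give every directory without a parent project its own clean environment.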

Some “@.” advantages

  • Follows the basic directory structure most users (even non-Julians) understand.
  • Each new parallel directory (not nested within another environment directory) becomes a clean, new project.
  • Users would need to explicitly activate shared @v1.7 (etc.) environments in order to set up their common dev tools.
  • Thus: “advanced” usage of the environment stack becomes intentional instead of accidental (which is what we consider to be “good practice” anyhow, right?).

@lmiq I’m a bit lost so let me try to summarize your argument. I guess you’re suggesting a rollback/snapshot mechanism for Pkg operations.

pkg> add Images # common Pkg operations
...

pkg> revert ~1 # revert one Pkg operation; i.e., the last state before `pkg> add Images`
...

pkg> save # save the current status to, e.g.,  `~/.julia/environments/v1.7/`
...

julia> <Ctrl-C>
You have made changes to the `@v1.7` environment. Do you want to save the changes before exiting? [Y/n]
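Since an environment is defined by its Project.toml and Manifest.toml, a snapshot mechanism like the one imagined above can be approximated by copying those two files. The function names below are hypothetical and this is only a sketch of the idea, not an existing Pkg feature.

```julia
# The two files that define an environment.
const ENV_FILES = ("Project.toml", "Manifest.toml")

# Copy the environment files from `envdir` into `snapdir`.
function snapshot_environment(envdir::AbstractString, snapdir::AbstractString)
    mkpath(snapdir)
    for f in ENV_FILES
        src = joinpath(envdir, f)
        isfile(src) && cp(src, joinpath(snapdir, f); force=true)
    end
    return snapdir
end

# Put `envdir` back into the snapshotted state; a file that did not exist
# when the snapshot was taken is removed again.
function restore_environment(snapdir::AbstractString, envdir::AbstractString)
    for f in ENV_FILES
        src, dst = joinpath(snapdir, f), joinpath(envdir, f)
        isfile(src) ? cp(src, dst; force=true) : rm(dst; force=true)
    end
    return envdir
end
```

Restoring only rewrites the two files; packages downloaded in the meantime stay in the depot, which is what makes re-adding them later cheap.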

100% this. The default environment (the one you get when starting Julia without any environment flags) should not be the “shared” environment (i.e., the one that is automatically stacked with a local environment). Currently the two are conflated in @v1.7.

I remember having tried to convince some of the Pkg authors (@StefanKarpinski) of this a few years ago without much success. IIRC, I was proposing a separate global environment and some convenience syntax like ] add --globalenv MyPackage.

(@davidanthoff I seem to remember you were also arguing in this direction back when we had this discussion on Slack. Please correct me if I’m wrong.)


Not really. My broader idea is that the default environment should not have persistent and widespread implications when used. Thus, it should not be the @v1.7 one, and most likely be a temporary environment.

By being a temporary environment, some usability issues arise, particularly the fact that packages previously installed are not available for reuse. That is why in this environment using should behave differently.

These ideas are initially implemented here: GitHub - feanor12/Draft.jl: A small package to automatically create temporary environments in offline mode.

The package needs polishing, but actually it is already useful, as one can do:

julia> using Draft
  Activating new project at `/tmp/jl_bqfJlG`

julia> @reuse StaticArrays Plots # responsive, offline

julia> plot(Tuple.(rand(SVector{2,Float64}) for _ in 1:100)) # works

julia> save_environment("./my_draft")
"./my_draft/Project.toml"


It’s unfortunate there are no hooks in git for “on pull” etc. Because then you could force yourself to sign off on any changed or added Project.toml or .exrc files.

I think variations on these are the best option. I think it is also basically consistent with how people would think about opening a folder in vscode, and Jupyter notebooks could easily adjust as well.

But I also feel like the current behavior is pretty close and a few minor tweaks might be all that is required! Basically, instead of forcing people to manually create a temp environment, make it activate the current folder if it can’t find a parent.

Reflecting back on the tradeoffs which led to the current behavior: It used to be that having things in (v1.X) was fragile but relatively convenient for interactive use and with less “latency” with a using in a new project. Now it adds nothing and just makes things fragile. There were two changes in julia v1.7 that alleviated the previous problems:

  1. using SomePackage asks the user if they want to install the package. That is a game changer since it means it is easy to install packages in a local environment.
  2. Instantiating and adding packages to a project/manifest is very fast now. So the benefits of having a shared environment for performance reasons no longer apply.

Since the underlying tradeoffs have changed with the 1.7 release, I think it is reasonable to reconsider behavior that is basically non-breaking. My dream would be the following logic on calling julia --project in a folder/opening in vscode:

  1. Core Old behavior still applies: If a Project.toml exists in this folder or its parents then it activates it.
  2. Change the fallback from the global environment to the local folder: as I think @MA_Laforge says, if it doesn’t find a project file then it creates a new project file in the current directory.
    • Basically the equivalent of the current --project followed by ] activate . but done automatically
    • While a change, I don’t see this as breaking behavior. In fact, I think it makes more sense given the --project CLI than the current version. Many find the current behavior unintuitive because they specifically asked julia to use a --project, so why does it end up in the “global” environment?
  3. If a user wants to activate the global environment? They already can. No changes need to happen.
    • They can already do this by: (1) calling julia without the --project argument; or (2) ] activate at any point
    • Personally, I feel that in a Julia 2.0 situation it would make sense to have the --project behavior as the default when calling julia itself, to emphasize reproducible project files… but that is debatable

While this change seems small, the impact in getting new users to use reproducible project/manifest files by default could be enormous. Julia has the world’s best package manager for reproducibility, so let’s help people use it correctly!

I think this one change would help a great deal. The next two changes, which would fix nearly every remaining problem I have seen, are for package development.

  1. ]dev . parent files by default, or with additional arguments.
    • Let’s say I have a Project.toml and then a subfolder with a benchmark/Project.toml or experiments/Project.toml which uses the parent package.
    • In every case I know, you really want that subfolder to use the parent package in its manifest.
    • So I would propose that ] activate sub_folder does exactly that, backed by a change such as Pkg.activate(...; dev_parent = true). The behavior would be the current ] activate sub_folder, followed by walking up the directory tree to see if it finds a Project.toml in a parent folder and, if so, running ] dev on it.
    • This would be far more intuitive than the current behavior, where everyone is puzzled why, after creating a project file in a subfolder of the package, it doesn’t automatically use its own version of the package.
    • Keep in mind that if someone wanted to only work on the subfolder and not have this automatic dev thing happen, they can always just open up that folder directly with julia --project sub_folder and it wouldn’t automatically dev anything.
    • If you think doing it by default is too much then add an argument: ] activate subfolder --dev_parent, which we could just train people to add.
    • While you can argue “sure, that is nice, but they can already manually ]dev . themselves?” my retort is that nobody can figure it out on their own. If you hear people’s elaborate workflows (even after using julia for years), hopefully that gives you a sense of the discoverability of this feature.
  2. Consistent activation of test environments: The related problem is that if you have a test/Project.toml then the ]test works great but the ] activate test does not do what a person would think it would do.
    • To enable interactive editing of unit tests in vscode etc., TestEnv was written to hack in this sort of support.
    • It works great in many ways, but after doing the using TestEnv; TestEnv.activate() the project file is a temporary one and any package operations do not go back to the test/Project.toml. So this really is a hack and core julia features become unintuitive afterwards.
    • So my feeling is that this needs to be fixed one way or another in the package instantiation itself.
    • Is there any reason it can’t be made consistent with ] activate test --dev_parent behavior? What else does TestEnv need to do? If it can’t then maybe a ] activate test --active_as_test or ] test --activate etc. is necessary to emulate the behavior.
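The dev_parent idea from the first item can be approximated today with existing Pkg calls. In this sketch, parent_package_dir and activate_dev_parent are hypothetical names; only Pkg.activate and Pkg.develop(path=...) are real API. A parent counts as a package here if its Project.toml declares both name and uuid, which is an assumption of the sketch.

```julia
import Pkg, TOML

# Hypothetical: nearest ancestor directory of `subdir` whose Project.toml
# declares a package (has both `name` and `uuid`), or `nothing`.
function parent_package_dir(subdir::AbstractString)
    dir = abspath(subdir, "..")   # start the search one level above `subdir`
    while true
        file = joinpath(dir, "Project.toml")
        if isfile(file)
            project = TOML.parsefile(file)
            haskey(project, "name") && haskey(project, "uuid") && return dir
        end
        parent = dirname(dir)
        parent == dir && return nothing   # reached the root
        dir = parent
    end
end

# Hypothetical stand-in for `] activate sub_folder --dev_parent`.
function activate_dev_parent(subdir::AbstractString)
    parent = parent_package_dir(subdir)
    Pkg.activate(subdir)
    parent === nothing || Pkg.develop(path=parent)
    return parent
end
```

For a layout like MyPackage/benchmark/Project.toml, activate_dev_parent("benchmark") would activate the subfolder and record the local checkout of MyPackage in its manifest.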

I think that if these changes and the previous one about changing the julia --project behavior happened then the whole thing would be very intuitive to new and intermediate users alike.


I don’t think this solves the issues I’m thinking of. On Windows, in particular, people double-click on the Julia icon to launch Julia. “The current folder” is something that many people do not even know about.

This is not true if the default behavior associated to add is to search for the latest version of the package online. Some packages take quite a while to download and install (of course that depends on your resources), and thus the tradeoff relative to just using a shared package can be large.

Let me illustrate this right now:

julia> using Draft
  Activating new project at `/tmp/jl_4hToHf`

julia> @time @reuse CUDA
  6.742174 seconds (17.57 M allocations: 1.079 GiB, 3.63% gc time, 60.98% compilation time)

vs.

(@v1.7) pkg> activate --temp
  Activating new project at `/tmp/jl_JBZRst`

julia> import Pkg

julia> @time Pkg.add("CUDA")
   Resolving package versions...
    Updating `/tmp/jl_JBZRst/Project.toml`
  [052768ef] + CUDA v3.8.5
    Updating `/tmp/jl_JBZRst/Manifest.toml`
  [621f4979] + AbstractFFTs v1.1.0
...
  [3f19e933] + p7zip_jll
Precompiling project...
  7 dependencies successfully precompiled in 37 seconds (26 already precompiled)
 41.162015 seconds (4.18 M allocations: 263.236 MiB, 0.28% gc time, 6.12% compilation time)

And these 41s were actually not bad; depending on what gets updated and the quality of my internet connection at the moment, it can take much longer. Thus, my workflow for experimentation always starts with avoiding online fetching of new package versions.

The fact that projects are associated with directories is an additional complication. I myself have many scripts in the same directory with exploratory code (not to mention the MWEs tested from this forum). I just want a quick and safe way to run these scripts.

This is wonderful and true. Yet, I think most Julia users don’t care about that very much. Having that feature is spectacular for package development, scientific reproducibility, sharing, etc., but in the two-language-problem world the first language is for prototyping, scripting, plotting, nothing that requires strict reproducibility, and most people will interact with the language more often from that perspective.
(thus my vote for temporary default environments).

The other comments are more related to more in depth and involved development workflows, which were not my focus here.


I’m wondering about this. They are only able to work interactively or would need to include files by relative or absolute path then.

Yes, I have seen that, but only with users using an IDE…


I use Windows and have never used the double-click. I either open the folder in vscode (which opens with --project), use Jupyter notebooks, which do the same, or run julia --project if I open a terminal in a folder. I don’t think running julia from the default location is an especially important case, but maybe others disagree.

I think the ship has sailed on that one. Projects and directories are intimately linked. Exploratory scripts can be written in a folder with a preexisting Project file… I don’t see why it makes sense for that to be in the global environment.

Surely I was not referring to any package developer; I’m talking about occasional users of packages as applications. One of the great advantages I found when I started to work with Julia instead of Fortran is this: I have a package in Fortran that I distribute as a binary for Windows and that requires a command-line parameter. Many Windows users have no idea what to do with that, and ideally I would have to provide a graphical user interface with two input fields, just to help them. With Julia they click on the Julia icon and are able to follow the tutorial, copying and pasting stuff to the REPL.

Part of the confusion to users is that the “Project.toml” is sometimes seen as something for package developers rather than just as a reproducible environment which you probably always want to have. That --project terminology helps there but I think it also comes down to teaching them about reproducibility.

I think that if we teach end users that they should always have a project file whenever they write code, it is easy enough for them to follow (even with the current setup as they just need to go ]activate . after starting things up). With Jupyter Notebooks and using the REPL from vscode, it does it automatically and already has a folder associated with it.

New users are probably far better off only running a REPL within vscode until they know what they are doing. The features are better and it makes things a lot more intuitive.

I spend lots of time on Python, which doesn’t have such a seamless ability to have reproducible environments, and it is a mess. In julia you can entirely avoid reproducibility problems if you only keep development tools in your (v1.7) and have everything else in a Project.toml.


I think this is the source of the disagreement. Maybe you don’t need fancy package features for reproducibility, but what if versions/etc. between these things clash between scripts? Happens all the time and is a source of utter confusion for everyone. What if you want to go back and run a script you ran 6 months ago? I dread using Conda and python specifically because of this. No fancy package management, just being able to run scripts from different projects on the same computer.

All people need to do is open a folder in vscode to start the REPL, use jupyter notebooks, or start it with julia --project and things work great without any clashes between scripts. Right now a manual ]activate . is all it takes if a file doesn’t exist.


I don’t deny these virtues of the environment system of julia, at all. They are great.

But, truly, if I take a plot from some months ago and the script is broken because Plots was updated, I just fix the script, and most of the time this is easier than maintaining a reproducible environment for everything that I dare to code.

ps: But here I think we became circular, and I point out one observation that cannot be neglected: no package whatsoever in the Julia ecosystem guides new users through the lessons of dealing with environments before telling the user just to install and use the package. Either we need a complete rewrite of the doc pages and radical cultural shift, or we need to assume that we don’t want this barrier to be exposed to our first time users.


You are luckier than I am. Regardless, ] activate . should be enough to maintain an environment for relevant directories. Everyone has their own workflows but the consensus (and feature set) has been to move towards having reproducible environments as central to julia’s packages, so I think a better approach is to remove the (minor, in my opinion) warts that make it unintuitive to do for new users rather than making a conda-style global environment especially convenient.

But, as I said, everyone has their own workflows.
