Not easy to add utility packages to env

Let’s say I’m trying to edit someone’s package. While debugging my code using the environment provided by the package’s Project.toml, I realize it would be useful to plot something. So I’d like to add Plots.jl. However, doing so means that I will have to modify the package’s Project.toml, which I of course don’t want to do. Alternatively, I could make sure Plots is always in my global env, but I may not always need plots and my global env should be as lightweight as possible.

Ideally, I could simply do something like a (hypothetical) pkg> stack @plotting and have plotting packages loaded into my env. If this isn’t possible, what are my options for making the above workflow easy? Do I need to modify the LOAD_PATH manually?

I’m searching for the same answer, and feeling puzzled that this might not be an established process.

I’m coming from Python/Poetry, where you add dev dependencies to the pyproject.toml to hold things like pytest, ptpython and other useful interactive development tools. These are available in the virtual environment but are not part of the package build.

In Julia here is what I found so far:
Test dependencies can be added in the package REPL with pkg> activate ./test and then add Test as a test dependency. I guess the root directory dependencies are still part of the new ‘test’ environment?

pkg> ?develop gives instructions about managing a local development environment with develop --local ExamplePackage. The documentation for Modifying A Dependency seems to cover this use case, but I won’t know until I try it and see what happens in various scenarios. The idea seems to be copying the entire package into a subdirectory where you can make changes without altering the registered package code (isn’t this what git branches are for?).

Don’t forget you also need to use Revise.jl and probably some other dependencies for the iterative process of development.

The documentation seems helpful for someone who already knows how this all works, and it’s a little disappointing because the package manager is one of the great things about Julia. It took decades for something like Poetry to make Python packaging excellent. After searching a bit today I haven’t yet found an explainer of how this basic software development process is achieved in Julia.

Just install Plots.jl (or whatever plotting package) in your global environment. Then, with the other person’s package environment active, if you do using Plots it will search that environment, not find it, then try the global environment, and find it there. In general, the load path is kind of “tiered”, so it will try whatever environments you have in Base.LOAD_PATH in order. The default is

julia> Base.LOAD_PATH
3-element Vector{String}:
 "@"
 "@v#.#"
 "@stdlib"

I generally just keep a few general-purpose packages in my global environment (Plots, BenchmarkTools, etc…), which will be accessible no matter what other main environment you have active.
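
To see how this tiered lookup resolves on a given machine, here is a minimal sketch. It assumes nothing beyond Base; the package name "Plots" is only an example, and the printed result depends entirely on what is installed locally.

```julia
# Symbolic entries of the load path: "@" is the active project,
# "@v#.#" the default global environment, "@stdlib" the standard library.
println(LOAD_PATH)

# The same entries expanded to concrete paths, in lookup order.
# Entries that cannot be expanded (for example "@" when no project
# is active) are skipped.
foreach(println, Base.load_path())

# Where `using Plots` would load from, or `nothing` if no environment
# in the stack provides it.
name = "Plots"
pkgid = Base.identify_package(name)
println(pkgid === nothing ? nothing : Base.locate_package(pkgid))
```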

@marius311 that works, but I’d like to avoid polluting my global env. There are often dependencies which I find myself wanting semi-frequently for development work, without installing them globally. Hence my suggestion about stacking a named environment @plotting into my regular environment.

Perhaps there’s a GitHub issue lying around for this? (Edit: see “Feature request: activate environment on top of current stack”, Issue #1245 in JuliaLang/Pkg.jl, and maybe “Proposal for "sub-projects"”, Issue #1233 in JuliaLang/Pkg.jl.)

I think it’s clear there’s a QOL improvement to be made here.

@merlin I actually quite like Pkg.develop! It’s orthogonal to the issue I’m raising here: Pkg.develop basically just clones the package for you as a git repo, and then you can do the usual git stuff.

Poetry’s dev dependencies is an interesting idea to raise. I haven’t personally found a need for those yet, thanks to the flexibility provided by test dependencies and the ability for subfolders (e.g. tutorials) to have their own Project.toml’s.

I have been missing the ability to make stacked environments that are purely local, though. My decision to use Plots to debug should not affect upstream at all, so it should not go anywhere in the Project.toml which is tracked by git. But I’d like it to go somewhere locally (e.g. into a named environment), and be easily stackable with the package env without manually modifying the load path!
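
Until Pkg offers something like this, a rough approximation is to wrap the manual load-path change in one call. The sketch below assumes a shared environment named plotting was created beforehand (for example with Pkg.activate("plotting"; shared=true) followed by Pkg.add("Plots")); the helper name stack_env! is hypothetical and not part of Pkg.

```julia
# Hypothetical helper: append a named shared environment ("@name") to the
# end of the load path, so the active project and every earlier entry in
# the stack still take precedence.
function stack_env!(name::AbstractString)
    entry = "@" * name
    entry in LOAD_PATH || push!(LOAD_PATH, entry)
    return LOAD_PATH
end

stack_env!("plotting")   # `using Plots` now also searches the @plotting env
```

Nothing is written to any Project.toml, and the change lasts only for the current Julia session.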

I pretty much always create a scripts folder in the same project and activate that to do my development.

julia> mkdir("scripts"); cd("scripts")
pkg> activate .
(scripts) pkg> dev ..
(scripts) pkg> add Revise Plots

Recently though I’ve made Revise.jl and BenchmarkTools.jl global which is nice. I try not to have Plots.jl globally because it’s quite a heavy package.

Out of curiosity, what do you use your global environment for, if not for this kind of thing?

In any case, you can always create some other environment to house your general-purpose libraries, leaving your global environment untouched, and add that to your LOAD_PATH via an environment variable, e.g. export JULIA_LOAD_PATH="~/.myenv:$JULIA_LOAD_PATH", where your ~/.myenv/Project.toml has Plots and other stuff you tend to need.

I use my global env for packages such as Revise that are just ubiquitous. For something like Plots or ProgressBars, etc., I only want to add them to my environment some of the time.

I think the suggestion made by @jmair is a good one. Make a debug subfolder untracked by git, add the original package and any of its desired internal dependencies there, and manually add in my plotting dependencies etc one-by-one: that’s not too bad.

Just to make sure it’s clear: having a package in your global environment, even a “heavy” one, doesn’t do anything unless you load it, and it doesn’t change any other package’s environment. So I don’t totally see the drawback of putting a bunch of utility ones in there. But if the other thing is more convenient then definitely go for it.

I use direnv to manage the load path. For example, if you have some development tools in the devtools subfolder, you can use the following in the .envrc file:

export JULIA_LOAD_PATH="${PWD}:${PWD}/devtools:"

This will expand to the following load path

julia> Base.load_path()
4-element Vector{String}:
 "/home/fredrik/dev/TestPackage/Project.toml"
 "/home/fredrik/dev/TestPackage/devtools/Project.toml"
 "/home/fredrik/.julia/environments/v1.8/Project.toml"
 "/opt/julia/julia-1.8.5/share/julia/stdlib/v1.8"

I wrote a post using direnv together with Julia here.


Of course, sometimes the most convenient thing is to add it to the package environment, just without committing it. I do this all the time too.

“Of course, sometimes the most convenient thing is to add it to the package environment, just without committing it. I do this all the time too.”

Me too! It’s just annoying to not be able to do git add -u.

Thanks for the direnv suggestion! I’ll give it a try.

Personally, I think @jmair’s suggestion is what I am going to do in the future: thinking about it, I don’t actually need a package’s internal dependencies most of the time, so it’s better to go into a different folder such as debug and dev the package there plus extra dependencies, rather than activate the package’s internal env. So my process would be:

  1. make a debug subfolder
  2. dev .., i.e. dev the package in the parent dir
  3. Run e.g. add Plots Revise ProgressBars
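
The three steps above can be sketched with the Pkg API as follows. The helper name setup_debug_env and its keyword arguments are hypothetical; only Pkg.activate, Pkg.develop and Pkg.add are actual Pkg functions.

```julia
using Pkg

# Hypothetical helper mirroring the three steps above.
# `pkgroot` is the root of the package being debugged; `extras` are the
# utility packages wanted only while debugging.
function setup_debug_env(pkgroot::AbstractString;
                         dir::AbstractString = "debug",
                         extras::Vector{String} = ["Plots", "Revise", "ProgressBars"])
    debugdir = joinpath(pkgroot, dir)
    mkpath(debugdir)               # 1. make the debug subfolder
    Pkg.activate(debugdir)         #    and make it the active environment
    Pkg.develop(path = pkgroot)    # 2. dev the package in the parent dir
    Pkg.add(extras)                # 3. add the extra dependencies
    return debugdir
end

# Example call, run from the package root (needs network access):
# setup_debug_env(pwd())
```

To keep the debug folder out of version control without editing the tracked .gitignore, it can be listed in .git/info/exclude.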

I was thinking about how to make step 3 easier. One way is to add a named environment like @utils into the LOAD_PATH via direnv. Here’s an alternative idea I have: what if we could replace add Plots Revise ProgressBars by something like add @utils or add --copy @utils? Let me explain: this solves two problems, 1) I don’t have to list the three packages manually, 2) what I want the add @utils to do is make sure we just use the same versions as in the latest @utils rather than the latest ones available, so that the precompiled binaries for Plots etc. are definitely already in existence and the process is nearly instant. And it shouldn’t be a headache to implement because there’s no hidden stacking and it’s almost syntactic sugar, converting to add Plots@1.38, .... (If the @utils environment were to update later on, the debug environment would not auto-update.) [Ultimately I guess what I’m going for here is equivalent functionality to modifying LOAD_PATH, but through Pkg]

From people’s answers I’m starting to get the picture. There seems to be a diversity of workflows, but still some pain points.

Here is the Poetry analogy I’d like to create for myself:

  • DONE: Package dependencies for my application. This is handled very nicely by Pkg and Project.toml
  • Packages available (usually in the REPL) for testing, development, or convenience but will not be part of the package when users install my package.

I think the test/Project.toml environment does this, but these packages are only available when running ‘pkg> test’ (?).

As a concrete example, I have a number of DataFrame convenience functions I add to data projects. There is also OhMyREPL, which I use everywhere, but the global environments don’t overlap into new projects, so I have to add it to every project and then remove it if I’m going to publish a package.

I don’t understand the ‘pkg> develop’ use case. Normally I would check out a new branch to do what I think is documented for this.

Edit: Now I finally understand all that LOAD_PATH business. I believe my requirements are fulfilled, I just didn’t know it:

  • common utils in the global env (or an env in the LOAD_PATH)
  • ‘using MuhHelperPackage’ will bring it into my interactive scope but not as part of tests or package deployments.
  • ‘develop’ is for when you need to hack on the code for one of your dependency packages.

awesome!

Wow, this helps a lot. Funny, it’s all in the documentation but this common usage was not obvious to me at all.

I mean, game changer for me, thanks!

Ohhh, now maybe I understand the ‘develop’ usage.

When you clone a package repo… No, I don’t understand why you need to clone the whole package again into a subfolder ‘dev’, rather than ‘git checkout -b dev-new-idea’.

Pkg.develop is for getting the repo in the first place. When you add a package via add, there is no git repo: you’re not expected to be editing the code of the package (the usual case), it’s fixed at a version and buried in some folder.

If you’ve manually cloned a package or are making your own, then you don’t need to do dev PackageName. Instead, you can just do dev {local path to git repo}: this will add your existing git repo to your Julia environment.

(If you have more questions about Pkg.develop I’m sure people will be more than happy to help! But it might be better to create a new thread for it)

Perhaps you know but if you only want to use the extra packages temporarily you can do
] activate --temp instead of creating a folder etc.
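
As a sketch of how that could look through the Pkg API: the helper name temp_debug_env is hypothetical, while Pkg.activate(temp = true) is the API counterpart of activate --temp.

```julia
using Pkg

# Hypothetical helper: work on the package at `pkgroot` in a throwaway
# environment that is removed when the Julia process exits.
function temp_debug_env(pkgroot::AbstractString; extras::Vector{String} = ["Plots"])
    Pkg.activate(temp = true)      # same as `pkg> activate --temp`
    Pkg.develop(path = pkgroot)    # make the package itself loadable
    Pkg.add(extras)                # utilities needed for this session only
    return Base.active_project()
end

# Example call, run from the package root (needs network access):
# temp_debug_env(pwd())
```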

For me there are serious downsides that put me off enough that I mostly avoid stacked environments. From the documentation:

  1. The primary environment—i.e. the first environment in a stack—is faithfully embedded in a stacked environment. The full dependency graph of the first environment in a stack is guaranteed to be included intact in the stacked environment including the same versions of all dependencies.
  2. Packages in non-primary environments can end up using incompatible versions of their dependencies even if their own environments are entirely compatible. This can happen when one of their dependencies is shadowed by a version in an earlier environment in the stack (either by graph or path, or both).

Let’s say I put Plots.jl in a global environment. When I work in some project, I’ll probably not add Plots.jl locally (that’s the whole point of having it global). This leads to the following issues:

  • I lose reproducibility in this project: the version of Plots that I’m using is not tracked, it’s not committed to git, and it can change when I decide to update Plots while working on another project (and there’s the same problem with the dependencies of Plots).
  • I lose the Pkg guarantees that the versions of my dependencies are compatible with each other. It’s possible that my plots are wrong in ways that are hard to detect, because some dependency of Plots.jl is shadowed by my project environment.

Losing reproducibility and guarantees of version compatibility is a very steep price to pay for some convenience!

But if plotting is an important part of the project, and not just a one-off thing, maybe it should be part of the project’s env?

Yeah, this is a definite drawback worth highlighting. My answer was more for when you quickly need to make a plot in some random environment which doesn’t have a plotting package, rather than for building a fully reproducible analysis pipeline with plotting, etc., in which case I definitely agree with you.

Yes for quick testing it’s nice… I do wish the stacked feature was opt-in though, because it’s easy to forget to add locally when something is already available from global.

That’s also my answer to @albheim: yes, it should be in the project dependencies… but in practice, if using Plots works, I might not think of checking that it was properly added locally. Maybe I’m not disciplined enough, but every two weeks or so I check my global environment and find dependencies there that were added by mistake :)
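
One way to catch this early is to compare what is loadable with what the active project actually declares. This is only a sketch: is_direct_dep is a hypothetical helper, and the package names in the loop are examples.

```julia
using Pkg

# Hypothetical helper: true only if `name` is listed under [deps] of the
# active project, regardless of what the rest of the load path provides.
is_direct_dep(name::AbstractString) = haskey(Pkg.project().dependencies, name)

for name in ["Plots", "BenchmarkTools"]
    # Loadable through the stack, but not declared by the active project.
    if !is_direct_dep(name) && Base.identify_package(name) !== nothing
        @warn "$name is loadable but is not a dependency of the active project"
    end
end
```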

You could maybe put in your startup.jl that the global env is removed from the load path, and only local and stdlibs remain?
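
A minimal sketch of that idea, assuming the default load path shown earlier in the thread and the usual startup file location:

```julia
# ~/.julia/config/startup.jl
# Drop the default global environment ("@v#.#") from the load path, leaving
# only the active project ("@") and the standard library ("@stdlib").
filter!(!=("@v#.#"), LOAD_PATH)
```

With this in place, packages kept in the global environment (Revise, for instance) are no longer loadable from other projects; running push!(LOAD_PATH, "@v#.#") opts back in for a single session.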