Handling various types of dependencies

(I apologize in advance for the very long post)

As I use Julia for ever more complex and complete projects, I keep running into the following problem: a single “project” contains lots of related code, which will be used in different contexts to perform different tasks. (I’m speaking here of a “project” in the GitHub sense; from here on I’ll call it a git repository, and reserve the words “package” and “project” for the Julia concepts defined in the Pkg.jl documentation.)

To illustrate what I mean, consider for example a git repository defining a Julia package and containing, in addition to the reusable features that the package exports, some code automating various development tasks:

  1. a test suite, implemented in test/runtests.jl (which in turn potentially includes other files) and requiring specific dependencies (in addition to those of the package code itself)
  2. possibly some code to post-process the results of test cases (for example, to submit coverage results to web services such as Codecov or Coveralls). Such scripts typically depend on yet other tools.
  3. Documenter.jl-based documentation, in the form of markdown files accompanied by some Julia code (in doc/make.jl) that actually builds and publishes the documentation. This obviously depends on Documenter.jl.
  4. some benchmarking scripts, which have yet other dependencies (like BenchmarkTools.jl or PkgBenchmark.jl)
  5. possibly some code to post-process benchmarking results and generate plots (like a roofline model, or a plot of run time vs problem size). This code has non-trivial dependencies like Plots.jl or GR.jl, which you don’t necessarily want to install on the high-performance system where you ran the benchmarks.

The point I am getting at is twofold:

  • all this code really belongs to the same git repository; it would not make much sense to have some of it elsewhere. Yet it can be subdivided into several relatively independent tasks, which (i) will be used by different types of users, possibly on different types of systems, (ii) are run in different ways, and (iii) require different sets of dependencies.
  • we currently have various ways of dealing with these tasks. Usually, a top-level Julia program, whose name follows some convention (e.g. test/runtests.jl or doc/make.jl), is used as the entry point. The methods (that I know of) to deal with dependencies are variations on the following two themes:
    • dependencies are listed in the [extras] and [targets] sections of the top-level package’s Project.toml. To my knowledge, test is the only legal target at the moment, but we could imagine being able to list additional (possibly user-defined) targets (benchmark, for example). This, in addition to the known path of the entry point, allows tools like Pkg.test() to automate running the task in an appropriate environment.
    • the set of source files related to a task is grouped into a sub-folder (like doc/ for example), which acts as a full-fledged project, with its own Project.toml file defining dependencies. I’ve seen this pattern used not only for documentation building scripts, but also for test-coverage submission scripts, and I’m starting to use it for benchmarking as well. The only (slightly annoying) issue I’m facing with this solution is that one has to handle the environment explicitly: fix LOAD_PATH so that the top-level package can be used, call Pkg.instantiate() when needed, and so on.
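For the second theme, the environment handling can usually be boiled down to a few lines at the top of the task’s entry point. Here is a minimal sketch, assuming a layout where a sub-folder such as benchmark/ holds its own Project.toml and the top-level package lives one directory up; the helper name setup_task_env is made up for illustration. Pushing the repository root onto LOAD_PATH works because a directory whose Project.toml has a name and uuid makes that package loadable. Another common approach is Pkg.develop(path = "..") from inside the sub-project, which records the path in the sub-project’s manifest instead.

```julia
import Pkg

"""
    setup_task_env(task_dir; instantiate = false)

Hypothetical helper for the preamble of a task script (e.g. benchmark/run.jl):
activate the sub-project in `task_dir` and make the package in the parent
directory loadable by adding that directory to LOAD_PATH.
"""
function setup_task_env(task_dir::AbstractString; instantiate::Bool = false)
    repo_dir = normpath(joinpath(task_dir, ".."))   # top-level package directory
    Pkg.activate(task_dir)                           # use the sub-project's environment
    repo_dir in LOAD_PATH || push!(LOAD_PATH, repo_dir)
    # Install what the sub-project lists (may require network access)
    instantiate && Pkg.instantiate()
    return repo_dir
end

# Typical use at the top of benchmark/run.jl (MyPackage is a placeholder name):
#     setup_task_env(@__DIR__; instantiate = true)
#     using MyPackage, BenchmarkTools
```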

So here are (at last!) my questions:

  • are there plans to extend the [targets] mechanism so that it can be used for purposes other than testing?
  • is there a particular issue with having various sub-projects to perform various tasks, each one listing its dependencies in its own Project.toml file?
  • should we keep these two techniques (which solve very similar problems, at least IMHO), or should one be advertised more than the other in the documentation, so that we can perhaps build some tooling around it more efficiently?
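For reference, this is roughly what the [extras]/[targets] mechanism looks like in a package’s Project.toml. The fragment is parsed here with the TOML standard library (available since Julia 1.6) only to show its structure. The test target is the one Pkg.test() uses; the benchmark target is hypothetical, i.e. the kind of user-defined target the first question is about, and Pkg.test() does not use it.

```julia
import TOML  # standard library since Julia 1.6

# Hypothetical Project.toml fragment (name, uuid and [deps] omitted)
project = TOML.parse("""
    [extras]
    BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
    Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

    [targets]
    test = ["Test"]                  # used by Pkg.test()
    benchmark = ["BenchmarkTools"]   # hypothetical user-defined target
    """)

# List each target and the extra dependencies it would pull in
for (target, deps) in project["targets"]
    println(rpad(target, 10), join(deps, ", "))
end
```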

AFAIU the distinction between [deps] and [extras] is only relevant when you are working on a package.

If this is a “project”, I would just put everything in [deps] so that pkg> instantiate gets everything — this is otherwise costless if you are not loading a package in a particular script.

Yes, I’m mostly speaking of packages here. Sorry if that was not clear.

Even in the case of a simple non-package project, the same problem could happen. Think for example of a research paper (like in this thread), where you might have a program to reproduce the results themselves and store them in data files, and another program to produce the plots shown in the paper.

Depending on the system where you want your program to run (for example on a supercomputer), it might not be convenient to install plotting packages there. The post-processing and plotting stage can then be run on another system, where graphics dependencies are easier to install.


My understanding is that this is what stacked environments are for, but I do not know how to use them in practice.


I’m not following Pkg.jl development very closely, but given that Julia 1.2 started supporting test/Project.toml, I guess [extras] is going to fade away.

FYI, I also wanted to ease $subproject/Project.toml handling, so I created a “task runner” package Run.jl (documentation). The API is that Run.script("DIRECTORY/SCRIPT.jl") automatically activates and instantiates DIRECTORY/Project.toml, then runs the script in an isolated environment. There are quick shortcuts Run.test() and Run.docs(), but it’s useful for general “tasks”, e.g., running benchmarks in CI. All these sub-projects automatically Pkg.develop the parent directory, so you can instantiate them even if your main project is not in the registry or does not have a URL.
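To make the idea concrete, here is a much-simplified sketch of what such a task runner does. It is not Run.jl’s actual implementation, and the name run_task is made up. Isolation comes from starting a fresh Julia process (via Base.julia_cmd()) whose active project is the task’s sub-project, so the task’s dependencies never mix with those of the calling session.

```julia
"""
    run_task(dir, script; instantiate = true)

Hypothetical sketch of a task runner (NOT Run.jl's code): run `script` in a
fresh Julia process whose active project is the sub-project in `dir`.
"""
function run_task(dir::AbstractString, script::AbstractString; instantiate::Bool = true)
    julia = Base.julia_cmd()                  # command for the running julia binary
    project = "--project=" * abspath(dir)     # the task's own environment
    if instantiate
        # Install the sub-project's dependencies in a separate process first
        run(`$julia $project --startup-file=no -e "import Pkg; Pkg.instantiate()"`)
    end
    # Run the script; a failure in the child throws a ProcessFailedException here
    run(`$julia $project --startup-file=no $(abspath(script))`)
    return nothing
end

# Example (placeholder paths):
#     run_task("benchmark", "benchmark/run.jl")
```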


Good idea!

Indeed, stacking the “superproject” environment with the environment defined by a “subproject” would probably be ideal.

I don’t either. Maybe we lack some tooling and/or documentation in this area.

Here is an attempt at fiddling with LOAD_PATH. The following tests are run in a directory containing these files:

.
├── Manifest.toml
├── Project.toml       -> deps: LinearAlgebra
└── SubProject
    ├── Manifest.toml
    └── Project.toml   -> deps: BenchmarkTools
shell> julia --project --quiet

# Restrict LOAD_PATH so that we don't have access to the home environment
julia> splice!(LOAD_PATH, 1:length(LOAD_PATH)); push!(LOAD_PATH, "@")
1-element Array{String,1}:
 "@"

# BenchmarkTools is not defined as a dependency of the top-level project
julia> using BenchmarkTools
ERROR: ArgumentError: Package BenchmarkTools not found in current path:
- Run `import Pkg; Pkg.add("BenchmarkTools")` to install the BenchmarkTools package.

Stacktrace:
 [1] require(::Module, ::Symbol) at ./loading.jl:823

# However it is a dependency of the SubProject
julia> push!(LOAD_PATH, "SubProject")
2-element Array{String,1}:
 "@"         
 "SubProject"

julia> using BenchmarkTools

julia> 

Oh, my bad! I don’t watch change announcements closely enough and had missed this. Thanks!

The documentation speaks of “implicitly adding the tested package itself”. Is this feature specific to tests, or could it be replicated with other sub-projects?

Really nice, thanks! I think Run.jl would indeed solve all issues of this type that I encountered.

What I like about Tamas’ idea of stacking environments is that it should allow using the dependencies of several subprojects at once. In other words, if you’re willing to lose isolated environments, you can aggregate a complete environment for all subprojects, as though all dependencies had been declared in the top project.

In this respect, I would say such a solution would combine all advantages: if you want separate environments for testing purposes you can have them; if you want the ease of use of a single environment defining all dependencies, you can have it too.
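A minimal sketch of that aggregation, assuming the sub-projects live in sub-folders such as test/ and benchmark/ of the repository; stack_subprojects! is a hypothetical helper, not an existing API. It simply appends each sub-project environment to LOAD_PATH, as in the REPL session above, skipping folders without a Project.toml.

```julia
"""
    stack_subprojects!(repo_dir, subdirs)

Hypothetical helper: append the environments of several sub-projects of
`repo_dir` (e.g. "test", "benchmark") to LOAD_PATH, so that all of their
dependencies can be loaded in the current session.
"""
function stack_subprojects!(repo_dir::AbstractString, subdirs)
    for sub in subdirs
        env = abspath(joinpath(repo_dir, sub))
        # Only folders with a Project.toml are treated as environments here
        isfile(joinpath(env, "Project.toml")) || continue
        env in LOAD_PATH || push!(LOAD_PATH, env)
    end
    return LOAD_PATH
end

# Example, from the repository root:
#     stack_subprojects!(pwd(), ["test", "benchmark", "doc"])
```

On Unix-like systems, a similar stack can be set up from the shell with something like JULIA_LOAD_PATH="@:test:benchmark:@stdlib" julia --project. Since all environments share one session, this gives up the isolation that separate environments provide.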

Note

https://github.com/JuliaLang/Pkg.jl/issues/1251

https://github.com/JuliaLang/Pkg.jl/issues/1233

so I think this is coming.


One of my primary motivations for writing Run.jl is reproducibility, so Run.script even excludes the default environment (@v#.#) by default. Another reason to avoid stacked environments is that they can break version compatibility. But that’s because CI needs a reliable way to run tasks; for interactive sessions, stacking environments is really great.
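One way to get that kind of isolation for a child process, sketched below assuming Julia 1.6 or later (for addenv), is to set JULIA_LOAD_PATH so that it lists only the active project (@) and the standard library (@stdlib), leaving out the default @v#.# environment. This is an illustration of the idea, not necessarily how Run.jl does it.

```julia
# Path-list separator used by JULIA_LOAD_PATH on this platform
sep = Sys.iswindows() ? ";" : ":"

# Child Julia process that prints its own LOAD_PATH; the default environment
# (@v#.#) is left out, only the active project and the stdlib remain.
cmd = addenv(`$(Base.julia_cmd()) --startup-file=no -e "print(join(LOAD_PATH, ','))"`,
             "JULIA_LOAD_PATH" => join(["@", "@stdlib"], sep))

out = read(cmd, String)
println(out)
```

Within an already running session, filter!(!=("@v#.#"), LOAD_PATH) has a similar effect on packages loaded afterwards.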


Could you please elaborate a little on this point?

Consider projects like this:

  • Project1

    • PkgA 1.0 (works with PkgB 1.x)
    • PkgB 1.9
  • Project2

    • PkgC 1.0 (works with PkgB 2.x)
    • PkgB 2.0

I think launching julia with JULIA_LOAD_PATH=Project1:Project2 breaks PkgC because it would import PkgB 1.9 while it needs 2.0.
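A toy model of that scenario may help. It is a deliberate simplification of the behaviour described in this thread, not Julia’s actual loader: each environment maps a package UUID to the version its manifest records, the first environment in the stack that provides a given UUID wins, and the resulting module is shared by every package in the session. The UUID strings are placeholders.

```julia
# Toy model of a stack of environments (a simplification, not Julia's loader)
const PKGB_UUID = "uuid-of-PkgB"          # hypothetical placeholder UUID

project1 = Dict(PKGB_UUID => v"1.9.0")    # PkgA's environment
project2 = Dict(PKGB_UUID => v"2.0.0")    # PkgC's environment

# Return the version provided by the first environment that knows `uuid`
function resolve(uuid, stack)
    for env in stack
        haskey(env, uuid) && return env[uuid]
    end
    return nothing                        # not found anywhere in the stack
end

stack = [project1, project2]              # like JULIA_LOAD_PATH=Project1:Project2
resolve(PKGB_UUID, stack)                 # v"1.9.0", even for PkgC, which needs 2.x
```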


Thanks, I see why we could not make Project1 and Project2 work together with stacked environments in this case.

Just to be sure I understand your point: what you’re saying is that since both projects cannot work together, we should ensure they are run in different Julia sessions. And this is precisely what Run.jl automates. Am I right? That makes a lot of sense.

Either way, I think this should not be a problem for me, since the use cases I have in mind involve making sub-projects work together that are all developed within the same git repository, presumably by the same team.

Yes, I think the hypothetical risk I was talking about is very small, especially if you track Manifest.toml in git. That way, if anything goes wrong, you can at least reproduce the stacked environments at any time.

Sorry - could you clarify this example?

Isn’t the entire point of Pkg to look in PkgA’s Manifest when resolving its dependency on PkgB, which can be a different version from PkgC’s dependency on PkgB (according to PkgC’s Manifest)?

So there should be no issue loading PkgA and PkgC at the same time.

Did I misunderstand your comment?

If both PkgA and PkgC depend on the same PkgB (i.e., the UUID is the same), the PkgB loaded in a Julia process is shared between PkgA and PkgC. So, using PkgB in PkgC loads PkgB 1.9 even though Project2/Manifest.toml has PkgB 2.0.

If PkgB in Project1 and PkgB in Project2 are different packages which happen to have the same name (i.e., their UUIDs are different), then PkgA and PkgC can be safely loaded in one process.

Ah, yes. Thank you. I see this now from the updated code loading docs. (Though I don’t understand why this design decision was made.)

I think you can find an answer in this comparison of Julia’s Pkg to Go’s packaging system:

Go treats packages with different major versions as essentially different packages. So the above scenario would have “worked”, in the sense that PkgC would load PkgB 2.0 instead of PkgB 1.9.
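The difference can be illustrated with a toy model of how a loader identifies an already-loaded package. This is illustrative only, not the actual code of either system: a Julia-style key is the UUID alone, while a Go-style key also includes the major version. The UUID string is a placeholder.

```julia
# Toy loader keys (illustrative only)
julia_key(uuid, v::VersionNumber) = uuid                  # identity = UUID
go_key(uuid, v::VersionNumber) = (uuid, Int(v.major))     # identity = UUID + major version

# Load each requested (uuid, version); the first request for a given key wins
function load_all(keyfn, requests)
    loaded = Dict{Any,VersionNumber}()
    for (uuid, v) in requests
        get!(loaded, keyfn(uuid, v), v)
    end
    return loaded
end

requests = [("uuid-of-PkgB", v"1.9.0"),   # wanted by PkgA
            ("uuid-of-PkgB", v"2.0.0")]   # wanted by PkgC

length(load_all(julia_key, requests))     # 1: a single PkgB 1.9, shared by both
length(load_all(go_key, requests))        # 2: PkgB 1.9 and PkgB 2.0 side by side
```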

Thanks - a very interesting thread.