As I use Julia for ever more complex and complete projects, I keep running into the same problem: a single “project” contains lots of related code, which will be used in different contexts to perform different tasks. (I’m speaking here of a “project” in the GitHub sense; from here on I’ll say “git repository” for that, and reserve the words “package” and “project” for the Julia concepts defined in the Pkg.jl documentation.)
To illustrate what I mean, consider for example a git repository defining a Julia package and containing, in addition to the (re-usable) features that the package exports, some code automating various development tasks:
a test suite, implemented in test/runtests.jl (which in turn potentially includes other files) and requiring specific dependencies (in addition to those of the package code itself)
possibly some code to post-process the results of test cases (for example submit coverage results to web services such as Codecov or Coveralls). Such scripts typically depend on yet other tools.
a Documenter.jl-based documentation, in the form of markdown files accompanied by some Julia code (in doc/make.jl) that actually builds and publishes the documentation. This obviously depends on Documenter.jl.
some benchmarking scripts, which have yet other dependencies (like BenchmarkTools.jl or PkgBenchmark.jl)
possibly some code to post-process benchmarking results and generate plots (like a roofline model, or a plot of run time vs problem size). This code has non-trivial dependencies like Plots.jl or GR.jl, which you don’t necessarily want to install on the high-performance system where you ran the benchmarks
The point I am getting at is twofold:
all this code really belongs to the same git repository; it would not make much sense to have some of it elsewhere. Yet it can be subdivided into several relatively independent tasks, which (i) will be used by different types of users, possibly on different types of systems, (ii) are run in different ways, and (iii) require different sets of dependencies.
we currently have various ways of dealing with these tasks. Usually, a top-level Julia program, whose name follows some convention (e.g. test/runtests.jl or doc/make.jl), is used as the entry point. The methods (that I know of) to deal with dependencies are variations around the following two themes:
dependencies are listed in the [extras] and [targets] sections of the top-level package’s Project.toml. To my knowledge, test is the only legal target at the moment, but we could imagine having the possibility to list additional (possibly user-defined) targets (benchmark, for example). This, in addition to the known path of the entry point, allows tools like Pkg.test() to automate the process of running the task in an appropriate environment.
the set of source files related to a task is grouped into a sub-folder (like doc/ for example), which acts as a fully-fledged project, with its own Project.toml file defining dependencies. I’ve seen this pattern used not only for documentation building scripts, but also for test-coverage submission scripts, and I’m starting to use it for benchmarking as well. The only (slightly annoying) issue I’m facing with this solution is that one has to explicitly handle environment-related issues: fix LOAD_PATH to be able to use the top-level package, call Pkg.instantiate() when needed and other such things.
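To make the first of these two approaches concrete, here is a rough sketch of what such a Project.toml could look like and how a tool might read the dependencies of a given target. Assumptions: the package name MyPackage and its UUID are placeholders; the benchmark target is hypothetical (test is the only one I know to be supported); the UUIDs for Test and BenchmarkTools are quoted from memory and should be checked against the registry; target_deps is a made-up helper, not part of Pkg.

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical Project.toml: "MyPackage", its UUID and the `benchmark`
# target are invented for illustration.
project_toml = """
name = "MyPackage"
uuid = "00000000-0000-0000-0000-000000000001"

[extras]
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"

[targets]
test = ["Test"]
benchmark = ["BenchmarkTools"]
"""

# Made-up helper: return a `name => UUID` dictionary with the extra
# dependencies declared for `target` (empty if the target is unknown).
function target_deps(project::AbstractDict, target::AbstractString)
    extras  = get(project, "extras", Dict{String,Any}())
    targets = get(project, "targets", Dict{String,Any}())
    names   = get(targets, target, String[])
    return Dict(name => extras[name] for name in names)
end

project = TOML.parse(project_toml)
```

With this, target_deps(project, "benchmark") lists what a hypothetical benchmark runner would have to add on top of the package’s own dependencies.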
So here are (at last!) my questions:
are there plans to extend the [targets] mechanism for it to be usable for other purposes than testing?
is there a particular issue with having various sub-projects to perform various tasks, each one listing its dependencies in its own Project.toml file?
should we keep these two techniques (which solve very similar problems, at least IMHO), or should one be advertised more than the other in the documentation, so that we can perhaps more efficiently build some tooling around it?
Yes, I’m mostly speaking of packages here. Sorry if that was not clear.
Even in the case of a simple non-package project, the same problem could happen. Think for example of a research paper (like in this thread), where you might have a program to reproduce the results themselves and store them in data files, and another program to produce the plots shown in the paper.
Depending on the system where you want your program to run (for example on a supercomputer), it might not be convenient to install plotting packages there. The post-processing and plotting stage can then be run on another system, where graphics dependencies are easier to install.
FYI, I also wanted to ease $subproject/Project.toml handling, so I created a “task runner” package, Run.jl (documentation). The API is that Run.script("DIRECTORY/SCRIPT.jl") automatically activates and instantiates DIRECTORY/Project.toml, then runs the script in an isolated environment. There are quick shortcuts Run.test() and Run.docs(), but it’s also useful for general “tasks”, e.g. running benchmarks in CI. All these sub-projects automatically Pkg.develop the parent directory, so you can instantiate them even if your main project is not in the registry or does not have a URL.
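For comparison, here is roughly what doing those steps by hand could look like. This is only a sketch under my own assumptions, not Run.jl’s actual implementation: subproject_cmds is a made-up helper, the sub-project is assumed to sit directly inside the top-level package directory, and restricting JULIA_LOAD_PATH to the active project plus the standard libraries is my guess at what “isolated” should mean here.

```julia
# Made-up helper: build the two commands that (1) prepare the environment of
# the directory containing `script` and (2) run the script in a fresh process.
function subproject_cmds(script::AbstractString)
    script = abspath(script)
    subdir = dirname(script)   # e.g. <repo>/benchmark, with its own Project.toml
    parent = dirname(subdir)   # assumed to be the top-level package directory

    # Only the active project ("@") and the standard libraries are visible,
    # so the default (home) environment cannot leak into the task.
    sep = Sys.iswindows() ? ';' : ':'
    env = "JULIA_LOAD_PATH" => join(["@", "@stdlib"], sep)

    base  = `$(Base.julia_cmd()) --startup-file=no --project=$subdir`
    setup = "import Pkg; Pkg.develop(Pkg.PackageSpec(path = $(repr(parent)))); Pkg.instantiate()"

    return (addenv(`$base -e $setup`, env), addenv(`$base $script`, env))
end
```

Something like foreach(run, subproject_cmds("benchmark/run.jl")) would then set up the environment and run the script, each in its own Julia process (addenv needs Julia 1.6 or later).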
shell> julia --project --quiet
# Restrict LOAD_PATH so that we don't have access to the home environment
julia> splice!(LOAD_PATH, 1:length(LOAD_PATH)); push!(LOAD_PATH, "@")
# BenchmarkTools is not defined as a dependency of the top-level project
julia> using BenchmarkTools
ERROR: ArgumentError: Package BenchmarkTools not found in current path:
- Run `import Pkg; Pkg.add("BenchmarkTools")` to install the BenchmarkTools package.
 require(::Module, ::Symbol) at ./loading.jl:823
# However it is a dependency of the SubProject
julia> push!(LOAD_PATH, "SubProject")
julia> using BenchmarkTools
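The same stacking can also be done temporarily from code. A minimal sketch, assuming a made-up helper with_stacked_envs (not an existing Base or Pkg function) and the same hypothetical SubProject directory as above:

```julia
# Made-up helper: push extra environments onto LOAD_PATH while `f` runs,
# then restore the original load path.
function with_stacked_envs(f, envs::AbstractString...)
    saved = copy(LOAD_PATH)
    try
        append!(LOAD_PATH, envs)   # appended entries have the lowest priority
        return f()
    finally
        # restore the original load path, even if `f` throws
        empty!(LOAD_PATH)
        append!(LOAD_PATH, saved)
    end
end

# Inspect the load path as seen while "SubProject" is stacked
inside = with_stacked_envs(() -> copy(LOAD_PATH), "SubProject")
```

Note that packages loaded while the extra environments are stacked stay loaded afterwards; only the lookup path is restored.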
Oh, my bad! I don’t watch change announcements closely enough and had missed this. Thanks!
The documentation speaks of “implicitly adding the tested package itself”. Is it a feature specific to tests, or could it be replicated with other subprojects?
Really nice, thanks! I think Run.jl would indeed solve all issues of this type that I encountered.
What I like about Tamas’ idea of stacking environments is that it should allow using the dependencies of several subprojects at once. In other words, if you’re willing to lose isolated environments, you can aggregate a complete environment for all subprojects, as though all dependencies had been declared in the top project.
In this respect, I would say such a solution would combine all advantages: if you want separate environments for testing purposes you can have them; if you want the ease of use of a single environment defining all dependencies, you can have it too.
One of my primary motivations for writing Run.jl is reproducibility, so Run.script even excludes the default environment (@v#.#) by default. Another reason to avoid stacked environments is that they could break version compatibility. But those concerns exist because CI needs a reliable way to run tasks. For interactive sessions, stacking environments is really great.
Thanks, I see why we could not make Project1 and Project2 work together with stacked environments in this case.
Just to be sure I understand your point: what you’re saying is that since both projects cannot work together, we should ensure they are run in different Julia sessions. And this is precisely what Run.jl automates. Am I right? That makes a lot of sense.
Either way, I think this should not be a problem for me, since the use cases I have in mind involve sub-projects that are meant to work together and are all developed within the same larger project (GitHub project), presumably by the same team.
Yes, I think the hypothetical risk I was talking about is very small, especially if you track Manifest.toml in git. That way, you can at least reproduce the stacked environments at any time if something goes wrong.
If both PkgA and PkgC depend on the same PkgB (i.e., UUID is the same), the PkgB loaded in one Julia process is shared across PkgA and PkgC. So, using PkgB in PkgC loads PkgB 2.0 even if Project2/Manifest.toml has PkgB 1.9.
If the PkgB in Project1 and the PkgB in Project2 are different packages which happen to have the same name (i.e., the UUIDs are different), then PkgA and PkgC can be safely loaded in one process.
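A toy model of these two cases, as I understand them. Everything here is an assumption for illustration: the UUIDs are invented, and the loaded dictionary and load! function are stand-ins that only mimic the idea of one loaded copy per package identity (name plus UUID); they are not how Julia’s loader is actually implemented. Base.PkgId is the internal type Julia uses to identify a package.

```julia
using UUIDs

# Invented UUIDs, for illustration only
const UUID_B1 = UUID("11111111-1111-1111-1111-111111111111")
const UUID_B2 = UUID("22222222-2222-2222-2222-222222222222")

# Toy stand-in for the per-process table of loaded packages
loaded = Dict{Base.PkgId,VersionNumber}()

# "Load" a package: if this identity is already loaded, the existing version
# is reused; otherwise the requested version is recorded.
load!(loaded, id, version) = get!(loaded, id, version)

# Same UUID: PkgA (Project1) loads PkgB 2.0 first, so PkgC (Project2) also
# gets 2.0, even though its manifest asks for 1.9.
load!(loaded, Base.PkgId(UUID_B1, "PkgB"), v"2.0")
shared = load!(loaded, Base.PkgId(UUID_B1, "PkgB"), v"1.9")

# Different UUID, same name: a distinct package, loaded independently.
separate = load!(loaded, Base.PkgId(UUID_B2, "PkgB"), v"1.9")
```

Here shared ends up as v"2.0" and separate as v"1.9", with two distinct entries in the table.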