Say I have a package “MyPkg.jl” with the following structure:
MyPkg.jl/Manifest.toml
MyPkg.jl/Project.toml
MyPkg.jl/src/MyPkg.jl
MyPkg.jl/benchmark/benchmark.jl
where `benchmark.jl` is a script that produces benchmark statistics for MyPkg.jl and has dependencies beyond those of MyPkg.jl.
What is the best way to handle the dependencies of `benchmark.jl`?
Ideally I’d like to simply add `[extras]` with a `[targets]` entry that points to `benchmark`, but this looks to be an option only for the targets `test` and `build`.
gdalle
July 13, 2023, 5:35am
2
What I do for `docs` and `benchmarks` is create a dedicated `Project.toml` at the root of the subfolder, in which I `pkg> dev` the main package. The case of testing is a bit peculiar, and I prefer to use `extras` for that (see here why, although that might be fixed soon).
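A minimal sketch of that setup using the `Pkg` API rather than the REPL. The helper name `setup_benchmark_env` and the choice of BenchmarkTools as the extra dependency are assumptions for illustration, not something from this thread:

```julia
using Pkg

# Hypothetical helper: give `benchmark/` its own environment inside the
# package rooted at `pkgroot`, with the main package dev'ed into it.
function setup_benchmark_env(pkgroot::AbstractString; extra_deps = ["BenchmarkTools"])
    benchdir = joinpath(pkgroot, "benchmark")
    mkpath(benchdir)
    Pkg.activate(benchdir)        # switch the active project to benchmark/
    Pkg.develop(path = pkgroot)   # analogous to `pkg> dev ..` from inside benchmark/
    # Benchmark-only dependencies (assumed here; replace with what the script needs)
    isempty(extra_deps) || Pkg.add(extra_deps)
    return benchdir
end
```

Calling `setup_benchmark_env(".")` from the package root should leave a `Project.toml` and a `Manifest.toml` in `benchmark/`. Note that `Pkg.activate` changes the active project for the rest of the session.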
In the long run, you probably want to follow this GitHub issue, which describes exactly what we both seem to dream of:
Opened 10:53AM, 19 Jun 19 UTC. Labels: enhancement, feature.
I have had an idea for a while which I call "sub-projects" and we discussed it on the pkg-dev call yesterday. The post here is to summarize that discussion.
1. What problem does sub-projects try to solve?
There are cases where we use multiple Project.toml files in a package. One common scenario is documentation, where there is a Project typically containing Documenter.jl and the package (which has a relative path in the Manifest.toml). The documentation Manifest.toml file contains the full resolved state (independent of the "main" Manifest.toml). The problem is that, with time, it is very likely that the versions of dependencies in the docs Manifest.toml drift away from the versions of dependencies in the main Manifest.toml. This is likely not desired, since things like doctests might pass with the docs Manifest.toml but might not pass when the main Manifest.toml is used.
The same applies to having a test-specific Project / Manifest, but there it is arguably even worse because now you are not sure that the tests that are run are representative of running the package with the main Project.toml active. What we want here is to be able to use the main Manifest.toml, but give some additional dependencies that are only used for docs / testing.
Another problem area is shown by looking at the "model-zoo" for machine learning models in Julia: https://github.com/FluxML/model-zoo. Each model has a separate Project / Manifest and to run a model you set that as the active project and then include the model. The problem here is that each model can potentially use vastly different versions of packages that all models have in common (Flux, NNlib, CuArrays etc). Actually using code from the model zoo then becomes very annoying since it is hard to get your own project in the same state as the models run in. What we want here is to be able to give a set of packages at a fixed version (Flux, NNlib, CuArrays etc), have all models run on those versions, but also have each model add some extra dependencies because it needs to do something special (e.g. read image files).
So to summarize, the core issue is that there is no way to "incrementally" add a chunk of a dependency graph to an existing project. If you want to add extra dependencies in a scenario, you need a full copy of Project.toml / Manifest.toml and this will eventually lead to divergence between versions in that Manifest and the main-project.
2. What is a sub-project?
A sub-project is in essence an incremental addition of packages to an already existing "main-project". There needs to be some way to identify the main-project from the sub-project. Right now, the details of how this is done are not important, but we could envision a `main-project = ".."` entry in the sub-project `Project.toml` that gives a relative path to the main-project, which is here one directory above.
The core property of a sub-project is that when you resolve it, *versions for dependencies in the manifest for the main-project are fixed*. In other words, the resolved state of a sub-project is only an incremental addition to the existing dependency graph that is set by the main-project. That means that the compat info in a sub-project *must* be consistent (resolvable) with the existing versions in the main project.
This would allow us to have a test or documentation project which is simply a sub-project of the main project. Since the versions of the dependencies are forced to be the same in the sub-project, we know (modulo type piracy and similar issues) that the tests we run in these sub-projects will work with the manifest in the main-project.
3. Implementation questions:
3.1. How should sub-projects be identified?
Firstly, it is desirable to be able to see that a project is a sub-project "locally" (i.e. by only looking at the directory of the sub-project).
Thus, we want to have some information in the sub-project to show that it, in fact, is a sub-project. One proposal is to have a `main-project = $path_to_main_project` entry in the Project.toml.
Relevant for point 3.2 is also whether the main project should have some mapping to sub-projects. It feels annoying to have to specify both `main-project` in the sub-project and a list of `sub-projects` in the main-project, so preferably that can be avoided.
3.2. What should resolve do in a main-project in the presence of sub-projects?
If we re-resolve the main-project (upon e.g. an `update`), the main Manifest will change. The sub-manifests are now "out of sync" with the main-manifest, so they are potentially in a non-resolvable state. This is bad. One possible solution to this is that, if any sub-projects exist, resolving the main-project also resolves and updates all sub-projects. If any of these resolves fail, the resolve is rejected. That would keep all sub-manifests in sync at all times.
3.3. What changes are needed to code-loading?
Sub-projects are different from main-projects in that they only specify additional dependencies outside the main-project. This has some problems when it comes to the current implementation of code-loading in Julia.
As an example, in the case where package `A` is in the main project and package `B` depending on `A` is in the sub-project, activating the subproject and loading `B` will error because we cannot find `A` in the current project. Code loading needs to know that it should look in the main-project for the UUID to `A`.
A related issue is what should go into the sub-manifest.
There are two choices. Either the full Manifest is stored or only the addition of dependencies that comes from the sub-project is stored.
In isolation, the latter choice is clearly preferable since it doesn't repeat any redundant information. This might however mean that we need to slightly complicate the code loading to also deal with partial manifests. Since it seems we might need to touch code loading anyway, I think only storing the extra info is the way to go.
Along the same lines as what Guillaume mentioned, you might also want to have a look at Run.jl: reproducible runs not only for tests but also for any sub-projects (docs, benchmarks, etc.). It also simplifies CI.
I’m still not 100% there:
[1] I created a `./benchmark/Project.toml` with the corresponding dependencies.
[2] All `benchmark` dependencies are included as `[extras]` in the main package `Project.toml`.
[3] I then activate `./benchmark/Project.toml`.
[4] I then `pkg> dev` the main package.
However, this last step creates `./benchmark/Manifest.toml`, which I don’t think should be there. Is this typical? Do I simply add `./benchmark/Manifest.toml` to the `.gitignore` when syncing with the repository?
gdalle
July 13, 2023, 5:51pm
5
For the docs and benchmarks you usually want the manifest to be there, so that the whole thing is fully reproducible. Don’t put it in the `.gitignore`; just let it be.
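To illustrate why the committed manifest helps, here is a rough sketch of how someone else could rerun the benchmarks from a fresh clone. The helper name `run_benchmarks` is hypothetical, and the layout is assumed to be the one from the original question (`benchmark/benchmark.jl` next to `benchmark/Project.toml` and `benchmark/Manifest.toml`):

```julia
using Pkg

# Hypothetical helper: run the benchmark script in the environment
# recorded in benchmark/Manifest.toml.
function run_benchmarks(pkgroot::AbstractString)
    benchdir = joinpath(pkgroot, "benchmark")
    Pkg.activate(benchdir)
    # With a Manifest.toml present, this installs the versions listed in it
    # instead of resolving new ones.
    Pkg.instantiate()
    # Evaluate the script at top level of the current module.
    include(joinpath(benchdir, "benchmark.jl"))
end
```

Without the manifest in the repository, `Pkg.instantiate()` would have to resolve versions from `benchmark/Project.toml` alone, so two runs at different times could end up with different dependency versions.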