Pkg3: Why record transitive dependencies in Manifest.toml?

pkg

#1

Maybe there is a totally obvious reason for this and I am missing it — but isn’t a package’s dependency information already declared in its Project.toml and thus can be derived by any application project that uses it? If so, why does Pkg3 explicitly record them in Manifest.toml for an application project?

For example, for the following entry in the Manifest.toml in my default environment:

[[BenchmarkTools]]
deps = ["JSON", "Pkg", "Printf", "Statistics", "Test"]
git-tree-sha1 = "e686f1754227e4748259f400839b83a1e8773e02"
uuid = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
version = "0.4.1"

It seems to me that deps = ["JSON", "Pkg", "Printf", "Statistics", "Test"] should be redundant? My current understanding is that this information is recorded so that Pkg3 can pull all the packages declared in the Manifest.toml in one go instead of going back and forth querying GitHub or some other registry, which makes resolving dependencies faster.

Navigating to BenchmarkTools.jl's GitHub repository I can see that its v0.4.1 tag doesn’t have a Project.toml file but has a REQUIRE file with the following content:

julia 0.7
JSON

I was quite surprised to see that just JSON is declared, because I was expecting the package to somehow specify the exact versions of its dependencies (I remember reading somewhere on this forum that Pkg3 will try to use the latest version of a package that satisfies all the version constraints; not sure if this is relevant though).

I did eventually find the required version ranges for those dependencies in Julia’s general registry, but I still don’t quite understand how Pkg3 resolves dependencies; if all the information is recorded in the registry, then what are the scenarios where Manifest.toml becomes useful? The documentation did mention that it can be used to recreate a package environment; but wouldn’t that be possible by just looking up the dependencies’ uuids in the registry and reproducing the environment using the information retrieved there?

Thanks!


#2

The manifest file records the full dependency graph and exact version of every dependency of an application. The manifest file is also used at code loading time to figure out what code to load when import or using statements are encountered. You don’t want to have to go to GitHub to figure out what code to load when your program is loading BenchmarkTools and sees import JSON. Also, a dependency can only load dependencies that are declared for it in the manifest—if it tries to load anything else, it will not be able to load it.
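
To make the "can only load what it declares" rule concrete, here is a rough toy model in Python. It is only a sketch of the idea, not how Julia's loader is actually implemented: the `MANIFEST` dict, `LoadError`, and `resolve_import` are hypothetical names, packages are keyed by name rather than UUID, and the deps of everything except BenchmarkTools are left empty for brevity.

```python
# Toy manifest: package name -> names that package is allowed to load.
# The BenchmarkTools entry mirrors the deps list quoted in the question;
# the other entries are simplified placeholders (assumption).
MANIFEST = {
    "BenchmarkTools": ["JSON", "Pkg", "Printf", "Statistics", "Test"],
    "JSON": [],
    "Pkg": [],
    "Printf": [],
    "Statistics": [],
    "Test": [],
}


class LoadError(Exception):
    """Raised when an import cannot be resolved from the manifest."""


def resolve_import(manifest, importer, name):
    """Resolve `import name` as seen from inside the package `importer`.

    Only the manifest is consulted: no network access and no reading of
    the dependency's own project file.
    """
    if importer not in manifest:
        raise LoadError(f"{importer} has no manifest entry")
    if name not in manifest[importer]:
        raise LoadError(f"{importer} does not declare {name} in its deps")
    if name not in manifest:
        raise LoadError(f"{name} is declared by {importer} but has no manifest entry")
    return name
```

In this model, `resolve_import(MANIFEST, "BenchmarkTools", "JSON")` succeeds, while `resolve_import(MANIFEST, "BenchmarkTools", "Dates")` raises `LoadError` because `Dates` is not in the deps recorded for BenchmarkTools.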

Dependency information could, in principle, be found in project files of the individual dependencies since the tree hash is recorded and that includes a project file which declares deps. However, it’s not required that dependencies have project files—which was necessary to allow transitioning from the old to new package systems smoothly—and in general only the manifest file is consulted when loading code, not individual project files. This does cause some problems at times since the deps maps in a manifest can become out of sync with dev’d dependencies, but on the whole it seems better to have a single file that records the entire dependency graph. Perhaps at some point in the future when it is required that all dependencies have project files, we could switch to looking up dependencies in the project files. That does make it significantly harder, however, to determine if the dependency graph has dangling edges or anything like that.
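
As a rough illustration of why a single file makes the dangling-edge check easy, here is a toy Python sketch. The `dangling_edges` helper and the trimmed `EXAMPLE` manifest are hypothetical, and packages are keyed by name rather than UUID; this is not Pkg's actual check.

```python
def dangling_edges(manifest):
    """Return sorted (package, dep) pairs where dep has no manifest entry."""
    return sorted(
        (pkg, dep)
        for pkg, deps in manifest.items()
        for dep in deps
        if dep not in manifest
    )


# Deliberately broken toy manifest: Printf is listed as a dep of
# BenchmarkTools but has no entry of its own.
EXAMPLE = {
    "BenchmarkTools": ["JSON", "Printf"],
    "JSON": [],
}

print(dangling_edges(EXAMPLE))
```

With everything in one mapping, the check is a single pass over the file. If the deps lived in per-package project files instead, each of those files would have to be located and read before the same question could be answered.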


#3

Thanks @StefanKarpinski for your explanation! Just two more questions regarding the role of Manifest.toml:

  1. You don’t want to have to go to GitHub to figure out what code to load when your program is loading BenchmarkTools and sees import JSON.

    This point you made I can understand; so in this case the deps field serves as some sort of cache, I guess?

  2. Also, a dependency can only load dependencies that are declared for it in the manifest—if it tries to load anything else, it will not be able to load it.

    I wonder when an application project would prohibit one of its dependencies from loading a transitive dependency? For example, when would one remove JSON from the deps field declared by BenchmarkTools if BenchmarkTools requires it?

I am also a bit confused about how Julia locates, installs, and manages packages, but I think I will leave that for another post.


#4

I guess you can look at it that way but it doesn’t really seem like a cache to me. It’s more definitive than that: a package can’t load what’s not in its deps in the current manifest.

You wouldn’t want to since it would break. The point is to have an explicit record of everything that BenchmarkTools uses and to know that it cannot load anything that it hasn’t explicitly declared as a dependency.

Have you read the code loading docs?

https://docs.julialang.org/en/v0.7.0/manual/code-loading/