Pkg3 vs. Go's vgo

package-manager

#1

The other day I came across the post A Proposal for Package Versioning in Go in Go’s blog, and the following passages in particular (edited for clarity) got me wondering about whether these concerns (and corresponding proposed solutions) apply to Pkg3.

They start by describing their current approach:

[…] we all believed the answer would be to follow the package versioning approach exemplified by Rust’s Cargo, with tagged semantic versions, a manifest, a lock file, and a SAT solver to decide which versions to use. Sam Boyer led a team to create Dep, which followed this rough plan, and which we intended to serve as the model for go command integration.

…and its problems:

But as we learned more about the implications of the Cargo/Dep approach, it became clear to me that Go would benefit from changing some of the details, especially concerning backwards compatibility.

Cargo allows partial code upgrades by giving up import uniqueness: a given import path can have different meanings in different parts of a large build. Dep ensures import uniqueness by giving up partial code upgrades: all packages involved in a large build must find a single agreed-upon version of a given dependency, raising the possibility that large programs will be unbuildable.

(“partial code upgrades” and “import uniqueness” are described in the post better than I could summarize here.)

Next, they describe the proposed solution they came up with (emphasis mine):

The only way to have partial code upgrades, import uniqueness, and semantic versioning is to adopt semantic import versioning […] in which versions starting at v2.0.0 include the major version in the import path: my/thing/v2/sub/pkg.

Deciding between partial code upgrades and import uniqueness requires predicting which will hurt more to give up. Semantic import versioning lets us avoid the choice and keep both instead.

There’s also a passage related to the need for lock and manifest files (emphasis mine):

I was also surprised to discover how much import compatibility simplifies version selection, which is the problem of deciding which package versions to use for a given build.

The constraints of Cargo and Dep make version selection equivalent to solving Boolean satisfiability, meaning it can be very expensive to determine whether a valid version configuration even exists. And then there may be many valid configurations, with no clear criteria for choosing the “best” one.

Relying on import compatibility can instead [enable] a trivial, linear-time algorithm to find the single best configuration, which always exists. This algorithm, which I call minimal version selection, in turn eliminates the need for separate lock and manifest files. It replaces them with a single, short configuration file, edited directly by both developers and tools, that still supports reproducible builds.

(again, “import compatibility” is described in the post.)

Now, I’ve been only superficially following the discussions about Pkg3 and don’t know whether these issues have been solved in a different way in Pkg3, or if they even apply given Julia’s packaging ecosystem. But since I didn’t see anyone mention this post here, and it seemed to be pretty transversal rather than Go-specific, I thought I’d at least point it out. Hopefully at the very least it will lead to additional clarifications, complementing the Pkg3 plan and status thread.


#2

I’ve read the various vgo posts with interest and have mixed reactions. The described approach with “semantic import versioning” can be summarized like this:

  • Your project can depend on two different major versions of the same package at the same time.

  • However, it is impossible to express that you want to allow either of these major versions (not both) and don’t care which.

Thus, it is impossible for projects to support adjacent incompatible versions of their dependencies, which strikes me as a bit dangerous and potentially leading to upgrade deadlock. This is mitigated by allowing different major versions to coexist so that at least different portions of the code can upgrade independently. It remains to be seen what effect this has on the Go package ecosystem. Worst case, it becomes common for the ecosystem to contain miniature versions of the Python2/3 schism; best case, it’s no problem at all.

People have brought up the prospect of allowing loading multiple different versions of the same Julia package at the same time—largely because that’s what npm does. However, this approach does not mix well with multiple dispatch because each different copy of a package creates different versions of its own types and generic functions, which all the other copies don’t know about. Thus, if you pass a type from one copy of the package to the generic function of another copy of the package, everything blows up—and in the most confusing way since you have an operation that looks like it should work. I don’t think this issue is lessened much by these different versions having different major version numbers, so I think in Julia there can really only be one copy of a given package. If you really want a package rewrite to be completely independent, then rename it and give it a new UUID.

Go appears to have less of a problem with this since they do structural rather than nominal typing and as in classic single dispatch languages, there’s a shared global namespace for method names. What this means is that you can generally pass an instance of a type from one copy of a package to a method from another copy of a package and as long as they look similar enough, it will work. (To that end, I have to wonder a bit why they don’t go full npm-style here, but that’s another subject.) If two copies with different major versions are so different that this won’t work, then presumably the new version is a complete rewrite and hopefully your program won’t end up passing objects from one to the functions of the other—and presumably if you do that, you’ll get a compile-time type checking error.

To address the second point about eliminating the need for separate lock and manifest files, I’ll copy what I wrote on HN in response to Dart’s Bob Nystrom, who made what I thought was an excellent point about this kind of seemingly win-win change usually involving a “Missing Third Feature” that is left out of the analysis. He goes on to say that the inability to allow your project to allow its dependencies to span major versions is that third missing feature for him. I wrote this:

For me the missing third feature is the decoupling of version resolution from version recording that this approach gives up. Russ Cox presents vgo not needing separete manifest and lock files as a good thing, but I think that having those be separate is a crucial feature. In a system like Cargo, if I want to reproduce a past state of some software system, as long as I have a Cargo.lock file, I don’t have to know or care how those versions were chosen. They could have been chosen by a SAT solver or a soothsayer – it doesn’t matter; to reproduce that exact software state, I just install the recorded versions. In the vgo proposal, on the other hand, these are inextricably coupled: in order to reproduce a software state, I need to apply the exact same “minimal version selection” algorithm. If it turns out that minimal version selection has big issues and needs to be abandoned or even just tweaked, that means that old software configurations become irreproducible because reproducibility depends on the project requirements and the algorithm and not just an explicit lock file.

In short, I think that coupling the precise algorithm for version selection with the ability to reproduce an exact software state is a pretty big mistake. I do agree that the fact that we need to solve NP-hard problems to pick versions is troubling and have spent a lot of time thinking about how to mitigate that fact. However, it is still unclear that minimal version selection approach is the right solution, so why not allow room for experimentation in the version selection algorithms people can use? Decoupling reproducibility from version selection allows for just that.

It’s also worth pointing out that the minimal part of “minimal version selection” is not the reason why version resolution in vgo ceases to be NP-hard. The real reason is that you’re only allowed to put lower bounds on dependencies and the insistence that minor releases of Go packages must not break code. Under those assumptions, it would be even simpler to do “maximal version selection”: just install the latest version of every dependency. This would also work and is even simpler to compute. The main issue with that seems to be the tacit admission that assuming that minor releases of Go packages will not break code is unrealistic [source]:

The second algorithm is the behavior of go get -u: download and use the latest version of everything. This mode fails by using versions that are too new: if you run go get -u to download A, it will correctly update to B 1.2, but it will also update to C 1.3 and E 1.3, which aren’t what A asks for, may not have been tested, and may not work.

Of course, if the author of C or E followed the maxim of not breaking code in minor releases, then this wouldn’t be a problem. It is true that not upgrading these is safer since in the minimal approach at least some component of your system must have been tested with the versions of these packages that you get—which is better than nothing—but other parts of your system have not been tested with those versions and may still break. The fact that this is even a concern is somewhat at odds with the assumption that developers will never break compatibility in minor version upgrades.

Another possibly non-obvious consequence of the selection algorithm described is that you can only add dependencies to your packages, never remove them—at least not until you release a new major version, at which point you can change anything you want to. In particular, this means you cannot replace one dependency with a functionally different one, even if the API of your package remains completely compatible and you only change the way you implement completely internal functionality. Once you depend on something, you cannot get rid of that dependency until you make a major upgrade. This was pointed out in this HN comment:

Okay, so i’m reading the minimal version selection algorithm article [1], and there’s either a flaw, or something i don’t get. Here’s a modified version of the example graph, written as pidgin DOT [2]:

A -> B1.2
A -> C1.2
B1.2 -> D1.3
C1.2 -> D1.4
D1.3 -> X1.1
D1.4 -> Y1.1

The key thing here is that D1.3 depended on X1.1, but the new D1.4 depends on Y1.1 instead. I guess X is old and busted, and Y is the new hotness. What is the build list for A? Russ says:

The rough build list for M is also just the list of all modules reachable in the requirement graph starting at M and following arrows.

And:

Simplify the rough build list to produce the final build list, by keeping only the newest version of any listed module.

The list of all modules reachable from A is B1.2, C1.2, D1.3, D1.4, X1.1, and Y1.1, so that’s the rough build list. The list of the newest versions of each module is B1.2, C1.2, D1.4, X1.1, and Y1.1, so that’s the build list. The build list contains X1.1, even though it is not needed. Really?

[1] https://research.swtch.com/vgo-mvs
[2] https://www.graphviz.org/doc/info/lang.html

This limitation could probably be patched up by resolving a set of versions and then noticing that X1.1 is unreachable and discarding it. The trouble with that fix would be if X1.1 forces some other package to an unnecessarily high version even though it ends up being discarded, which would violate the minimality principle and could lead to using versions of dependencies that are higher than required by anything in the system.

I do think that minimal version selection is interesting and worth experimenting with—especially for users who are particularly risk averse and want to run with the most conservative, stable set of package versions they can. Fortunately, since Pkg3 decouples version selection from the rest of the system entirely, we can easily experiment with different version resolution strategies. The minimality idea is independent from the compatibility assumptions that actually limit the computational complexity of version resolution in the Go proposal, so we would still need a SAT-equivalent solver, but then we can do minimal version selection without making draconian (and possibly false) compatibility assumptions. Moreover, with a more sophisticated solver, we would be easily able to do things like chose a minimal set of minor versions but a maximal set of patch versions—after all, choosing the minimal patch version means that you’re always running the buggiest set of versions that could possible work, which doesn’t seem ideal.


#3

Thanks for the comprehensive response! I won’t pretend I was able to fully grasp all of what you said (as I mentioned, I haven’t been keeping up with Julia’s package management closely enough for that), but I’m sure this will help those more informed than me, and serve as useful future reference.


#4

Anything in particular that’s unclear that I could elaborate on? It’s also entirely possible that I’m just wrong about something or missed something, so please do ask about anything that’s confusing.


#5

I love reading this kind of behind-the-scene summary, thank you for writing it. Does version selection boil down to a constraint satisfaction problem, or is there more to it? How is the problem formally stated, what’s your objective function?


#6

Not that I can point out – I apologize for giving out that impression. I simply am not sufficiently involved with (nor knowledgeable about) package management in general and Pkg3 in particular to make insightful comments or reason about these arguments fully. FWIW, your post seemed logically consistent and well reasoned to my layman eyes, and reassured my trust in the soundness and resilience of Pkg3’s architecture.


#7

Is there a simple guide to converting packages to Pkg3?
I’ve been keeping my packages in JuliaString up to date with v0.7 master (on a daily basis pretty much!),
and I’d like to start making them usable with Pkg3 as well.
BTW, this is looking great, and I appreciate the cogent and well reasoned explanation you gave here!


#8

Package developers don’t need to do anything, the new package manager should just work with packages as they are. Once enough packages work on 0.7 we’ll start making automated PRs to add project files, but these aren’t necessary yet.


#9

The most revelatory point to me was that if you take Semantic Versioning seriously, it should not be possible to say a package depends on X without including a major version number (or range of numbers). By the definition of Semantic Versioning, the major versions of X are API incompatible. And as Stefan says, you can’t always be sure that minor updates won’t occasionally break code in spite of Semantic Versioning, so you should probably be careful about those too. I gather that the Pkg3 lock file fully specifies which version of X a package depend on, which sounds like a big win to me. The other big win of fully specifying is that it sounds like it will be automatic, whereas with python even when I have tried to specify what versions I depend on, I often have no idea except I know that the current version works.

I suspect I am not alone in regularly depending on packages without specifying any version information, and regularly being bitten by it, but not knowing there was a better way.


#10

The Manifest.toml file records the exact git tree hash of each dependency. In fact, in the most common usage, the package UUID and the git hash are how you find the code to load, so knowing exactly what code you’re running is inherently built into the code loading mechanism. You can also give an explicit path but that’s mostly only for using development versions and you can still snapshot the git tree hash of those as well.