Pkg3 vs. Go's vgo

I’ve read the various vgo posts with interest and have mixed reactions. The described approach with “semantic import versioning” can be summarized like this:

  • Your project can depend on two different major versions of the same package at the same time.

  • However, it is impossible to express that you want to allow either of these major versions (not both) and don’t care which.

Thus, it is impossible for projects to support adjacent incompatible versions of their dependencies, which strikes me as a bit dangerous and potentially leading to upgrade deadlock. This is mitigated by allowing different major versions to coexist so that at least different portions of the code can upgrade independently. It remains to be seen what effect this has on the Go package ecosystem. Worst case, it becomes common for the ecosystem to contain miniature versions of the Python2/3 schism; best case, it’s no problem at all.

People have brought up the prospect of allowing loading multiple different versions of the same Julia package at the same time—largely because that’s what npm does. However, this approach does not mix well with multiple dispatch because each different copy of a package creates different versions of its own types and generic functions, which all the other copies don’t know about. Thus, if you pass a type from one copy of the package to the generic function of another copy of the package, everything blows up—and in the most confusing way since you have an operation that looks like it should work. I don’t think this issue is lessened much by these different versions having different major version numbers, so I think in Julia there can really only be one copy of a given package. If you really want a package rewrite to be completely independent, then rename it and give it a new UUID.

Go appears to have less of a problem with this since they do structural rather than nominal typing and as in classic single dispatch languages, there’s a shared global namespace for method names. What this means is that you can generally pass an instance of a type from one copy of a package to a method from another copy of a package and as long as they look similar enough, it will work. (To that end, I have to wonder a bit why they don’t go full npm-style here, but that’s another subject.) If two copies with different major versions are so different that this won’t work, then presumably the new version is a complete rewrite and hopefully your program won’t end up passing objects from one to the functions of the other—and presumably if you do that, you’ll get a compile-time type checking error.

To address the second point about eliminating the need for separate lock and manifest files, I’ll copy what I wrote on HN in response to Dart’s Bob Nystrom, who made what I thought was an excellent point about this kind of seemingly win-win change usually involving a “Missing Third Feature” that is left out of the analysis. He goes on to say that the inability to allow your project to allow its dependencies to span major versions is that third missing feature for him. I wrote this:

For me the missing third feature is the decoupling of version resolution from version recording that this approach gives up. Russ Cox presents vgo not needing separete manifest and lock files as a good thing, but I think that having those be separate is a crucial feature. In a system like Cargo, if I want to reproduce a past state of some software system, as long as I have a Cargo.lock file, I don’t have to know or care how those versions were chosen. They could have been chosen by a SAT solver or a soothsayer – it doesn’t matter; to reproduce that exact software state, I just install the recorded versions. In the vgo proposal, on the other hand, these are inextricably coupled: in order to reproduce a software state, I need to apply the exact same “minimal version selection” algorithm. If it turns out that minimal version selection has big issues and needs to be abandoned or even just tweaked, that means that old software configurations become irreproducible because reproducibility depends on the project requirements and the algorithm and not just an explicit lock file.

In short, I think that coupling the precise algorithm for version selection with the ability to reproduce an exact software state is a pretty big mistake. I do agree that the fact that we need to solve NP-hard problems to pick versions is troubling and have spent a lot of time thinking about how to mitigate that fact. However, it is still unclear that minimal version selection approach is the right solution, so why not allow room for experimentation in the version selection algorithms people can use? Decoupling reproducibility from version selection allows for just that.

It’s also worth pointing out that the minimal part of “minimal version selection” is not the reason why version resolution in vgo ceases to be NP-hard. The real reason is that you’re only allowed to put lower bounds on dependencies and the insistence that minor releases of Go packages must not break code. Under those assumptions, it would be even simpler to do “maximal version selection”: just install the latest version of every dependency. This would also work and is even simpler to compute. The main issue with that seems to be the tacit admission that assuming that minor releases of Go packages will not break code is unrealistic [source]:

The second algorithm is the behavior of go get -u: download and use the latest version of everything. This mode fails by using versions that are too new: if you run go get -u to download A, it will correctly update to B 1.2, but it will also update to C 1.3 and E 1.3, which aren’t what A asks for, may not have been tested, and may not work.

Of course, if the author of C or E followed the maxim of not breaking code in minor releases, then this wouldn’t be a problem. It is true that not upgrading these is safer since in the minimal approach at least some component of your system must have been tested with the versions of these packages that you get—which is better than nothing—but other parts of your system have not been tested with those versions and may still break. The fact that this is even a concern is somewhat at odds with the assumption that developers will never break compatibility in minor version upgrades.

Another possibly non-obvious consequence of the selection algorithm described is that you can only add dependencies to your packages, never remove them—at least not until you release a new major version, at which point you can change anything you want to. In particular, this means you cannot replace one dependency with a functionally different one, even if the API of your package remains completely compatible and you only change the way you implement completely internal functionality. Once you depend on something, you cannot get rid of that dependency until you make a major upgrade. This was pointed out in this HN comment:

Okay, so i’m reading the minimal version selection algorithm article [1], and there’s either a flaw, or something i don’t get. Here’s a modified version of the example graph, written as pidgin DOT [2]:

A -> B1.2
A -> C1.2
B1.2 -> D1.3
C1.2 -> D1.4
D1.3 -> X1.1
D1.4 -> Y1.1

The key thing here is that D1.3 depended on X1.1, but the new D1.4 depends on Y1.1 instead. I guess X is old and busted, and Y is the new hotness. What is the build list for A? Russ says:

The rough build list for M is also just the list of all modules reachable in the requirement graph starting at M and following arrows.

And:

Simplify the rough build list to produce the final build list, by keeping only the newest version of any listed module.

The list of all modules reachable from A is B1.2, C1.2, D1.3, D1.4, X1.1, and Y1.1, so that’s the rough build list. The list of the newest versions of each module is B1.2, C1.2, D1.4, X1.1, and Y1.1, so that’s the build list. The build list contains X1.1, even though it is not needed. Really?

[1] research!rsc: Minimal Version Selection (Go & Versioning, Part 4)
[2] DOT Language | Graphviz

This limitation could probably be patched up by resolving a set of versions and then noticing that X1.1 is unreachable and discarding it. The trouble with that fix would be if X1.1 forces some other package to an unnecessarily high version even though it ends up being discarded, which would violate the minimality principle and could lead to using versions of dependencies that are higher than required by anything in the system.

I do think that minimal version selection is interesting and worth experimenting with—especially for users who are particularly risk averse and want to run with the most conservative, stable set of package versions they can. Fortunately, since Pkg3 decouples version selection from the rest of the system entirely, we can easily experiment with different version resolution strategies. The minimality idea is independent from the compatibility assumptions that actually limit the computational complexity of version resolution in the Go proposal, so we would still need a SAT-equivalent solver, but then we can do minimal version selection without making draconian (and possibly false) compatibility assumptions. Moreover, with a more sophisticated solver, we would be easily able to do things like chose a minimal set of minor versions but a maximal set of patch versions—after all, choosing the minimal patch version means that you’re always running the buggiest set of versions that could possibly work, which doesn’t seem ideal.

39 Likes