What is the Pkg.jl package dependency resolver (solver algorithm)?

According to its source code, Pkg.jl uses a dependency resolution algorithm called MaxSum to resolve the package dependency graph. However, I was not able to find any documentation about this algorithm, or any concrete reference to a scientific paper describing it. Can someone add or provide some documentation on this algorithm, and maybe some discussion of the rationale and design decisions behind it? What are the advantages of MaxSum compared to other dependency resolvers, such as PubGrub or SAT solvers? How does MaxSum's performance compare to that of e.g. PubGrub or libsolv?

4 Likes

@carlobaldassi is the only one that can properly answer this I think.

Yes, I’ve always wanted to write down some detailed notes about this but never got around to doing it. I’ll try to sketch some basics here. Please forgive the unpolished result…

MaxSum is a limiting case of a better-known message-passing algorithm for statistical inference called Belief Propagation, also known as SumProduct. To derive this limiting case, write a probability p(x) as \exp(\beta \phi(x)) and send \beta \to \infty. The new quantities \phi can be interpreted as “negative energies” in a physical analogy, or “negative costs”/“utilities” in an optimization context. In the limit, the Belief Propagation (BP) equations for the probabilities become the MaxSum equations for the log-probabilities-divided-by-\beta, i.e. for the costs. The BP equations (which you can look up on Wikipedia and in many other places) change in 3 ways:

  • sums are replaced by maximums
  • products are replaced by sums
  • normalization is performed by shifting (so, by an additive term) rather than by scaling (a multiplicative term)

You can see then that SumProduct becomes MaxSum.
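To make those three changes concrete, here is a minimal Julia sketch of a single message update in both forms, for a pairwise model. The names (pairwise potential ψ, cavity belief h, log-potential/score η) are mine, purely for illustration, and not taken from Pkg.jl's code:

```julia
# Toy pairwise model: ψ is a compatibility table over the states of nodes i and j,
# h is the "cavity" belief of node i (everything it knows except j's message).

# Belief Propagation (SumProduct) update: sum over x_i, normalize by scaling.
function bp_message(ψ::Matrix{Float64}, h::Vector{Float64})
    m = [sum(ψ[xi, xj] * h[xi] for xi in eachindex(h)) for xj in axes(ψ, 2)]
    return m ./ sum(m)                      # multiplicative normalization
end

# MaxSum update: the β → ∞ limit. Sums become maxima, products become sums
# (we work with log-potentials/costs), normalization becomes an additive shift.
function maxsum_message(logψ::Matrix{Float64}, η::Vector{Float64})
    μ = [maximum(logψ[xi, xj] + η[xi] for xi in eachindex(η)) for xj in axes(logψ, 2)]
    return μ .- maximum(μ)                  # additive normalization (shift)
end
```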

The MaxSum algorithm is used not for statistical inference but for optimization. The BP equations are iterated until a fixed point is reached, and from the fixed point one can obtain a parameterization of a probability distribution. In MaxSum, the equations are also iterated, but if convergence is achieved the resulting quantities describe the mode of the probability distribution (or the modes if there is more than one, although one normally tries to ensure that a single mode is present, if necessary by breaking ties arbitrarily with some tiny random noise added to the scores).

Just like BP, MaxSum is only exact when the graph topology on which it runs is a tree. In that case, MaxSum is basically a form of Dynamic Programming. However, BP and MaxSum can also be run over graphs with loops. In that case they are approximate, but if the loops are not “too structured” the approximation may still be excellent, and the algorithm fast (see below). In that case, however, neither BP nor MaxSum is guaranteed to converge. There are heuristic techniques to work around this and drive the algorithm to produce a configuration. The one used in Julia's resolver is called “decimation”, and it works like this. Perform a number of MaxSum iterations, then pick a few variables of the graph, those which are most “polarized” (those that exhibit the largest difference between their favorite state and their second-most-favorite state), and fix their values. This effectively simplifies the graph (and generally breaks some loops). Then start over on the simplified graph, until all the variables are fixed or a contradiction is detected. A contradiction manifests itself in the algorithm when a hard constraint is violated.
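As a rough illustration of the decimation step (the interface below is invented for this example and is not Pkg.jl's), here is a small self-contained Julia sketch that picks the most polarized variables from their current MaxSum scores and returns the assignments to fix:

```julia
# One decimation step: given the current MaxSum scores of each still-free
# variable, pick the most "polarized" ones and return the states to fix.
function decimation_step(beliefs::Dict{Int,Vector{Float64}}; frac=0.05)
    # polarization = gap between the best and second-best score
    polarization(s) = (t = sort(s; rev=true); t[1] - (length(t) > 1 ? t[2] : -Inf))
    order = sort(collect(keys(beliefs)); by = v -> polarization(beliefs[v]), rev = true)
    nfix = max(1, round(Int, frac * length(order)))
    return Dict(v => argmax(beliefs[v]) for v in order[1:nfix])   # variable => chosen state
end

# Example: three free variables with their current scores; fix the two most polarized.
beliefs = Dict(1 => [0.0, -3.0], 2 => [-1.0, -1.1, -5.0], 3 => [-2.0, -2.0])
decimation_step(beliefs; frac=0.5)
```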

In the Julia’s resolver, this is augmented by a back-tracking mechanism: if a contradiction is detected, we try to revert the last decimation step and pick a different choice for the variables to decimate, and their values.

The representation of the problem in Julia’s solver is as follows: each package is a node of the graph, and it has as many possible states as there are package versions, plus there is one special state which is “uninstalled”. Each pair of packages may be related by dependency constraints: this is just a table T_{ij} with n_i \times n_j entries (where n_i and n_j are the number of states of nodes i and j), and T_{ij}[v_i,v_j]=1 if version v_i of package i is compatible with version v_j of package j, and 0 otherwise. Furthermore, constraints about required packages are implemented as “external fields”. These are a priori “utilities”, and for example we may say that the utility of the “uninstalled” state of a required package is -\infty.
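Here is a minimal Julia sketch of the kind of data this representation needs; the names and layout are of my own invention, not Pkg.jl's actual internals:

```julia
# The last state of each package plays the role of the special "uninstalled" state.
struct ResolveGraph
    nstates::Vector{Int}                        # versions of each package + 1 for "uninstalled"
    compat::Dict{Tuple{Int,Int},BitMatrix}      # T_ij[v_i, v_j] == true iff the pair is allowed
    fields::Vector{Vector{Float64}}             # a-priori utilities ("external fields") per state
end

# Example: package 1 has 2 versions, package 2 has 3; both versions of package 1
# depend on package 2, and version 2 of package 1 requires version >= 2 of package 2.
T12 = trues(3, 4)          # states: 2 versions + uninstalled  x  3 versions + uninstalled
T12[2, 1] = false          # v2 of pkg 1 is incompatible with v1 of pkg 2
T12[1:2, end] .= false     # any installed version of pkg 1 needs pkg 2 installed

fields = [zeros(3), zeros(4)]
fields[1][end] = -Inf      # package 1 is required: its "uninstalled" state gets utility -Inf

g = ResolveGraph([3, 4], Dict((1, 2) => T12), fields)
```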

On top of this, there are preferences for some versions of a package compared to others. In particular, we prefer the uninstalled state, if available, and higher versions over lower ones. This is also implemented via the same “external field” mechanism. In order to look at major versions first, then minor versions, then patch, we define a new numeric type. Another level of preference (e.g. giving precedence to maximizing the versions of required packages over non-required ones) is implemented with yet another numeric type.
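A hedged sketch of the “lexicographic numeric type” idea (Pkg.jl has its own such type; the one below is purely illustrative): comparison is lexicographic, so a higher major version wins regardless of minor/patch, and addition is component-wise, so these weights can be accumulated like ordinary numbers inside the score computations.

```julia
struct VWeight
    major::Int
    minor::Int
    patch::Int
end

Base.:+(a::VWeight, b::VWeight) = VWeight(a.major + b.major, a.minor + b.minor, a.patch + b.patch)
Base.isless(a::VWeight, b::VWeight) =
    (a.major, a.minor, a.patch) < (b.major, b.minor, b.patch)
Base.zero(::Type{VWeight}) = VWeight(0, 0, 0)

# v1.2.3 beats v1.1.9 no matter how large the patch number is:
VWeight(1, 2, 3) > VWeight(1, 1, 9)    # true
```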

Finally, the result of MaxSum may be suboptimal. We thus push the solution towards a local optimum via a local greedy search (basically, a hill-climbing algorithm).
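For concreteness, a minimal hill-climbing sketch; the `nstates`, `feasible` and `score` arguments are placeholders for whatever the solver actually uses to check hard constraints and total utility, not its real interface:

```julia
# Greedy local search: keep applying single-package moves as long as they
# improve the total score without violating the hard constraints.
function hill_climb!(sol::Vector{Int}, nstates::Vector{Int}, feasible, score)
    best = feasible(sol) ? score(sol) : -Inf
    improved = true
    while improved
        improved = false
        for i in eachindex(sol), s in 1:nstates[i]
            old = sol[i]
            s == old && continue
            sol[i] = s                                   # try moving one package
            cand = feasible(sol) ? score(sol) : -Inf     # infeasible moves never win
            if cand > best
                best = cand
                improved = true                          # keep the improving move
            else
                sol[i] = old                             # otherwise revert it
            end
        end
    end
    return sol
end
```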

As for why using MaxSum may be a good idea, here's the reasoning. First, message-passing algorithms scale nicely with the size of the graph. A full sweep of message updates is O(N k v^2), where N is the number of nodes (packages), k is the mean number of neighbors of a node (how many other packages are involved in dependency constraints for any given package), and v is the maximum number of versions of a package (this last one is pessimistic, but it boils down to the size of the tables T_{ij} mentioned above, and we can trim it down by version pruning). The number of message updates before decimation and the fraction of variables to decimate (fix) at each step can be set manually, so the number of steps is also manageable. The backtracking may unfortunately make the algorithm exponential; however, this hasn't been a problem in practice so far (as far as I'm aware), and if it becomes one it's not difficult to put a hard cap on it. So one strong motivation for this type of algorithm is its scaling.

Another motivation is that it is pretty natural to: 1) work on a straightforward representation of the optimization problem, without converting it into a boolean problem of some kind; 2) implement in the same scheme both hard constraints (required packages, dependency relations) and soft preferences (higher versions are better, required packages have precedence, etc.).

Finally, heuristic experience shows that these message-passing schemes can indeed work very well on a variety of NP-hard optimization problems, and examples abound. A famous one is Affinity Propagation, which is just MaxSum applied to the k-medoids clustering problem. There are also MaxSum applications to the Steiner Tree problem and to neural networks with discrete weights. For the SAT problem, a related algorithm works very well on random instances, and can be augmented with backtracking. Etc.

37 Likes

To answer the question about why we don't use full SAT solvers: the number of packages installed is much smaller than the total number of packages (100 vs 6000 for typical users), so algorithms that scale super-linearly with the total number of packages would likely be unacceptably slow.

3 Likes

I’ve actually written a new resolver algorithm that’s implemented here. Unlike the current algorithm (or SAT solvers), which finds only one solution, it finds all Pareto-optimal solutions. If no complete solution is possible it will try to find incomplete solutions that satisfy some of the requirements (but collectively cover all of them); if that’s not possible either, it will find solutions with some incompatibilities. It already has very thorough tests for small test cases, and I’ve started to test it out on real registry data, but have gotten a little stuck on that due to lack of time. Amusingly, loading the registry data efficiently is currently the bottleneck and solving seems to be quite fast—the algorithm is potentially exponential (as any algorithm for an NP-hard problem presumably has to be), but seems to be fast in practice.

Why is it useful to find all Pareto-optimal solutions or near-solutions? If there’s a unique optimal solution then it doesn’t matter—you just use that one. Where multiple solutions exist, for any pair of solutions there must be conflicts between some of their package versions. Let’s call the conflicted packages A and B. We can ask users whether they want a solution that prioritizes a newer version of A at the expense of an older version of B or vice versa. They might not care but they might have a reason to prefer one or the other. When there are no solutions, we can present the problem more clearly to users by giving them options for how to solve it: they can either remove requirements or fix conflicts between some of the packages their project depends on. I believe this “action oriented” approach to presenting resolver conflicts will be easier for users to understand.
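As a toy illustration of the concept (not the actual algorithm): with two packages A and B, two versions each, and a single conflict between A@2 and B@2, a brute-force enumeration in Julia already shows the two Pareto-optimal choices described above.

```julia
feasible(a, b) = !(a == 2 && b == 2)                 # the single conflict: A@2 with B@2
candidates = [(a, b) for a in 1:2, b in 1:2 if feasible(a, b)]
dominates(x, y) = all(x .>= y) && any(x .> y)        # at least as new everywhere, newer somewhere
pareto = [x for x in candidates if !any(dominates(y, x) for y in candidates)]
# pareto == [(2, 1), (1, 2)]: prioritize a newer A or a newer B; the user may care which.
```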

28 Likes

Cool! There could be exponentially many optimal solutions - does that just not happen in practice?

2 Likes

There are exponentially many non-optimal solutions but the number of optimal solutions is linear in the number of conflicts on the optimality boundary, which is not that big in practice.

2 Likes

Update on this since it’s still accumulating likes: that repo still has the new resolver, which now fully works and has pretty extensive tests. I’ve also tested it at scale and it can resolve at the scale of the entire General registry. I realized at some point, however, that I did actually need a SAT solver, so now it provides its own very simple, low-level wrapper around PicoSAT and uses that to find an adjustable number of Pareto-optimal solutions in a user-defined lexicographic ordering. This was significantly trickier than anticipated and uses features of PicoSAT that most SAT solvers don’t offer, like the ability to push and pop additional sets of clauses. I haven’t had a chance to replace the old resolver yet, but I’ll get to that at some point, hopefully within the year.

18 Likes

conda is turning to SAT too, with rattler (Rust-based): GitHub - conda/rattler: Rust crates to work with the Conda ecosystem.

and resolvo (from the mamba folks; Rust; BSD license): GitHub - mamba-org/resolvo: Fast package resolver written in Rust (CDCL based SAT solving)

A lot of SAT-solving machinery has been ported to Rust from the foundational libsolv (resolvo itself even has some experimental backport targeting rpm/yum now); even the Dart language solver has a Rust port.

pixi tries to be compute-node/language agnostic (GitHub - prefix-dev/pixi: Package management made easy), a la GitHub Packages (without a Microsoft GitHub server behind it at some point).

If one wants to be able to use some external SAT solver, there is some interfacing out there that can plug in both MiniSat and Z3: ocaml-sat-solvers/src at master · tcsprojects/ocaml-sat-solvers · GitHub
Z3 can be used either incrementally or all at once: Control whether to use incremental or "bulk" SAT solver · Issue #248 · Z3Prover/z3 · GitHub

Thanks for your work!

1 Like

I did look at using Z3 but it's a pretty heavy dependency and seems like overkill. PicoSAT is very lightweight and works great for this. Getting it to do optimization instead of just satisfaction is a little non-obvious but doable: that's what the temporary clause stuff is for. This might be why people reach for Z3 instead of plain SAT solvers: it's not immediately clear how to get a SAT solver to give optimal solutions rather than just possible solutions. But it can be done by iteratively adding constraints until the problem is no longer satisfiable, at which point the last feasible solution is an optimal one. If you do it right, you can even control the order in which you find optimal solutions.
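For concreteness, here is a hedged Julia sketch of that trick for a lexicographic objective. `solve(clauses)` is an assumed helper that returns a model (a Vector{Int} of signed literals) or `nothing` when unsatisfiable, e.g. a thin wrapper around a SAT solver such as PicoSAT; an efficient implementation would use the solver's push/pop support instead of copying clause lists as done here.

```julia
# Commit variables one at a time in preference order: try forcing each one true;
# if the formula stays satisfiable, keep that unit clause, otherwise force it false.
# The final model is lexicographically optimal with respect to `order`.
function lex_maximize(solve, clauses::Vector{Vector{Int}}, order::Vector{Int})
    committed = copy(clauses)
    for v in order
        trial = push!(copy(committed), [v])      # try "variable v is true"
        if solve(trial) !== nothing
            committed = trial                    # keep it if still satisfiable
        else
            push!(committed, [-v])               # otherwise commit it to false
        end
    end
    return solve(committed)
end

# Toy usage: variables 1 and 2 exclude each other, preference order 1 > 2 > 3.
# clauses = [[-1, -2]]; lex_maximize(solve, clauses, [1, 2, 3])
# should return a model with 1 true, 2 false, 3 true.
```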

5 Likes

The first and current solver, while certainly coded by someone clever and dedicated, is very non-classical IMO and hard to grasp. It's already a good thing that SAT is showing up seriously in Julia. Z3 is a whole platform, rather than a turn-key solver, maybe.

The scope of SAT is very broad:

  • pkg management, but also

  • type inference in Julia (see Julia subtyping, Nardelli 2018).

  • interfaces and their implementations.
    While this is still lacking in Julia, Swift has protocols and Rust has traits, which provide some good solutions, in a way very similar to Interfaces · The Julia Language
    but as a whole open algebraic structure (Holy traits with arrow types, (sub)setting, etc.). Every spec is ultimately lowered to some logic IR built around the implemented, asserted facts:
    Lowering Rust traits to logic · baby steps
    Lowering Rust IR to logic

  • many more

Swift seems to use a common LazyResolver to handle many of those cases; see Code search results · GitHub
A well-defined logical solver in Julia could bring significant progress there in the future.


For package management, IIUC, it is a multi-criteria optimization where picking a single, unconditional scoring can be hard too. OCaml's opam even allows the user to specify preferences: a set of criteria to use and how to optimize them, for example

-count(removed),-count(down),-sum(solution,installedsize),-notuptodate(solution),-count(changed)

see opam - External solvers. It can be useful for adapting to whatever environment (server, embedded, web, notebook, local server, whatever) one needs to target.