Decentralized Package Manager

Hi everyone :wave:

TL;DR

I propose explicitly specifying the registry where a package is located in Project.toml and, by extension, having it appear in Manifest.toml.

The default registry would be General and would not need to be explicitly recorded.

Longer Version

The more I learn about Pkg and its design, the more I like it. It’s beautiful :raised_hands:

I’ve spent the last couple of days looking through the code to try to implement an idea I’ve had for some time and recently raised again here

https://github.com/JuliaComputing/Registrator.jl/issues/23

leading to this POC PR

https://github.com/JuliaLang/Pkg.jl/pull/1064

The idea is similar to npm scoping, i.e. I want to be able to explicitly specify the registry when I add a package, e.g.

pkg> add @MyRegistry/MyPackage

or

julia> Pkg.add("MyRegistry","MyPackage")
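For illustration only, here is a minimal Julia sketch of how a scoped spec like `@MyRegistry/MyPackage` could be split into a registry name and a package name before lookup. `ScopedSpec` and `parse_scoped` are hypothetical names. Pkg does not parse package specs this way.

```julia
# Hypothetical parser for a scoped spec such as "@MyRegistry/MyPackage".
# Pkg itself does not accept this syntax; this only illustrates the idea.
struct ScopedSpec
    registry::Union{Nothing,String}  # `nothing` = search all installed registries
    name::String
end

function parse_scoped(spec::AbstractString)
    m = match(r"^@(\w+)/(\w+)$", spec)
    if m !== nothing
        return ScopedSpec(String(m.captures[1]), String(m.captures[2]))
    end
    # Reject half-scoped input such as "@MyRegistry" or "MyRegistry/MyPackage".
    if startswith(spec, "@") || occursin('/', spec)
        throw(ArgumentError("malformed scoped package spec: $spec"))
    end
    return ScopedSpec(nothing, String(spec))
end

parse_scoped("@MyRegistry/MyPackage")  # ScopedSpec("MyRegistry", "MyPackage")
parse_scoped("MyPackage")              # ScopedSpec(nothing, "MyPackage")
```

With `registry === nothing`, lookup would behave as it does today: it would search every installed registry and prompt if the name is ambiguous.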

My first thought (before I knew anything about how Pkg worked) was to have the ability to register registries in a registry (say that 5 times fast!) and the @ would provide a way to specify explicitly which registry contains the desired package. Then there could be some kind of “forwarding” among registries.

I still think it is a good idea to be able to register registries with a registry, but not for the purpose of forwarding add requests. Rather, I now see how one registry can have an indirect dependency on another registry.

For example, I just created a registry for JuliaFinance. I am still experimenting, so there is only one package, Currencies.jl, in the registry, but this package is also registered in the General registry. I am considering deregistering Currencies.jl from the General registry and just using the JuliaFinance registry instead (unless peers convince me otherwise, which is certainly possible :blush:).

Now, let’s say AnotherPackage.jl is registered with the General registry, but AnotherPackage.jl has a dependency on Currencies.jl from the JuliaFinance registry.

As long as the user has previously manually added

pkg> registry add https://github.com/JuliaFinance/JuliaFinance

everything should work, but if Julia cannot find the JuliaFinance registry, it will fail to find Currencies.jl.

So what I am thinking is to elevate registries to be first-class citizens on par with packages.

So if AnotherPackage.jl in the General registry has a dependency on Currencies.jl from the JuliaFinance registry, then AnotherPackage.jl implicitly has a dependency on the JuliaFinance registry as well. I think we should make that implicit dependency explicit in the Project.toml and, by extension, in the Manifest.toml.

If everything can be found in the default General registry, there is no change to either Project.toml or Manifest.toml.
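To make this concrete, here is a Julia sketch of what such a record could look like and how a tool might read it. It assumes a hypothetical `[registries]` table in Project.toml (Pkg defines no such table), and the UUIDs are placeholders. It uses the `TOML` standard library, which is available from Julia 1.6.

```julia
# Sketch only: the `[registries]` table is hypothetical and the UUIDs are
# placeholders, not the real UUIDs of these packages.
import TOML

const project_toml = """
name = "AnotherPackage"
uuid = "00000000-0000-0000-0000-000000000001"

[deps]
Currencies = "00000000-0000-0000-0000-000000000002"
Example = "00000000-0000-0000-0000-000000000003"

[registries]
Currencies = "JuliaFinance"
"""

const DEFAULT_REGISTRY = "General"

# Dependencies without an entry fall back to General, so a project whose
# dependencies all live in General needs no `[registries]` table at all.
function registry_for(project::AbstractDict, dep::String)
    haskey(project["deps"], dep) || throw(KeyError(dep))
    regs = get(project, "registries", Dict{String,Any}())
    return get(regs, dep, DEFAULT_REGISTRY)
end

project = TOML.parse(project_toml)
registry_for(project, "Currencies")  # "JuliaFinance"
registry_for(project, "Example")     # "General"
```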

One thing I like about this proposal A LOT is that it diminishes the importance of the General registry: every person and every organization can have their own registry. A registry just becomes analogous to a “scope” in npm, but in a decentralized manner. People should not be shy about creating registries any more than about creating packages.

The way things are set up currently, as I understand it, there is an implicit centralized assumption that there should be a “root”, but I don’t see it that way. Pkg was designed well enough from the beginning that, with some work, it can become a truly decentralized package manager.

I’m happy to put some elbow grease into this, but it is not a small amount of work, so I wanted to share my thoughts before diving in.

Edit: I previously linked to an unrelated issue by mistake. Deleted.

Any thoughts?


Maybe, as an alternative to diving in and modifying Pkg.jl, I could create a separate package, something like PkgScoping.jl, and experiment there. I will give that some thought :thinking:

This is already true. “General” is not special in any way with the exception that it is downloaded by default if no other registry is available.


I would agree with this statement IFF General allowed packages that have dependencies on packages located in other registries (and Pkg had an automated way to make it work). As I understand it, and I hope I’m wrong, General has no intention of allowing this, so General is special in that it only allows packages that play in its playground. This is what I mean by centralized vs decentralized.

As I said in Slack, most of what I want is already there in Pkg. I’m not proposing a massive rewrite. This change is fairly small, but it would take me a few days to do it myself, so I am looking for feedback before I dive in.

I do think this apparent policy stance by General to not allow packages with dependencies in other registries should be revisited though.


Some more details:

Places besides this thread where related discussion is now going on:

Let’s try to keep the discussion here rather than in any of those other places for the sake of having a somewhat coherent conversation.


Yeah :pensive:

If there is interest, we can restrict the discussion to this topic on Discourse or the latest Issue #1067.

My fault.

There’s a lot in all of these issues so I’m going to address it in parts.

Namespaces

As was pointed out in the discussion on EricForgy’s Pkg.jl PR #1064 (“[POC] Add scoping, e.g. @MyRegistry/MyPackage”), Pkg already allows different packages to have the same name: they don’t even need to be in different registries. In fact, it’s allowed for different registries to register different versions of the same package, e.g. if a company wants to have a private hotfix version of a public package, that’s allowed.
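Here is a toy Julia model of the two points above. Names are not identifiers; UUIDs are. Versions of one UUID can come from several registries. The `RegEntry` type, the `Registry` alias, and the short UUID strings are simplifications for illustration, not Pkg’s actual registry format.

```julia
# Simplified in-memory model, not Pkg's registry format: each registry maps a
# package UUID (shortened to a plain string here) to its name and versions.
struct RegEntry
    name::String
    versions::Set{VersionNumber}
end

const Registry = Dict{String,RegEntry}

# Every distinct UUID registered under `name` in any registry. More than one
# UUID means different packages that happen to share a name.
function candidates(name::String, registries::Vector{Registry})
    uuids = Set{String}()
    for reg in registries, (uuid, entry) in reg
        entry.name == name && push!(uuids, uuid)
    end
    return uuids
end

# Versions of a single package (one UUID) are pooled across registries, which
# is how a private hotfix release can sit next to the public releases.
function all_versions(uuid::String, registries::Vector{Registry})
    vs = Set{VersionNumber}()
    for reg in registries
        haskey(reg, uuid) && union!(vs, reg[uuid].versions)
    end
    return sort!(collect(vs))
end

general  = Registry("uuid-a" => RegEntry("Foo", Set([v"1.0.0", v"1.1.0"])))
internal = Registry("uuid-a" => RegEntry("Foo", Set([v"1.1.1"])),   # private hotfix
                    "uuid-b" => RegEntry("Foo", Set([v"0.1.0"])))   # an unrelated Foo

candidates("Foo", [general, internal])       # both "uuid-a" and "uuid-b"
all_versions("uuid-a", [general, internal])  # v"1.0.0", v"1.1.0", v"1.1.1"
```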

So the motivation for the namespace feature is pretty different than it would be in other systems that don’t support package name collisions. There are pretty much two reasons you might want this:

1. So that you can write pkg> add @SomeOrg/Foo to add SomeOrg’s Foo package instead of being prompted to choose SomeOrg’s Foo package as you currently are.

2. So that you can use two different Foo packages in the same project even though they have the same name, e.g. you could do something like this:

import @SomeOrg/Foo
import @OtherOrg/Foo as Bar

I’m not necessarily proposing that as actual syntax but we would probably need some kind of syntax for it before the second application could be done. I don’t think we should add any functionality to the package manager for a feature that we don’t support in the language yet, so I think this should probably be allowed in the language first and then supported by the package manager.

As to the first thing, I’m not sure that the complication of a namespacing mechanism is even warranted. It seems like a lot of complication to add to the system to get no additional functionality and just avoid an interactive prompt.

Also note that identifying namespaces with registries seems conceptually wrong. They are orthogonal concepts since multiple packages by the same name can appear in a single registry and the same package can appear in multiple registries. If we had namespaces, you would want to be able to use them to select between different packages with the same name within a given registry. On the flip side, if you specified a registry as a namespace and it identified a package in that registry uniquely, would you want to ignore any versions of that package available in other registries? I would not think so.

Finally, what happens if it’s decided that a package should move from one namespace to another? If people are selecting it by UUID, that’s permanent and the instructions will never be out of date: all that’s necessary is finding a registry that knows about that UUID, and presumably once something is in the General registry, there’s no reason to forget about it. Even if it becomes old, one could at least have an Attic registry where one knows to look for older stuff.

A related observation is that it’s unclear whether a namespacing mechanism should be mutually exclusive, like folders, or allow the same package to be tagged with various “namespaces”, like labels. When something moves between folders, if the folder is how you found it, then that stops working. If you decide a new label is better, you can always just add the new label and keep both labels for a while or forever.
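Here is a tiny, purely illustrative Julia contrast of the two models. With folder-style namespaces, each package holds exactly one namespace. With label-style namespaces, each package holds a set of them. The package name `XYZ` and the namespace `NewOrg` are made up.

```julia
# Toy contrast: folders give each package exactly one namespace; labels give
# each package a set of namespaces. Names are made up for illustration.
folders = Dict("XYZ" => "JuliaOpt")
labels  = Dict("XYZ" => Set(["JuliaOpt"]))

in_folder(ns)  = sort!([pkg for (pkg, f) in folders if f == ns])
with_label(ns) = sort!([pkg for (pkg, ls) in labels if ns in ls])

# Suppose XYZ should now be found under "NewOrg".
folders["XYZ"] = "NewOrg"       # moving the folder breaks lookups via "JuliaOpt"
push!(labels["XYZ"], "NewOrg")  # adding a label leaves "JuliaOpt" working

in_folder("JuliaOpt")   # empty
with_label("JuliaOpt")  # ["XYZ"]
with_label("NewOrg")    # ["XYZ"]
```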


While I initially liked the idea as proposed in the OP, I think that it’s a bad solution to the root of the actual problem, that problem being the ease of discovering other registries and the packages they contain. However, I think we can solve the same problem in a different way that doesn’t require any modifications to Pkg (or Registrator), and also provides the opportunity to do lots of other cool things with Pkg without having to modify Pkg itself.

Briefly, here are my problems with the approach proposed in the OP:

  1. Requires modifying Pkg and possibly Julia to support this new functionality.
  2. Actually reduces the benefits of federation of packages via registries, because now instead of your package being spread and duplicated (safely) across multiple disjoint registries, you’re instead explicitly linking together registries and creating hard dependencies on the packages that they contain, that do not need to exist.
  3. Requires the curators of General to accept your proposed registry when deciding whether to accept your package into their registry; if your linked registry has a bad reputation or distributes questionable packages, that potentially compromises the policies of General by forcing a link to your potentially shady registry.
    EDIT: Added:
  4. Gives preference to your preferred registry for using your package, when in fact I may disagree with the choices you make with your registry and instead prefer to use my own registry to provide access to the same package.

My (abstract) proposal is the following: we create a tool, let’s call it PkgHub.jl, with a cute little binary called pkghub. This tool does a few things:

  1. Contains, and maintains, a list of known registries, including each registry’s name, URL, and any other significant metadata. These registries are not necessarily downloaded onto the user’s system; they are just metadata to allow downloading of a registry if the user so desires.
  2. Maintains the “active” list of registries. This list is a subset of the registries in 1., and is somehow automatically integrated with your current julia install(s) so that they can be easily and automatically made available to julia/Pkg whenever you run a script. Registries can be added/removed to/from this list with the pkghub binary, just like we do with packages (and even registries) in the julia Pkg REPL mode.
  3. Provides a means to export/import a list of registries in a common format (probably a custom TOML file, to be consistent and modern). A single pkghub invocation would create/import a file like “MyRegistries.toml”, which can then be shared with others through whatever means you desire to allow them to gain knowledge of the registries you have on your system. Of course, because you might have private registries that you don’t want others to know about/access, there could probably be a concept of a “public/private” flag that can be applied to each of your known registries to determine which registries are automatically exported.
  4. As a bonus, do a bunch of automated management of depots and load paths and env. variables and all that good stuff if desired, because why not?
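Here is a rough Julia sketch of item 3: exporting and importing a registry list that honours a public/private flag. `KnownRegistry`, `export_registries`, `import_registries`, and the `[[registries]]` file layout are all assumptions about how a PkgHub.jl might do it. The internal URL is a placeholder. The sketch requires Julia 1.6+ for the `TOML` standard library.

```julia
# Hypothetical PkgHub-style export/import; struct, functions, and file layout
# are assumptions for illustration, not an existing tool's format.
import TOML

struct KnownRegistry
    name::String
    url::String
    is_public::Bool   # private registries are never exported
end

# Write only the public registries as a TOML array of tables.
function export_registries(io::IO, known::Vector{KnownRegistry})
    data = Dict("registries" => [Dict("name" => r.name, "url" => r.url)
                                 for r in known if r.is_public])
    TOML.print(io, data; sorted = true)
end

# Read a shared file back; everything in it was exported, hence public.
function import_registries(io::IO)
    data = TOML.parse(read(io, String))
    return [KnownRegistry(d["name"], d["url"], true) for d in get(data, "registries", [])]
end

known = [KnownRegistry("General", "https://github.com/JuliaRegistries/General", true),
         KnownRegistry("JuliaFinance", "https://github.com/JuliaFinance/JuliaFinance", true),
         KnownRegistry("CompanyInternal", "https://git.example.com/registry", false)]

buf = IOBuffer()
export_registries(buf, known)   # in practice this would be MyRegistries.toml
seekstart(buf)
imported = import_registries(buf)
```

Only General and JuliaFinance end up in `imported`. The private registry never leaves the machine.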

The reasons I like the above approach:

  1. No modifications required to Pkg/Registrator/Julia. This just re-uses the already existing infrastructure to make things easy.
  2. Doesn’t need to be installed with Julia or Pkg by default (such as in binary releases), and its development can be further decoupled so that it can move at whatever pace it desires.
  3. Changes active registry/depot management from being a mess of modification of environment variables, to being nicely packaged in a reliable and oft-tested tool, that is kept up to date with Pkg and Julia changes. Also potentially makes CI of complicated packages much easier (because doing Pkg commands from the CI shell is somewhat annoying and error-prone, IMO).
  4. Doesn’t enforce that a package exist in one registry only; instead, this tool doesn’t care about packages at all, and only worries about what registries are active (but of course could allow you to search known registries for packages of interest).

Cross-registry dependencies

As noted in Pkg.jl issue #1067 (“[RFC] Accommodate dependencies from other registries”), this already works. You can have a dependency graph spread across multiple registries and as long as the user has them all installed, everything just works. So the only question is what, if anything, to do about the situation where a user does not have all the registries that might be needed installed.
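Here is a toy Julia model of that situation. It walks a dependency graph across whatever registries are installed and reports packages that no installed registry knows. This is not Pkg’s resolver. Package names stand in for UUIDs. When several registries know a package, the first one wins, which is a deliberate simplification.

```julia
# Toy model, not Pkg's resolver: a registry maps a package (a name standing in
# for its UUID) to the packages it depends on.
const DepRegistry = Dict{String,Vector{String}}

# Look a package up in the installed registries; `nothing` if none knows it.
# Simplification: the first registry that knows the package wins.
function lookup_deps(pkg::String, installed::Vector{DepRegistry})
    for reg in installed
        haskey(reg, pkg) && return reg[pkg]
    end
    return nothing
end

# Walk the dependency graph from `root` and collect every package that no
# installed registry knows about.
function unresolvable(root::String, installed::Vector{DepRegistry})
    unknown = Set{String}()
    seen = Set{String}()
    stack = [root]
    while !isempty(stack)
        pkg = pop!(stack)
        pkg in seen && continue
        push!(seen, pkg)
        deps = lookup_deps(pkg, installed)
        if deps === nothing
            push!(unknown, pkg)
        else
            append!(stack, deps)
        end
    end
    return unknown
end

general = DepRegistry("AnotherPackage" => ["Currencies"])
finance = DepRegistry("Currencies" => String[])

unresolvable("AnotherPackage", [general])           # contains "Currencies"
unresolvable("AnotherPackage", [general, finance])  # empty
```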

The issue and this proposal suppose that a federation of loosely connected, incomplete registries is a desirable end goal. I do not share that view. I think that the General registry should be self-contained and fairly complete. Why force the user to clone and update dozens of loosely connected registries? If resolution takes place across many registries, how can one even hope to make sure that they are in a coherent state where resolution produces sane, working results? You could include a hash of the state of an external registry that you depend on, but that’s effectively the same as just including the contents of that registry.

For private registries, of course, one will generally want to depend on public packages as well, so private registries will generally be incomplete. It may make sense to allow registries to indicate which external registries they depend on. Putting this information in the manifest makes very little sense to me. Why there? Suppose you get a registry that includes package A, which depends on package B, which is in another registry. What manifest do you look at to figure out where to find B? At this point you have no manifest to look in. The next place the proposal puts a registry field is in a project file for A. Ok, that’s a bit better since you can find a project file for A. But do you have to download a copy of A to get a project file? Do you have to download a project file for each version of A that you’re considering installing? Or just the latest one? Or just the project file on the master branch of the git repo? Note that much of the speedup of Pkg3 over the previous version came from avoiding having to git clone packages that you install. This proposal seems to entail git cloning every package that you’re considering installing in order to even decide which packages, and which versions of them, you actually want to install.

So there is a solution to that, of course, which is to put that information in the registry so that you have all the information you need before you start trying to decide what to install. Does this mean that each package that depends on some other package needs to record not only that it depends on that package but also where that package lives? Is that recorded per version of A?

Note that each version of a package has its own corresponding version of its project file. So this design seems to imply that each version of each package keeps track of which registry it comes from. What happens when a package moves between registries? Do old versions still keep the old registry as their registry of origin? That seems broken, since presumably when transferring a package between registries you would also move the records of all older versions of it. But maybe not: what if you want to open source a package starting at some version but keep all the older versions of it private? In such a situation I can imagine keeping some of the old versions registered in a private registry that only internal developers can see, while having the rest of the versions in a public registry that everyone can see. Which registry does the package belong to then?

Bottom line: I think that the notion of letting each package version indicate what registry its dependencies come from is fundamentally broken. I can, however, imagine each registry indicating what other registries it needs in order to have a complete dependency graph. So you could install one registry and automatically get some other ones too, so that installing packages from that registry can be expected to work.


Note that the cross-registry dependency issue seems to me to be completely unrelated to the package namespace issue. I suppose if one accepts the premise that namespaces should correspond to registries and the premise that dependencies should be specified as registry+package then it kind of makes sense, but I disagree with both premises so I think that namespaces and cross-registry dependencies should be totally independent.


I would counter-propose two relatively simple, independent potential features:

  1. Allow registries to depend on other registries. When you add a registry, all the registries it depends on are also added. That’s all.

  2. Allow narrowing down what package to install by some kind of domain modifier. We can allow tagging packages with a set of “channel” or “domain” identifiers and then you could narrow down the scope of packages that way.
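Here is a minimal Julia sketch of the first feature, under the assumption that each registry declares the names of the registries it depends on. Pkg has no such field today. The registry names besides General and JuliaFinance are made up. Adding a registry pulls in its transitive dependencies, with dependencies listed first and cycles reported as errors.

```julia
# Hypothetical registry-to-registry dependencies; names are illustrative.
const REGISTRY_DEPS = Dict(
    "JuliaFinance"    => ["General"],
    "CompanyInternal" => ["JuliaFinance", "General"],
    "General"         => String[],
)

# Everything that must be added along with `name`, dependencies first
# (a depth-first topological order).
function registries_to_add(name::String, deps = REGISTRY_DEPS)
    order = String[]
    visiting = Set{String}()
    function visit(r)
        r in order && return
        r in visiting && throw(ArgumentError("registry dependency cycle involving $r"))
        push!(visiting, r)
        foreach(visit, get(deps, r, String[]))
        delete!(visiting, r)
        push!(order, r)
        return
    end
    visit(name)
    return order
end

registries_to_add("CompanyInternal")  # ["General", "JuliaFinance", "CompanyInternal"]
```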

I find the latter feature less compelling than the former, but if some notion of how domains or channels would work becomes clear, we might as well support picking out packages that way. I still think it might be fine to continue with the current workflow of doing pkg> add X and then prompting with a set of possible X packages to install, showing information about them including their domain and how highly ranked they are (which helps protect against name squatting attacks).


There is another thought that I’ve had that’s related to this, having to do with federating responsibility to organizations. It might make sense to have some arrangement like the Linux kernel maintainer tree, where responsibility for and policy regarding parts of General are delegated to organizations. The way I could imagine that working is that one registers a new version in, e.g., the JuliaOpt org (just to pick a major one), and the review and CI happen within that org; new versions are then upstreamed automatically to General from the JuliaOpt registry. However, the complication is that there are many basic packages in General which JuliaOpt packages depend on, so the JuliaOpt registry won’t be standalone: it needs some part of General as well. I could imagine mirroring a subset of General into the JuliaOpt registry and then propagating new versions of JuliaOpt packages back to General. But overall it seems very complicated and it’s unclear what problem this is fixing.


Hello from :philippines: :blush:

Thank you Stefan and thank you Julian for your thoughtful responses :raised_hands:

You both raised excellent points. I see some paths going forward, but this is a big effort with not a huge payoff right now and there are other more urgent things needing attention, so I’ll park this for now. Thank you again for your consideration :raised_hands:


Another possibility that came up on this week’s pkg-dev call was using user/org names to qualify packages instead of registry names or tags/labels. So, continuing the JuliaOpt example, if you wanted to install the JuliaOpt XYZ.jl package instead of someone else’s XYZ.jl package then you could write pkg> add @JuliaOpt/XYZ. The nice thing about this is that this information doesn’t need to be added—we already have it—and it’s well known since it’s common for people to know the user or org who maintains a package. Note that this would just be a way to indicate which UUID you’re talking about; if a fork of the same package exists with a different user/org name with the same UUID, it’s unclear what should happen. But minor questions like that aside, I quite like this idea.
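Here is a rough Julia sketch of that idea. It narrows the candidates for `@Owner/Name` by the owner segment of each candidate’s repository URL and then requires exactly one UUID. The sketch assumes GitHub HTTPS repo URLs. The `Candidate` type and the UUID strings are placeholders, and none of this is an existing Pkg API.

```julia
# Hypothetical resolution of "@Owner/Name" via the owner in each candidate's
# repo URL. Candidates and UUIDs below are made-up placeholders.
struct Candidate
    name::String
    uuid::String
    repo::String
end

# Owner segment of a GitHub HTTPS URL, or `nothing` for other URL shapes.
function repo_owner(url::AbstractString)
    m = match(r"^https://github\.com/([^/]+)/", url)
    return m === nothing ? nothing : String(m.captures[1])
end

function resolve_by_owner(spec::AbstractString, candidates::Vector{Candidate})
    m = match(r"^@([^/]+)/(.+)$", spec)
    m === nothing && throw(ArgumentError("expected @Owner/Name, got $spec"))
    owner, name = m.captures
    hits = [c for c in candidates if c.name == name && repo_owner(c.repo) == owner]
    # Several hits sharing one UUID are the same package; distinct UUIDs stay ambiguous.
    uuids = unique(c.uuid for c in hits)
    length(uuids) == 1 || throw(ArgumentError("$spec matched $(length(uuids)) packages"))
    return only(uuids)
end

cands = [Candidate("XYZ", "uuid-1", "https://github.com/JuliaOpt/XYZ.jl.git"),
         Candidate("XYZ", "uuid-2", "https://github.com/someone/XYZ.jl.git")]

resolve_by_owner("@JuliaOpt/XYZ", cands)  # "uuid-1"
```

As noted above, this leaves open what should happen when a fork under a different owner carries the same UUID.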

Issue opened to discuss: https://github.com/JuliaLang/Pkg.jl/issues/1071.


Issue opened about allowing registries to depend on other registries as well: https://github.com/JuliaLang/Pkg.jl/issues/1072.