The present and the future of package registration

There are 300 people in the channel, that would have to be a hell of a vacation! It seems to me that a check for “this package name might be a bit too generic” is hard to automate, so getting every registration in front of loads of people’s eyeballs seems to me the best feasible solution.

2 Likes

As with most channels, the participation rate is far, far lower. I suspect most are either inactive on Slack or have it muted.

I didn’t mean to imply that all of those are active, but in my experience every package for which I think “oh, that seems potentially problematic” already has a comment when I click through to GitHub. Hard to say how many people click through on the links, but I’d be surprised if it was less than 20 (which should give us holiday cover!)

1 Like

Yeah, I don’t know how to solve the problem. I guess the options are:

  1. We just organically have enough people keeping an eye on the new-packages-feed so that things don’t slip through. I, personally, like to check on it, and I’ll definitely leave comments, but I have no idea how many other people really keep up with the feed
  2. We just live with inconsistent enforcement. Not ideal, but it’s been working in the past, and I don’t know how much bigger the problem will get in the future
  3. There’s a (paid?) package community manager whose job it is to keep an eye on the feed, flag package names that might need some discussion, and generally help people with the package registration process if needed. Probably the ideal solution, but who’s going to hire that person :slight_smile:
  4. We disable automatic merging of new packages. This guarantees that nothing will slip through the cracks, but I don’t think it’s sustainable (except maybe with the package community manager). I don’t think anyone really wants this level of overhead.

Actually, I think it’s slightly problematic that the registry maintainers seem to only be reachable via Slack (which is why I recently made the pointer to #pkg-registration in the Registry CI more explicit). There have been instances where people trying to register a new package (somewhat justifiably) complained that they had to jump through multiple communication channels to finally get their package merged. It would be much nicer if there was a group of people monitoring new registrations to General on GitHub. If someone comments on their PR to explain why they’d like to stick to their name despite the name similarity, someone should ideally react to that without further prompting.

4 Likes

Good point about GitHub. Having some sort of automation that mentions a GH team and/or adds relevant labels when a package is flagged for discussion (maybe with variation for appealing the automerge decision vs identifying packages that need more scrutiny) would be great too.

I recall a different thread on this topic, which unfortunately escapes me now, where I’ve made this point before. But even if packages could be added by org, it still seems worthwhile to disallow packages with the same name. A perfect example: go to JuliaHub.com and search “unitful”; there are many “UnitfulXX” packages (Unitful, UnitfulAstro, UnitfulUS, UnitfulMoles, etc.). Now imagine doing that search where every single one is Unitful.jl, some in big orgs (JuliaAstro/Unitful.jl) and some not (gituser12345/Unitful.jl). That seems way worse for discoverability to me than the current status quo.

It also would be reasonable to expect someone to need more than one units package anyway, and I don’t really like the idea of doing

using JuliaAstro/Unitful as UnitfulAstro
using PainterQubits/Unitful as UnitfulUS

Because it doesn’t buy me anything.

So, adding packages with an org/user could be helpful, but it doesn’t strictly solve everything, and I would imagine a lot of Discourse posts beginning with “which version of Unitful should I use?”

(I’m not picking on Unitful-packages here, I love them all and that they’re all named differently!)

(edit: @lmiq I’m also not saying that’s what your point was, yours was just the first post to mention the idea of adding orgs to package names so I hit reply.)

5 Likes

“Which version should I use?” comes up either way. Either “orgX/Unitful”, “orgY/Unitful” – or “UnitfulX”, “UnitfulY”, etc. If anything, belonging to a “well known” organisation can provide a degree of authority straight in the name.

Maybe I’m the ultra-liberal in the room, but for the life of me I can’t understand why there should be limits on how I name my package, or how I document, or test my package? As long as I’m not doing anything offensive, I should be allowed to register it - and the Julia users can decide if they want to use it or not.

The very basic argument is that registration checks are practically useless:

  • the entire codebase can be changed after the registration
  • useless tests can be easily added

In addition, it can lead to subjective decisions. As in the case of “generic” names - when is a package generic enough to be allowed a generic name?

Finally, naming a package is also a branding decision, and frankly, these descriptive name guidelines are boring as hell.


Edit 1: please don’t take the discussion in the direction of the usefulness of tests and documentation, I fully agree on their value.

You’ve been told the reasons a few times now - maybe you don’t agree with them, but surely you should understand why they exist.

And you seem to conflate a little extra friction with being prevented altogether.

Something doesn’t need to be foolproof or ironclad to be useful, as well.

For better or worse, General is a loosely curated registry, and the Julia community appreciates that it is what it is.

Maybe there should be a free-for-all registry, as well, and/or namespaces, and/or better ecosystem to support multiple registries. I think it’d be useful to have that. But General being lightly curated is a benefit.

7 Likes

There are soon 10000 registered packages. Every package added increases the entropy of the whole system (yes, sure I know, in many cases the package is also useful for something and somebody).

Certainly there won’t be a one-size-fits-all solution. But this example, for me, provides exactly the contrary argument: you would only use as .... if you wanted to qualify the names of the functions of each package, which means that they are not meant to interoperate. Which is not the case, at all, here. The packages are actually meant to work together as extensions of one another, and having the same name under different “physics-field” organizations would be quite indicative of that.

Anyway, I certainly would prefer this to a possible proliferation of custom registries, which would have these and many other downsides.

What I like most about “under-organization” packages is the possibility of the organizations to give more clear indications of the interoperability of its packages, taking somewhat the place of the “mega-packages” that exist in other languages but are unnatural to Julia. Yes, all that can happen already, but would/could be more explicit with a tree-based package organization.

1 Like

There aren’t limits on package development. There are limits on what you can register in General, because General sustains the Julia community as a whole.

This is going off on a tangent, but what I’d really like to see is support in Project.toml to point to an arbitrary LocalRegistry. As in, “Get this package from this registry”. That would make registries for specific sub-communities or organizations with their own guidelines much more feasible. I’m not sure how explicit that is in the long-standing issue #492 in JuliaLang/Pkg.jl (“Store some location info for deps in Project.toml”), so maybe I should open another issue for that specific idea.

3 Likes

And what if that local registry disappears? This can lead to a new rabbit hole of irreproducible environments.

Tough luck for users of that package, then, I suppose. I don’t see that as something that’s likely to happen. Registries are GitHub repos, so as long as GitHub is around, someone would have to go out of their way to delete a registry. Plus, people can clone/fork them!

People have unregistered packages, which they can also delete. That’s why General will continue to be the default: it comes with stronger long-term guarantees (which also includes stricter rules for what can make it into General)

I’d say if you’re a developer who chooses to put your package in some registry other than General, it’s your responsibility to make sure that registry stays available.

1 Like

Yes, ok, agreed. That’s a nice feature anyway to have in the Project.toml.

What I would not like to see is a tradeoff between distributing in General and distributing in custom registries that ended up stimulating the proliferation of custom registries. That would be bad.

1 Like

The packages themselves don’t have the guarantees that General itself has, though - the packages could disappear just as well as a custom registry referencing them does.

Registries also aren’t stored in Project.toml or Manifest.toml right now - they’re just part of the environment. So that would need to change.

But let’s say it did - perhaps Manifest.toml just needs to store the last-known repository URL from the registry, so that it can fall back to the specific package url if the custom registry were to disappear or wasn’t explicitly loaded in the environment.
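To make the fallback idea concrete, here is a minimal sketch in Julia. Everything in it is hypothetical: Pkg does not record a `repo_url` like this for registered packages today, and `ManifestEntry` and `resolve_source` are made-up names used only to illustrate the lookup order (registry first, stored URL second).

```julia
# Hypothetical manifest entry. `repo_url` is an assumed extra field,
# not something Manifest.toml stores for registered packages today.
struct ManifestEntry
    name::String       # package name
    registry::String   # name of the registry the package was resolved from
    repo_url::String   # last-known repository URL copied from that registry
end

# `registries` maps registry name => (package name => repository URL)
# and stands in for whatever registries are installed in the environment.
function resolve_source(entry::ManifestEntry,
                        registries::Dict{String,Dict{String,String}})
    reg = get(registries, entry.registry, nothing)
    if reg !== nothing && haskey(reg, entry.name)
        # The registry is installed and still lists the package: trust it.
        return reg[entry.name]
    end
    # Registry missing (or package removed from it): fall back to the stored URL.
    return entry.repo_url
end
```

With this lookup order a disappearing custom registry would only break an environment if the package repository itself were gone too.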

The packages themselves don’t have the guarantees that General itself has, though - the packages could disappear just as well as a custom registry referencing them does.

Good point! Although still not something I would worry about. If you really need to guarantee reproducibility for regulatory reasons, you probably should archive a src-snapshot of all your dependencies.

Registries also aren’t stored in Project.toml or Manifest.toml right now - they’re just part of the environment. So that would need to change.

That’s exactly what I was proposing

But let’s say it did - perhaps Manifest.toml just needs to store the last-known repository URL from the registry, so that it can fall back to the specific package url if the custom registry were to disappear or wasn’t explicitly loaded in the environment.

I agree! That sounds like an excellent feature, but it should be in addition to support for registries in Project.toml. I mostly treat Manifest.toml as transient, or for reproducibility if the exact versions matter. I’ll probably check in the final Manifest.toml for a research project with a bunch of notebooks when the project ends, but not while it’s active. The Project.toml should have all the information required to instantiate a project, even if it uses unregistered packages or packages in registries other than General.

3 Likes

If the package repository is deleted that won’t exist anymore.

I mean, the way Julia treats packaging and reproducibility is somewhat dependent on the General registry. And it is one of Julia’s great features. Let’s not weaken it. With 10k+ packages already registered the exact list of packages is not important anymore. The integration of the packages with the dependency manager is much more important.

I agree completely!

the way Julia treats packaging and reproducibility is somewhat dependent on the General registry

Very true, but something I’m slightly ambivalent about. I definitely agree, we should have as many packages as possible in General once they pass a minimum threshold of quality, and subject to some light-handed community guidelines. The benefits of General especially for reproducibility are huge, as you point out. The rule that packages in General can only depend on other packages in General should stay in effect, which already provides a huge network effect to ensure that General stays the default.

At the same time, I’d also want to slightly ease up on how tightly Pkg and General are coupled. There are huge pain points with working with completely unregistered packages, to the point where I would say any package that any other package depends on (including just for its tests) has to be registered somewhere.

That means a LocalRegistry is the only viable option for anyone who doesn’t want to register their package in General for reasons like the following:

  • The package just isn’t mature enough yet (but is a dependency for other unregistered packages)
  • It’s a one-off that won’t be maintained going forward (outgoing PhD student packaging up their code)
  • They don’t really care about integrating with the community, e.g. it’s a company or research group that just wants to do their own thing, which somehow feels like it should be their prerogative, see @essenciary’s “liberal” view point.
  • They disagree with the community standards like package naming (just another form of “don’t care about integrating with the community”, with all the drawbacks that entails). Again something I’d like to discourage, but which should ultimately be their prerogative.

LocalRegistry works pretty well, of course, but it’s still completely external to Pkg: You have to put instructions to run pkg"registry add <repository url>" manually in your README/company wiki, and more importantly, it complicates setting up tests/CI/Makefiles quite a bit. Again, I just think Project.toml should be completely self-contained and account for non-registered packages and non-standard registries. This might actually take off some pressure from General, as it might cut down on “premature” registrations. That is, registrations happening primarily because it’s such a pain to work with non-registered packages (especially for newish users who haven’t gone deep enough down the rabbit hole to even know about LocalRegistry).
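For reference, a rough sketch of what that manual step can look like when scripted with Pkg’s functional API instead of the pkg"..." string, e.g. in a setup script that CI or a Makefile calls. The registry URL and the helper names `required_registries` and `setup_project` are made up for illustration. The explicit General entry is there because, as far as I know, Pkg only installs General on its own when no registry is present at all, which matters on a fresh CI machine.

```julia
using Pkg

# Hypothetical URL, for illustration only: replace with your own registry.
const CUSTOM_REGISTRY_URL = "https://github.com/MyOrg/MyRegistry.git"

# The registries a project needs: the custom one, plus General, listed
# explicitly (assumption: General is not auto-added once another registry exists).
function required_registries(custom_url::AbstractString)
    return [
        RegistrySpec(url = String(custom_url)),
        RegistrySpec(name = "General"),
    ]
end

# Add the registries, then instantiate the project in `project_dir`.
# Needs network access, so it is defined here but not called.
function setup_project(project_dir::AbstractString;
                       custom_url::AbstractString = CUSTOM_REGISTRY_URL)
    Pkg.Registry.add(required_registries(custom_url))
    Pkg.activate(project_dir)
    Pkg.instantiate()
    return nothing
end
```

A CI step could then run something like `setup_project(".")` before the tests. This only works around the problem; it doesn’t make Project.toml self-contained.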

Anyway, I think we actually all agree that these might be nice features to have, as well as that we’d like to encourage a strong and vibrant community centered around the General registry, so the rest is just details :slight_smile:

4 Likes

Yes, I had to hunt around on Discourse to find how to add a registry in GitHub CI. It’s doable, but not really easy to find. This is from someone who mostly copies and edits CI yaml files and hopes they work.

4 Likes