The present and the future of package registration

This has been on my mind for some time, but now that my package registration PR has been closed, I finally decided to take a few minutes to share.

TLDR;
Because Julia packages are registered at “top level”, package names are scarce. This introduces unnecessary limits and friction.

What do I mean by “top level”? Simply that packages are not namespaced under an organisation, the way npm allows, for example: Creating and publishing an organization scoped package | npm Docs

What do I mean by unnecessary limits and friction? Take my PR for example: New package: CodeAI v1.1.1 by JuliaRegistrator · Pull Request #82492 · JuliaRegistries/General · GitHub

a) this has triggered a moderator to follow up with suggestions that come no doubt from good intentions, but that are completely subjective. What criteria would allow me to use that name? 2 models? 3? I don’t think this is a moderating decision.

b) the moderator has not followed up on the PR which was left without an active resolution. I get it, we’re all busy - but the system is set up to default against the users. If there’s no bandwidth to follow up on replies, don’t raise issues.

c) the name similarity algorithm is way too basic to be useful:


Any human can easily figure out that “CodeAI” and “Codecs” are quite different.
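To illustrate why a purely character-based check flags this pair, here is a minimal sketch of a plain edit distance (Levenshtein). This is an assumption for illustration only: the thread does not say which metric or threshold AutoMerge actually uses, and `edit_distance` is a hypothetical helper, not part of any registry tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    # prev[j] holds the distance between the first i-1 characters of a and the first j of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


print(edit_distance("CodeAI", "Codecs"))         # 2
print(edit_distance("Websockets", "WebSocket"))  # 2
```

Under this metric the two pairs score the same, even though only the second looks like a plausible typo to a human reader. A character-level score alone cannot tell them apart.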

So

  1. are there any plans to introduce organisation level packages? These would allow removing such artificial restrictions.

  2. I would like to get my package registered please. Frankly I’m losing motivation to work on this as I have to deal with administrative red tape.

  3. it would be great if, as a principle, moderators could refrain from subjective decisions that fall outside the naming recommendations. As far as I can tell my package name was not breaking any of the guidelines.

PS:
This is not directed at the moderator. I appreciate the work and the support and I know they mean well.

6 Likes

I didn’t know there were moderators on the registry … I’ve commented on PRs as well - I think any user can 🤔

5 Likes

Yes, agreed with cormullion, Jerry is not some sort of “moderator” just - like many of us - someone who monitors the new packages feed on Slack and comments if they think they have a helpful suggestion or a concern about a specific registration (e.g. every now and then we get people that register personal packages due to a misunderstanding about what the General registry is).

In your case, Jerry’s comment was a suggestion which included [noblock] to make explicit that they were not trying to prevent the merge from happening. The PR was closed just because it was inactive for 30 days, which in all likelihood just meant those with merge rights forgot to look at it again. You’re always free to ping on GitHub or in the package registration Slack to prevent this from happening.

FWIW if I had seen this PR to General I might have added a comment myself - the name CodeAI is not very descriptive to me, and from the one-line readme I don’t get much of a sense of what this package is or does.

So tl;dr, there aren’t “moderators” of the General registry, just people with merge rights. As such, there is no “subjective decision that falls outside the naming recommendations” here, and in my experience every discussion about package names for General I have seen in which the author insisted on a specific name got merged.

19 Likes

It’s very clear that Julia has outgrown the current system for registering packages. And it will continue to get worse. But there’s the usual problem when someone points out a problem on Discourse. Someone needs to find the time to design and propose a solution (not to mention eventually implement it).

EDIT: I see people are going through the details of the complaint. Maybe those are valid points. But, the problem remains. Julia has outgrown the flat package namespace. The semi-automated registration, and name filtering process is breaking down and will continue to get worse.

I think we should focus on the bigger issue, not the details of this package.

8 Likes

Note that the moderator even added [noblock] to his comment, so it would not have blocked you from finishing said PR, but you took no action to address the name similarity (OK, no moderator checked back either). So with no one following up, this was closed.

I think sub-namespaces are not that good of an idea. You can, however, always do your own registry and add it to your Julia installations. I think that is even more flexible than namespaces.

Besides that I agree with Nils, the name is a bit too generic maybe.

PS: Please do not lose motivation. I am sure we can find a nice name.

2 Likes

I’m not sure I agree with this. I haven’t done a systematic study, but among packages I’ve created, I suspect the average package name is around 13 characters. If you say that each of these “nixes” names with any 2 substitutions, that leaves (if I’m not being overly simplistic here, which I may be…) 26^11 ≈ 3.7e15 names available. Granted, most of those don’t roll off the tongue, but my experience when coming up with names for a package is almost never “drat, that’s already taken” but “how do I encapsulate what this package does in a good name?” That tends to push me towards longer names.
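The back-of-the-envelope count above can be checked in a couple of lines. This sketch only reproduces the arithmetic as stated (a 13-character name, 2 positions treated as “used up”, 26 letters per remaining position); the variable names are mine and the model is as simplistic as the estimate itself.

```python
ALPHABET_SIZE = 26      # letters only, ignoring case and digits
NAME_LENGTH = 13        # rough average package-name length from the estimate
NIXED_POSITIONS = 2     # positions "used up" by names within 2 substitutions

# Each of the remaining positions can independently take any letter.
available_names = ALPHABET_SIZE ** (NAME_LENGTH - NIXED_POSITIONS)

print(available_names)           # 3670344486987776
print(f"{available_names:.1e}")  # 3.7e+15
```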

6 Likes

Oh goodness, I’m so sorry this has festered here. A few things from my viewpoint as someone who hasn’t really participated in the General registry but has seen it in process for a long time:

  • The scarcity of names is balancing tradeoffs between users (where it’s a big benefit) and package developers (where it can be a speed bump) — it’s a first line of defense against issues like typo-squatting, helps guide discoverability, and has sometimes even ended up with the package authors themselves being grateful for a better name that was found. There have indeed been discussions around adding namespacing (Pkg.jl#1836, Pkg.jl#1071), but I think the root here is really around process.

  • On the process, you got snagged by a simplistic automatic check. Perhaps this needs to be more clear, but it was this automated process that blocked your registration, not the subsequent human comment. The automated message is trying to tell you that the ball is in your court if you don’t want to change the name:

    If you do not want to fix the AutoMerge issues, please post a comment explaining why you would like this pull request to be manually merged. Then, send a message to the #pkg-registration channel in the Julia Slack to ask for help. Include a link to this pull request.

    Note that sometimes packages that pass the simplistic automated checks still get blocked by humans, and packages that don’t pass still get approved by humans. And yes, it is subjective. It’s a community trying to decide on a hard task: what should a name mean? Either way, you need to advocate for the package — make a post on that PR about why you think the name works.

25 Likes

The topic of names and AutoMerge came up recently on #pkg-registration on Slack (in a slightly different context), where it was discussed that the current registration system is not as documented as it could be. Here is a summary, copied (and lightly modified) from there, of how the current system works:

  • if the PR meets AutoMerge conditions, and no one leaves a blocking comment, it gets merged. (Blocking comment: a comment from any GitHub user that does not include [noblock])

    • Anyone can help contribute to the registry by providing comments on PRs, possibly blocking them if they have a concern that they think the author should address. This and other ways to contribute are discussed in more detail in the Contribution Guidelines.
  • if the PR doesn’t meet AutoMerge conditions, OR someone leaves a blocking comment, it gets merged only if one of the handful of people with write access sees and agrees to merge it.

    • This is generally facilitated by responding and taking action according to the feedback, or providing a good reason not to (kind of like replying to peer review in a journal submission), and then asking in the #pkg-registration Slack channel
    • If the feedback is adequately addressed according to anyone w/ write access, they can merge it
    • If no one w/ write access feels it was addressed enough that they want to be the one to merge it, it doesn’t get merged.
    • After 30 days, stale PRs are autoclosed by a CI job.
  • And, how does one join this group of folks w/ write access? If someone w/ admin access to the repo (much smaller group than write access) decides to give someone write access, then they have it and are a member of this group

So to apply this to this particular example, there wasn’t a blocking comment, but indeed AutoMerge’s conditions were not met. Nobody with write access merged it, there wasn’t any followup, and after 30 days it was closed.
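The rules summarised above can be written down as a small decision function. This is only a sketch of the description in this post, not the actual AutoMerge code: `RegistrationPR`, `is_blocking` and `outcome` are hypothetical names, the 30-day threshold is taken from the summary, and details such as waiting periods are left out.

```python
from dataclasses import dataclass, field

STALE_AFTER_DAYS = 30  # stale PRs are auto-closed after this many days


@dataclass
class RegistrationPR:
    automerge_passed: bool                              # did the automated checks pass?
    comments: list[str] = field(default_factory=list)   # comment bodies from any GitHub user
    manually_merged: bool = False                       # someone with write access merged it
    age_days: int = 0


def is_blocking(comment: str) -> bool:
    """A comment blocks AutoMerge unless it contains the [noblock] marker."""
    return "[noblock]" not in comment


def outcome(pr: RegistrationPR) -> str:
    if pr.manually_merged:
        return "merged"
    blocked = any(is_blocking(c) for c in pr.comments)
    if pr.automerge_passed and not blocked:
        return "merged"
    # From here on the PR needs a human with write access.
    if pr.age_days >= STALE_AFTER_DAYS:
        return "closed"
    return "waiting for manual merge"


# The case discussed in this thread: a [noblock] comment, failed AutoMerge checks,
# no manual merge, and 30 days without follow-up.
codeai = RegistrationPR(
    automerge_passed=False,
    comments=["[noblock] Maybe a more descriptive name?"],
    age_days=30,
)
print(outcome(codeai))  # closed
```

Note that in this model the [noblock] comment plays no part in the result: the failed automated check alone is what moves the PR onto the manual path.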


It’s also worth understanding the current system does add some friction by design (although not as much as some other registries like R’s CRAN). Since General does have a flat namespace, it is a finite resource, and AutoMerge is designed to protect it to some extent, by trying to make it easy to merge maximally uncontroversial packages, and require human intervention in all other cases. Since General is run by volunteers, this human time can be scarce and it is up to the package author to request it. But there is also an easy way around it, which is to just choose a new name and resubmit.

This system does not always make it easy to get one’s ideal name, but it does keep things running with relatively little human effort, while making some effort to keep the namespace unambiguous and consistent with the naming guidelines.

13 Likes

Private registries are great for organization-internal packages, but the overhead is too high for that to be a good public solution.

A registry of registries could help on the user side, along with being able to do a one-liner like using RegistryA/PackageB to pull a package from a new registry.

6 Likes

Thanks everybody for the feedback and the clarifications.

@mbauman OK, now I’m quite surprised that anybody can comment and block package registrations. In the given context where the auto merge was blocked, my expectation was that a moderator (an admin that can manually merge) had stepped in for clarifications (as was in fact the case in the past). I honestly find it concerning that anybody can just pop in and make demands, without a clear distinction on whether or not the person is an admin that is actually helping towards merging the PR.

@tim.holy even if there are lots of long names, there is also the issue of packages that do the same thing. For example, I wanted to register an OpenAI.jl package - but that was already taken. How am I supposed to name an OpenAI wrapper package?! If we allow packages namespaced under orgs, we can have packages with descriptive names - and various orgs can compete to provide good packages that are easy to find. In addition, this allows new packages to use good names that may already be taken by old packages that are no longer maintained.

2 Likes

Can’t one just add a prefix to the package names? E.g. in bioinformatics a lot of packages start with “Bio” (BioSequences, BioStructures, BioSymbols, BioServices, …). Same with Image* and Geo*. It’s maybe a bit low-tech but it works already.

3 Likes

I think that is a good thing, for example when the package is in one’s area of expertise and a better name can be suggested. Just trust that good Julians do it with good intentions (and, in my experience, often still add [noblock] to their comments).

OpenAI.jl actually is an OpenAI wrapper package. In this specific case I would recommend joining forces with the existing team if they do not yet cover all the features you want to see in such an API. If they cover all the features you need, just be happy and use their code.
If we had 17 OpenAI wrapper packages, it would be much more likely that packages get abandoned than if 12 of them joined forces and the other 5 used that package. For old packages that seem abandoned, you can usually ask for help here, but I indeed only know of one case where a new package superseded an old one and took its name (Graphs.jl, that is). A more common approach is to put packages into organisations to avoid abandoned packages in the first place.

6 Likes

The extensive documentation encourages us all to get involved…

You (yes, you!) can help General be the best registry it can be.

If an AutoMerge guideline fails and the package author does not seem to know how to address it, you can help guide them through the process.

If a package fails the name similarity check, you can help out by taking a look at the two names as well as the package code itself, and try to make a determination if it looks “too close” (e.g. Websockets vs WebSocket), and if the package code contains anything that would indicate malicious activity. You can make a comment in the PR indicating whether or not you think the name similarity is okay.

People are usually just trying to help…

9 Likes

I see a few ideas here that I think are worth challenging:

1/ the package creators should do things: should put in effort to please a naive similarity check, should advocate for the package, should work on other packages, etc. That is not how we encourage people to contribute.

As an example, I have been successfully using my package in production for months and I don’t need to have it registered. I simply wanted to open source it and maintain it as a gift to the Julia users. If this is hard for me to do, no problem, I won’t do it.

2/ by the current naming best practices, packages like Genie or Makie or many others would have to be named “WebFramework.jl” and “PlotsPackage.jl”.

3/ in the end, my auto-merge was rejected due to a naive similarity check algorithm. However, the onus is on me (and other package developers) to solve the problem. This is not the right attitude - go to point 1 for more on this.

2 Likes

Typosquatting false positives could be reduced by noting that the author of the package is the author of another popular package.

I also like the idea of packages named under organizations. I think that would help discoverability, and possibly give credibility to packages under reputable organizations.

Someone using something like

using SciML/DifferentialEquations # as ...

is more likely to search for other packages at the SciML page, and more likely to end up combining packages that were actually meant to work together.

I don’t see any downsides to this.

6 Likes

While I can understand how onerous that can feel, I can’t think of a better solution. As people have mentioned, the auto merge rules are attempting to balance between various trade-offs, including developer time, the time of volunteers maintaining the repo, what’s best for the community in terms of registry security etc.

You might argue that the balance of trade-offs is apportioned incorrectly, and if you feel strongly enough about it, it could be worth making the case in an issue. But these decisions weren’t made randomly or capriciously.

This is of course a risk for rules that add some friction - we might lose out on some packages that might be great, but I personally think the risk of no friction at all is higher.

Well, both of these started before Julia 1.0 and the general registry, and I don’t remember if these guidelines were in place for METADATA, so it’s tough to know how they would have been treated, but I’ve seen only light pushback on names like this. In any case, these are just guidelines, clearly, and commonly overridden if you follow the request of the bot, see eg

This is your opinion and you’ve every right to it. I personally don’t think sending a message on slack to get the attention of someone that can merge is all that onerous :person_shrugging:

9 Likes

This is a bit more onerous if you don’t have a slack account…

10 Likes

I do that fairly often (change package URLs) on the General registry and there is always someone who helps out :smile: on Slack (I didn’t know about #pkg-registration :no_mouth:; still, @ericphanson and @fredrikekre have helped in #helpdesk). Hoping this specific CodeAI issue gets resolved soon though.

1 Like

I think you are right, that makes topics / organisations more “discoverable”. I feel the slight danger might be that you get a few packages with the same name that do the same thing, while the developers do not decide to join their efforts.
But maybe that happens now as well, just with slightly different names instead of the same one; I am not sure.