The present and the future of package registration

True, that adds another level of complication if user names clash. Package committers would then need a “General registry username”, which could be identical to their GitHub username or organization, or not (this is why having at least a feeling that someone likes the idea is necessary before digging into the details - complications always exist).

1 Like

This isn’t at all specific to your issues, but I think the desire is there; what is needed is action.

I think a problem is that lots of people need to use the registry, but only for a very brief period of time that is not core to their work (i.e. they just want to register a package), and that kind of interaction isn’t very conducive to folks moving from user to developer, which happens more often for other open source resources (like packages). So there are very few folks who contribute PRs to RegistryCI, update documentation in General, etc.

9 Likes

One related idea is “Run AutoMerge checks in package” (Issue #453 in JuliaRegistries/RegistryCI.jl on GitHub). This is up for grabs for anyone who wants to contribute.

2 Likes

I’m not a maintainer, but I would definitely leave a friendly blocking comment along the lines of “This doesn’t seem quite ready for the General registry yet”. The official guidelines explicitly mention both documentation and tests and prohibit placeholder packages. I’m not sure how easy that would be to check automatically. There has been some talk about adding checks for minimum lines of code and minimum lines in the README or documentation. I would certainly support those, but nobody has gotten around to implementing them. In the meantime, the “human” approach of the community checking what pops up in the new-packages feed seems perfectly fine. That’s exactly why we have the three-day review period for new packages!
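For what it’s worth, a minimum-lines check along those lines might look roughly like the sketch below. To be clear, this is not part of AutoMerge or RegistryCI: the function names, the thresholds, and the assumption that the package has a `src/` directory and a `README.md` are all hypothetical, chosen only for illustration.

```julia
# Hypothetical sketch of a "minimum content" check; not an existing AutoMerge check.

# Count non-blank lines in a file; a missing file counts as zero lines.
count_nonblank(path) =
    isfile(path) ? count(line -> !isempty(strip(line)), eachline(path)) : 0

# Sum the non-blank lines of every .jl file under `src/`.
function count_src_lines(pkgdir)
    srcdir = joinpath(pkgdir, "src")
    isdir(srcdir) || return 0
    total = 0
    for (root, _, files) in walkdir(srcdir)
        for f in files
            endswith(f, ".jl") && (total += count_nonblank(joinpath(root, f)))
        end
    end
    return total
end

# Return (passed, message). The default thresholds are made-up numbers.
function meets_min_content_check(pkgdir; min_src_lines = 20, min_readme_lines = 5)
    src = count_src_lines(pkgdir)
    src < min_src_lines &&
        return (false, "src/ has only $src non-blank lines (minimum $min_src_lines)")
    readme = count_nonblank(joinpath(pkgdir, "README.md"))
    readme < min_readme_lines &&
        return (false, "README.md has only $readme non-blank lines (minimum $min_readme_lines)")
    return (true, "")
end
```

Calling `meets_min_content_check("path/to/MyPackage")` would then report whether a local checkout clears both thresholds. A line count says nothing about quality, of course, so at best this would filter out obvious placeholders.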

I would certainly have raised an eyebrow at SciMLBase with an empty README, but probably only left a non-blocking comment, since I know that Chris / SciML isn’t exactly a random person who’s not going to follow up with documentation.

1 Like

I think ideally it would be something runnable even before one creates an actual package repo. Maybe something PkgTemplates could help with?

1 Like

In any case, there are often packages that are not meant to be used directly by users, and a higher-level interface may be provided by a second package. In such cases, perhaps it’s imposing a rather high standard to expect the “Core” package to be well-documented, especially at the time of merging.

A paragraph in the README “This package provides core functionality for link to other packages” would be sufficient then. I do that myself, exactly.

2 Likes

Care to elaborate?

Yes, there should be at least some guarantees of package quality, and in most cases nobody but you, the package creator, can do the chore of some basic checks.

IMO the registration policy is rather too liberal, as it allows (?) registering packages with zero tests and zero documentation.

8 Likes

I cannot disagree with this, but note that it is tricky to design automated checks for these that cannot be easily circumvented.

Either we just rely on simple automated checks, and implicitly the decency of package authors, or we need people to review package registrations.

What I find nice is that even with the very liberal registration policy, abuse of the registry is pretty minimal. This is heartening.

7 Likes

As I understand the situation, the main benefits of the General registry are 1) to “officialize” packages, and 2) to keep them available in the long term. So it increases the Julia ecosystem’s robustness.

Having GitHub repos as first-class citizens has some advantages (in particular, loosening naming requirements), but at the expense of reproducibility. So even if it were a feature, I wouldn’t recommend it for the ecosystem’s main packages (or even for any registered packages).

As a side note, the “first come, first served” phenomenon is so common that I almost take it as a fact of life (and going against it is a source of confusion). To illustrate, I have a bad memory of trying desperately to find the right Proj library in a Golang project.

3 Likes

One thing that came up on the Slack thread is that consistency of moderation is as important as (if not more important than) strength of moderation. E.g., if you have a poorly named package but it passes the similarity check by a hair, you’re home free. I’m not sure precisely how the current automated checks could be improved, but I imagine the Julia community should have no shortage of people who could provide knowledge and expertise in this area.

1 Like

human here, the answer given in the PR (“I’d like to add support for other models in the future”) was sufficient since I just wanted to nudge (hopefully evident by the [noblock]) to see if we can avoid unintentional name squatting.

I apologize if I failed to convey that this was only an opinion; I was essentially saying “I THINK everyone could benefit from a less generic name but I’m fine if you think this is fine”.


Some historical context: we (was it @ericphanson?) added the new-package bot and channel in Slack precisely so that everyone in the community can contribute eyeball hours to guard against name squatting (I have posted the lessons learned from Rust many times in the Julia community), and it’s also an excellent way for people to discover what’s happening out there. (Again, this is not an instance of name squatting; otherwise, someone would have commented without [noblock].)

9 Likes

I was definitely involved, but I don’t remember if it was my idea or not. I think some of the motivation was additionally to have more visibility into new package registrations, to be able to help folks understand AutoMerge issues. It seemed like a lot of PRs were getting stalled because package authors didn’t really understand what AutoMerge wanted them to do (e.g. fix compat etc.), since new packages often come from new authors (whereas new versions generally come from folks who have registered before).

4 Likes

Not completely sure if the argument was covered here already, but I sometimes think that the guidelines should discourage names that are too “big” sounding if it’s subjectively hard to deliver on what the name promises. Let’s say you make a package with machine learning utilities, obviously MachineLearning.jl would be too big. I’m not saying too imprecise per se, just covering too large an area. Or if you dabble in making tools available to connect to AWS, AWS.jl would also be too big unless you plan on really supporting everything and having your package be the go-to thing.

You could argue that you cannot know at registration whether you’ll follow through or not, so I’d say yes, stay away from names that create expectations in users you cannot or don’t intend to fulfill. That’s actually why I often like names like Makie or Genie, because they are really just names, without many connotations. Compared to, e.g., Plots.jl which I think, as has been discussed elsewhere, was too big because now the expectation from new users is that it’s the go-to plotting thing in Julia (historically yes, but nobody decided that it should be). For Tables.jl, on the other hand, it seems accurate as it’s a very generic and widespread tool. It’s all very subjective but worth the discussion when controversial names are picked, I think.

10 Likes

Then it would be a nightmare to load packages. Currently, any of my scripts involves importing at least 10+ packages, sometimes 30+. With all those prefixes, I perhaps could not see a single line of my actual code on the first page.

This is a very good argument for naming packages. It might, however, also depend on what your aim is. Optimization.jl aims to unify as many optimization packages as possible “under one roof”/in one framework. There I think the name is correct. If with MachineLearning.jl you aim for the same generality – sure it should have that name.
If you don’t aim that high and it’s more like KellertuersPersonalOptim.jl, then maybe find a name that describes that best (this example surely does not belong to the good examples). Probably something that describes your own domain best then.

For names that do not have a direct connotation – I feel they can sometimes be a nice “way out” for your specific area in a field, or when the package is hard to describe in one word or a descriptive name would be too clumsy (DrWatson.jl comes to mind; maybe Franklin.jl is an example here as well).

1 Like

If with MachineLearning.jl you aim for the same generality

Yes I agree, I just believe the number of people actually following through on something of that size is small :slight_smile: So it’s better to err on the side of a longer, less-official-sounding name. Like MachineLearningUtils.jl or whatever.

2 Likes

You are absolutely right. The task would be quite a huge one to take.

I think very general package names like MachineLearning can be great, but I would put quite a lot of scrutiny on them at registration time:

  • Does the package already have extensive functionality, complete documentation, and decent test coverage?
  • Is there more than one person involved in its development?
  • Is it owned by a GitHub organization, rather than a personal account?
  • Do the maintainers of the package have a track record?
  • Is it backed by some entity with permanent or long-term funding? (A company or academic research group with a tenured PI who can pledge continued support)
  • Is someone being paid (at least indirectly) to maintain this package?

Obviously, these are not criteria that can be checked automatically at all.

I would also extend this to things like 3-letter acronym packages. In general, the more general/shorter a name, the more important its quality and long-term prospects. On the other hand, a random PhD student who’s just packaging up their research code, which will probably become unmaintained after they leave academia, should err on the side of long, descriptive package names.

6 Likes

I agree examples like that deserve scrutiny, but note:

julia> AutoMerge.meets_distance_check("MachineLearning", all_pkg_names)
(true, "")

So if someone did register a package with that name, it would slip through AutoMerge unless someone else caught it on the #new-packages-feed on Slack and managed to take action. Although said Slack channel has caught less-than-optimal names in the past, it’s far from a well-staffed, scalable operation. Say the usual visitors go on vacation at the same time: now we’re back to a problem of inconsistent enforcement.
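To illustrate why a purely syntactic check lets such a name through, here is a minimal sketch of a name-distance check based on plain Levenshtein edit distance. This is not AutoMerge’s actual implementation, whose metrics and thresholds may well differ; `levenshtein`, `passes_name_distance`, the `min_distance` threshold, and the short list of existing names are all assumptions made for the example.

```julia
# Hypothetical sketch; not the code behind AutoMerge.meets_distance_check.

# Plain Levenshtein edit distance (insertions, deletions, substitutions).
function levenshtein(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))          # distances from the empty prefix of `a`
    for i in eachindex(s)
        curr = similar(prev)
        curr[1] = i
        for j in eachindex(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution
        end
        prev = curr
    end
    return prev[end]
end

# A new name passes if no existing name is closer than `min_distance` edits
# (case-insensitive). Returns (passed, message).
function passes_name_distance(name, existing; min_distance = 3)
    for other in existing
        d = levenshtein(lowercase(name), lowercase(other))
        d < min_distance && return (false, "too similar to $other (distance $d)")
    end
    return (true, "")
end

# Illustrative subset of names mentioned in this thread, not the real registry.
existing = ["Plots", "Tables", "Makie", "Optimization"]

passes_name_distance("MachineLearning", existing)  # passes: nothing is spelled similarly
passes_name_distance("Plotz", existing)            # fails: one edit away from "Plots"
```

A check like this only measures how close a name is in spelling to existing ones. It has no notion of whether a name promises more than the package delivers, which is why a name like MachineLearning sails through.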

1 Like