Please more bureaucracy on package registrations!

BTW, is there a smiley for tongue-in-cheek?

Inspired by the thread The present and the future of package registration, especially by @Tamas_Papp’s response to my comment and by the reference to the PR Check that there are “enough” tests and documentation.

Yes, automated checking of some documentation and test-coverage metrics for a new package would be nice to have. But we could also ask package authors to fill in a questionnaire themselves, answering questions about these coverages and many other things. Its uses would be:

  • Should the authors state that their package is deficient on these metrics, the registration decision would be made by a human.
  • By the mere act of answering the questions, the authors would be nudged toward the less inspiring job of better package support.
  • The questionnaire is also a kind of checklist, informing the authors about options which might otherwise not come to their mind.
  • The information provided therein would be available on JuliaHub and useful for potential package users.

The questions would be:

  • What kind of documentation is provided (README, docstrings, Documenter-generated)?
  • Are all exported functions/methods documented? Where? (IMO that should be the minimum standard requirement.)
  • What is the test coverage?
  • How is CI implemented?
  • Usage of tools like Aqua, JET, you name it…
  • Do you know of similar/related packages? Which? Do you refer to these in the documentation? Should their authors be informed about your package?
  • How many maintainers does the package have? Would you, in principle, be interested in transferring the package to an organization?
  • Do you plan long-term support for your package?
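For context on the Aqua/JET item: such checks usually live in a package’s test suite. A minimal sketch of what that typically looks like, assuming Aqua.jl and JET.jl as test dependencies (MyPackage is a placeholder name, not a real package):

```julia
# test/runtests.jl (sketch) -- MyPackage is a placeholder name
using MyPackage
using Aqua, JET

# Aqua runs a battery of static quality checks: method ambiguities,
# undefined exports, stale dependencies, missing compat entries, etc.
Aqua.test_all(MyPackage)

# JET statically analyzes the package for likely errors
# (e.g. calls to undefined names).
JET.test_package("MyPackage")
```

Answering the questionnaire item would then be as simple as pointing at these lines in the test suite.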

This nudge can be sort of useful. GitHub bug-report issue templates are probably a good example of this: the reporter feels the “burden of proof” for documenting a bug. If such evidence is not provided (no stack trace, no MWE, etc.), the issue understandably falls down the priority list.

I don’t see how this can work with the current way that I interact with the registration bot though, but perhaps the creators see that differently.


Note that the bottom line of my comment is that relying on the goodwill and decency of package authors basically works OK and abuse is really rare.

I disagree with your proposal: while it would conceivably raise the standards for registered packages, it would also mean that a lot of packages would go unregistered, because in the initial development phase the authors prefer to focus on key architectural choices rather than dotting all the i’s and crossing all the t’s, even while the package is already usable for a lot of people.

Let’s face it: a lot of very useful Julia packages are technically in alpha or beta stages. E.g. consider Enzyme, a very viable AD solution for reverse mode: its CI is failing at the moment, it has undocumented exported symbols (e.g. Enzyme.onehot), it uses Aqua but not JET, there are a ton of “similar” packages, etc.

I am not picking on Enzyme here (I think it is the best thing since sliced bread). A lot of other packages are like this in all fields. And this is the best we can do I guess, given the constraints on developer time.

I think that the purpose of the General registry is not to be a badge of quality or anything remotely like that. The best way forward is not to “fix” anything about General, but to start a curated registry with higher QA requirements. It puzzles me why people interested in this issue are not doing that.


I think it would make sense to go the other way:

  • create a new Universe registry with very few requirements
  • set higher requirements for the General registry

Having an alternate (non-default but still official) registry for “anything goes, more or less” should add very little friction[1] and at the same time it would be a powerful signal regarding the quality we’re aiming for.

Yes, it means asking some more effort from the developers to join the “big guys” in the main registry. I think that’s fair: it will benefit all the users as well as the ecosystem’s reputation. If the devs don’t have the time to satisfy the QA requirements, they can and should register in the Universe instead.

Example benefit: I bet there are many packages where the biggest QA issue is undocumented exported symbols. Probably a large part would get documented quickly if that’s all that’s keeping them from the main registry.

  1. That could require some changes, e.g. in Pkg, to avoid naming conflicts or to better support different packages with the same name in different registries… ↩︎

What happens to packages already in General? Do they get kicked out if they do not satisfy the requirements?

I don’t think that a lot of people care, honestly. At this point the main motivation for being in a/the registry is giving fine-grained information to the package version resolver if the package has dependents. That’s all there is to it, it is not a quality badge or an honor.

My problem with this family of proposals is that they try to shift the hard work of what is essentially maintaining a curated list of quality Julia packages to the authors of packages. I don’t think this is viable.

I think it is better to admit that maintaining a curated list is a hard and mostly thankless job, then band together with a few like-minded individuals and start something small, e.g. a list for a certain field, like CRAN task views.


Where can you see this please? [sorry to derail the post]

Please note that my suggestion is NOT that all questionnaire items must be checked.

Actually, the only thing I explicitly proposed as necessary for auto-merging was that all exported entities be documented, which I think is not really too much to ask. Even that I would consider open for discussion. I am not even sure whether requiring some test coverage should be made a prerequisite: defining meaningful tests is not always straightforward.

All the rest is, well, a checklist to inform/remind the package authors of what belongs to a well-supported package and to inform potential users about the package’s state. I do not think just filling in such a checklist would take much time, especially if a bot collecting information about the package could pre-fill some of it.


What could also be done based on this information provided by the authors (and re-checked by a human in this case): award a “Quality Package” badge if the package fulfills requirements like: version past 1.0, multiple maintainers, maybe downstream dependencies, an (at least formal) long-term maintenance commitment, documentation, test coverage, …


For example, they could be grandfathered into the “new” General, possibly with a disclaimer on JuliaHub, or they could be moved to Universe after one year.

There is significant convenience in having some standard registries. I make a new environment for each little project I work on. I would not like having to type the full URL each time I need CSV, DataFrames, etc…

Yep, and that’s my point: it would be great for Julia’s reputation if being in the default registry was a badge of quality.

On the contrary, I think shifting the work to the package authors who care is more scalable (and the authors who don’t care much can register in Universe).


What is the advantage of having a separate registry over just awarding qualified packages badges like “Quality Package” / “Curated Package”, combined with the ability to filter by those?

The idea is to set a baseline for what is considered normal or expected from a package.

For example documenting all exported symbols should be table stakes, rather than an outstanding achievement worthy of a badge reward.

My guess is that this kind of “hygiene baseline” would lead to an increase in the average quality of packages (along these QA dimensions of course) and also improve Julia’s reputation.

Probably badges would also work to some extent, especially for improving the average package quality (for the language reputation I’m not so sure, I think a baseline for the whole of the main registry would have a much stronger effect than having badges on some packages).

Also badges would be less controversial, easier to implement, and not restricted to a binary classification…

To avoid breaking anything, I think you would need to keep General as is and create a Core registry with the high standards you’re suggesting. Then Pkg would need to be able to see both, and potentially emit some kind of warning when installing a non-Core package.

Edit: maybe you could set a preference in Pkg to only use Core.


I don’t think these proposals change anything about package perception. Julia packages (or any other packages) are not found by looking at a list of what is available in the General registry; they are found by googling. And what one reaches when a package is googled is its GitHub page, or its docs. If those are of good quality, that is clear to the user from the start. I don’t think that allowing or preventing installation via add Package versus add https://.../Package.jl changes the user’s perception at all. And, importantly, if packages (perhaps good ones) don’t get registered in General because of friction imposed on the developers, what can happen in the long term is an overall less reproducible package-management experience, degraded for everyone.

Positive incentives, like quality badges (Aqua, JET, some level of docs quality), stimulate (and guide) developers towards good practices much more than added bureaucracy does.


One counter-point to this is licenses. General has always required an open-source license, but didn’t start enforcing that in AutoMerge until March 2021. That check triggers on new versions as well as new packages, so existing packages without an open-source license couldn’t register a new version until they added a compatible license. That is a pretty strict standard, but we thought it was worth it because it enforces an existing hard requirement. And since then, the number of packages whose latest version doesn’t have a machine-detectable open-source license has gone down, despite the overall number of packages increasing by a lot.

So I think adding registration requirements can be effective (and can also guide folks towards improving practices if the errors are helpful enough), but I agree that it can be too strict.

Something I’ve wanted to do for a while is add a little automated message to registrations giving some stats about the package (or the new version of the package): number of lines of tests/docs/source code, presence of CI scripts, etc. It would act as a little “package health checklist” where boxes show up checked if you have CI and so on, with links to resources (how to set up CI, what tests are, etc.). It wouldn’t block anything, just inform. I think that could be useful and help guide folks towards best practices.
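Such stats are cheap to compute. For illustration, a rough line count over a package tree might look like the sketch below (purely hypothetical; the actual AutoMerge tooling would likely differ):

```julia
# Sketch: count lines in .jl files under a directory, as a crude input to a
# "package health" summary (run separately on src/, test/, docs/).
function count_julia_lines(dir::AbstractString)
    total = 0
    for (root, _dirs, files) in walkdir(dir)
        for file in files
            endswith(file, ".jl") || continue
            total += countlines(joinpath(root, file))
        end
    end
    return total
end
```

A bot could run this on the registered tree and include the numbers in the registration PR comment.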


JuliaHub package search is quite a useful tool, too, and it provides a lot of useful information at a glance. It is also a place where some additional information collected from the authors (number of maintainers, commitment to long-term support) could be displayed.


That is quite similar to what I am suggesting, except that I’d also ask the authors to add some additional information.

As for blocking auto-merging: IMO, just asking the authors to minimally document the API (or explain to a human why that would be a problem) is not asking too much. But if there is a consensus that it is, so be it.


Just in case, there is an issue in Aqua with a reference to code: “Test if all exported names has a docstring”.
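For illustration, a hand-rolled version of such a check could look like the sketch below. Note that it leans on Base.Docs internals (Docs.meta, Docs.Binding), which are not public API and may change between Julia versions, and it only sees docstrings registered in the module itself:

```julia
# Sketch: list exported names of a module that have no docstring attached.
# Relies on Base.Docs internals, so treat this as illustrative only.
function undocumented_exports(mod::Module)
    docs = Base.Docs.meta(mod)  # module-local docstring registry
    return [name for name in names(mod)
            if name !== nameof(mod) &&
               !haskey(docs, Base.Docs.Binding(mod, name))]
end
```

For a module that exports a documented foo and an undocumented bar, this returns [:bar], which a registration check (or Aqua) could flag.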

And perhaps a new ]info PkgName or info PkgName --verbose to get metadata: things such as package lines of code, CI coverage, docs URL, last update of main/release, number of contributors, license, tags…


I think you might overestimate the GitHub savviness of the beginning Julia user. When one is not well-versed in open source software, it is not always a reflex to check / evaluate things like

  • the quality of the docs
  • the presence of CI badges on the README
  • the date of the latest commit
  • the number of stars
  • the number of contributors
  • whether it belongs to an organization

At least for me, it was a learning curve.