Reduce package registration waiting period

Great talk, yellow viewgraphs or not. Also very scary.

For what it’s worth, I love small single-purpose packages (one of my favorites is UnPack.jl). I hate having to add huge (and usually deeply nested) dependencies to my packages just to get a few simple functions that don’t rely on the rest of their package.

Indeed, UnPack.jl, Reexport.jl, SafeTestsets.jl, etc. are good packages. That’s why I think this designation is much harder to pin down than “small”, and why it may be hard to make it black and white, even though there’s an “I know it when I see it” component to it.

Let me just speak on behalf of the community here and add some context.

@anon37204545, sorry this has come up around one of your packages. You’re not the issue, and I hope you do not feel that any of the comments in any of these threads are directed at you. What’s going on here is that the community has been transitioning towards a more free and decentralized framework for package registration. We used to have a system where packages were individually vetted by specific people before registration (more like R), but as the number of registrations grew, the process became automatic. That in turn raises issues such as security, and @anon94023334 gave a great JuliaCon talk about how these changes to the package ecosystem affect the utility of Julia packages in secure environments like national labs.

So we as a community are in an interesting position where we need to define ourselves and our practices in the context of what’s best for our uses and applications, which can differ from the general ecosystems of NPM and PyPI. While your package has now brought these questions up, it’s not your fault: it’s ours. We do not have any rules or guidelines here, so of course they weren’t followed! This is a good time for us to really think about what we want as a community and change our guidelines so people know what to do in the future (these new guidelines are being hashed out in this PR).

I generally agree with Stefan’s point that right now the 3-day waiting time does more good than harm, but for future reference, here are some use cases where it annoyed me quite a bit:

Publishing dependent packages

Usually I have whole chains of packages, say, A <- B <- C, where A is a pure library, B depends on it, and C depends on both and is private. To test C for production, I need to ]add A and B as registered packages, pinned to concrete versions, and run the tests, so with 2 new packages it can take up to 6 days (two consecutive 3-day waits) before I can be sure C is ready to be deployed.
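To make the 6-day figure concrete, here is a small Python sketch (not part of any Julia tooling) under two assumptions: every new registration waits the full 3-day period, and a package can only be submitted once all of its dependencies are registered. The worst-case delay is then the waiting period times the longest chain of new packages.

```python
from functools import lru_cache

WAIT_DAYS = 3  # assumed waiting period for each new package


def registration_wait(deps: dict[str, list[str]], new: set[str]) -> int:
    """Worst-case days until every package in `new` is registered.

    Assumes a package can only be submitted once all of its dependencies
    are already registered, so new packages in one chain wait one after another.
    """

    @lru_cache(maxsize=None)
    def depth(pkg: str) -> int:
        # Number of new packages on the longest dependency chain ending at pkg.
        below = max((depth(d) for d in deps.get(pkg, [])), default=0)
        return below + (1 if pkg in new else 0)

    return WAIT_DAYS * max((depth(p) for p in new), default=0)


# A <- B <- C: B depends on A, C depends on A and B; C stays private.
deps = {"A": [], "B": ["A"], "C": ["A", "B"]}
print(registration_wait(deps, new={"A", "B"}))  # 6
```

Independent packages can be submitted in parallel, so under these assumptions only the depth of the chain of new packages matters, not how many there are.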

Perhaps I could use LocalRegistry and deploy without any waiting, but then I would have little real motivation to actually make these packages public later.

Figuring out structure of dependent packages

Another problem when developing multiple interdependent packages at the same time is figuring out what code should go where. In my day-to-day job in industry, where I mostly write Python and Scala, we restructure packages approximately every 3 months. Sometimes such packages live for just a few days: we create a new package, try out a new structure, realize its limitations and refactor again.

In open source and with public packages this happens more rarely, but it still happens. For example, I remember the Hadoop ecosystem maturing: new packages were split out, merged with others and deprecated quite quickly. Usually these were not user-facing packages (those stay stable) but helper libraries covering specific needs.

Moving common functions to a separate package

Once I had a set of utils that I used in 2 or 3 packages. These were very simple things without a common domain of their own, e.g. @get, a macro similar to get but evaluating its second argument only when needed, or the macro @runonce, which avoided re-evaluating some pieces of code, etc. I thought about creating a package with some dummy name like LittleGoodThings and moving all the utils there, but I couldn’t come up with a reasonable description and decided simply to duplicate the code.

Announcing a package

You know the feeling when you are ready to release the first version of a package you’ve been working on for several months. Over the weekend you finish the last tests and write the docs and an encouraging announcement to post here on Discourse. Then you submit a PR and… wait until Wednesday for it to get merged. Over that time the enthusiasm fades, you get back to your daily job and only return to the package in the middle of Friday. Not a big price for something you’ve spent several months on, but it doesn’t encourage more active development either.

None of these use cases were really blocked by the waiting time, but it made things pretty inconvenient, so hopefully in some distant future this last issue will also be overcome.

While I agree that this can be annoying, and have had to deal with it myself, this is also a good thing: it makes you think twice about registering a bunch of deeply nested packages. Expecting a longer wait will push people to plan nested dependencies carefully, and maybe you will think of a different approach to interoperability that you would not consider if you could lazily register tons of nested packages all at once without waiting.

Just write the announcement on Sunday and save it somewhere.

Really interesting discussion, and both sides have valid points. What I like about the Debian packaging ecosystem is that it has very specific guidelines for submitting a package, and because they are so specific, there are tools to check whether your package meets them. They check for name conflicts, documentation, etc.; it is like a linter that catches common mistakes. We like coding because we are basically lazy people: we let our imagination automate the repetitive tasks, and automation is a test of whether we have a clear idea of the guidelines we want to enforce.

Maybe we need a tool similar to Debian’s lintian:
https://github.com/Debian/lintian

The general registry also has very specific guidelines, and the CI bot tells you what needs to be remedied in case of a failure. The waiting period just allows constructive comments beyond these requirements, but these per se will not prevent merging once reviewed. There are quite a few packages where the author insisted on the original name and it got merged.

That said, the comparison with Debian (or any major distro) highlights an important difference: all packages in the distribution are implicitly guaranteed some form of maintenance. At a minimum that means security fixes, and critical issues are usually prioritized too. If the maintainer cannot or does not do this, others step in in a timely manner. Eventually, if upstream is abandoned and this becomes impossible, the package is removed from Debian (or in some rare cases, forked and maintained a bit longer). Because of this, uniform standards are critical for Debian.

In comparison, the General registry is closer to a structured database of package metadata: it takes releases as given, and makes no explicit quality requirements of registered packages.

By the way, I like the implicit standardization when you submit your package for publication under the tutelage of @matbesancon. From my experience, I’m pretty sure the packages accepted for publication have great documentation and usability thanks to the help of the reviewers. Peer review of submitted packages is great, and if we could flag packages that have been peer-reviewed and accepted for publication, that could indicate some level of quality.

Regarding Debian packaging, I don’t mean it literally. If there were tooling that helped packagers by suggesting the common naming style, checking for similar names, suggesting categories, suggesting similar existing packages, etc., the actual submission would have fewer issues, because the offline tooling would already catch those common problems.
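As an illustration of the “check for similar names” step, a minimal offline check could compare a proposed name against existing ones by edit distance. This Python sketch is an assumption-laden toy, not taken from the General registry’s CI: it uses plain Levenshtein distance, compares case-insensitively, uses an arbitrary threshold of 2, and the list of registered names is only an example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute ca -> cb
        prev = cur
    return prev[-1]


def similar_names(candidate: str, existing: list[str], max_distance: int = 2) -> list[str]:
    """Existing names within max_distance edits of candidate, ignoring case."""
    c = candidate.lower()
    return sorted(n for n in existing if levenshtein(c, n.lower()) <= max_distance)


registered = ["UnPack", "Reexport", "SafeTestsets", "Plots"]  # example names only
print(similar_names("Unpack", registered))   # ['UnPack']
print(similar_names("Plot", registered))     # ['Plots']
print(similar_names("Geodesy", registered))  # []
```

A real checker would also need a policy for what to do with a hit (warn, or require a justification), which is exactly the kind of guideline the thread is discussing.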

The idea of an offline lintian-style package checker is that new developers don’t have to wait until they submit their package to the General registry to be told that the package name should be changed, or that they need more docs, docstrings, etc. If they can run the checker offline while developing a package, they can incorporate these expectations incrementally, and by the time they submit, the process will be straightforward and any fixes minor, because they will have developed the habit of the Julian way from the start.

Running lintian on your package under development, with it reminding you that function blah lacks a docstring or examples, will help new developers get into the habit of working with Julia packages. It can also help existing packages: running lintian every time one makes a release makes sure the docs/examples are there and that new function names don’t conflict with Base, etc. Lintian can just serve as a guide to developers, but at least it can provide some standard for the most common expectations without having to re-read the guidelines, because lintian can check them for you.

I look forward to using this package 🙂

I’m tempted to implement this in Perl 😂. It would parse the evolving guidelines, record the names of the functions in Base, create an implementation for each guideline (by regular expression, guideline syntax parser, etc.) and check the package for any violations.

It could also include summary statistics: the number of functions without docstrings/examples, functions with names similar to Base, functions with more than 10 lines of code ;), function names not in the standard format, global variables, etc.
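A toy version of a few of those statistics fits in a handful of regular expressions (in Python here rather than Perl). This is only a heuristic sketch with several assumptions: it recognizes “function name(...)” definitions and one-line “name(...) = ...” definitions, treats a definition as documented if the preceding non-blank text ends with a string delimiter, and takes the set of Base names as an input instead of extracting it from Julia.

```python
import re

# Long-form "function name(" definitions and one-line "name(args) = ..." definitions.
FUNC_RE = re.compile(r"^[ \t]*function[ \t]+([A-Za-z_][A-Za-z0-9_]*!?)", re.MULTILINE)
SHORT_RE = re.compile(r"^[ \t]*([A-Za-z_][A-Za-z0-9_]*!?)\([^)\n]*\)[ \t]*=(?!=)", re.MULTILINE)


def lint_summary(source: str, base_names: set[str]) -> dict:
    """Count definitions, those without a docstring, and those reusing a Base name."""
    matches = sorted(
        list(FUNC_RE.finditer(source)) + list(SHORT_RE.finditer(source)),
        key=lambda m: m.start(),
    )
    missing_doc, base_clash = [], []
    for m in matches:
        name = m.group(1)
        # Heuristic: a docstring is a string literal right before the definition.
        if not source[: m.start()].rstrip().endswith('"'):
            missing_doc.append(name)
        if name in base_names:
            base_clash.append(name)
    return {"functions": len(matches), "missing_docstring": missing_doc, "base_name_clash": base_clash}


SAMPLE = '''
"""
    area(r)

Area of a circle of radius `r`.
"""
function area(r)
    return pi * r^2
end

function length(x)
    return 0
end

double(x) = 2x
'''

print(lint_summary(SAMPLE, base_names={"length", "map"}))
```

Regexes will miss plenty of valid Julia (e.g. typed return annotations or multi-line signatures); a serious tool would rather use Julia’s own parser, but the sketch shows the shape of the report.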

Exhibit A.
Empty packages with the most common names:
https://crates.io/users/swmon

Discussion threads in the Rust community:

Mitigation: manual intervention by maintainers:

A couple of points:

Pkg has no problem with duplicate names, so squatting a name like this does not prevent someone else from using it. It prompts you to choose which one you want, and in the future we could provide download stats and/or stars to help people decide. I doubt an empty package will get a lot of stars or downloads, so it would not take long for the real package to rank higher than a squatter.
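A toy model of why a squatted name is not a dead end: if packages are identified by UUID and the name is only a label, a lookup by name can return several candidates, and download counts (which, per this thread, are only a possible future feature) could order them. This is a Python illustration with hypothetical names and numbers, not how Pkg is implemented.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entry:
    name: str
    downloads: int  # hypothetical statistic, used only for ranking
    pkg_id: uuid.UUID = field(default_factory=uuid.uuid4)  # identity; names may repeat


def candidates(registry: list[Entry], name: str) -> list[Entry]:
    """All registered packages with this name, most downloaded first."""
    return sorted((e for e in registry if e.name == name), key=lambda e: e.downloads, reverse=True)


registry = [
    Entry("Foo", downloads=3),       # empty squatter
    Entry("Foo", downloads=12_000),  # the package people actually use
    Entry("Bar", downloads=500),
]
for e in candidates(registry, "Foo"):
    print(e.name, e.pkg_id, e.downloads)
```

Because each entry gets its own random UUID, the two “Foo” packages never collide as identities; only the human-facing choice between them needs help.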

We should actively archive unmaintained or empty packages. We have already done one round of this for packages that don’t support Julia 1.0 at all; more archivings can occur in the future.

Can / should this be automated?

Yes.

The choice to identify packages by UUID was a really smart one. It’s impossible to squat the UUID address space 😉

What does it look like for the user when two packages get registered with the same name? Suppose you somehow want both: is that possible? How does one go about using them?

I’m still waiting for someone to insist on a particular “vanity UUID.” It’s not currently possible to directly use two different packages with the same name from a single project. A project and one of its dependencies can do so, however.
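One way to picture this limitation: a project’s Project.toml lists its direct dependencies in a [deps] table that maps each package name to its UUID, so a given name can appear there only once, while the full dependency graph is keyed by UUID and can hold two packages with the same name. This Python sketch models that with plain dictionaries; the UUIDs are random placeholders and the helper names are made up for illustration.

```python
import uuid


def add_direct_dep(deps: dict[str, str], name: str, pkg_uuid: str) -> None:
    """deps mirrors a [deps] table: package name -> UUID, one entry per name."""
    current = deps.get(name)
    if current is not None and current != pkg_uuid:
        raise ValueError(f"{name} already refers to {current}")
    deps[name] = pkg_uuid


def loaded_names(graph: dict[str, tuple[str, list[str]]], roots: list[str]) -> list[str]:
    """Names of every package reachable from roots; graph is keyed by UUID."""
    seen, stack = set(), list(roots)
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(graph[u][1])
    return sorted(graph[u][0] for u in seen)


foo_a, foo_b, bar = (str(uuid.uuid4()) for _ in range(3))

project: dict[str, str] = {}
add_direct_dep(project, "Foo", foo_a)
add_direct_dep(project, "Bar", bar)
try:
    add_direct_dep(project, "Foo", foo_b)  # a second, different Foo as a direct dependency
except ValueError as err:
    print("rejected:", err)

# Bar itself depends on the other Foo; keyed by UUID, both can coexist.
graph = {foo_a: ("Foo", []), foo_b: ("Foo", []), bar: ("Bar", [foo_b])}
print(loaded_names(graph, list(project.values())))  # ['Bar', 'Foo', 'Foo']
```

The project itself can only say “Foo” and mean one UUID, but nothing stops Bar from meaning a different one when it says “Foo”.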