Pruning and quality control for the package ecosystem

Related to the previous discussion on starting new projects in place of abandoned ones (also see this one about QA), I found this short blog post, in which Eric Raymond discusses similar issues in Rust, interesting. Some quotes:

when the question is “where do I get feature X?” the answer “oh, there are 23 crates for that” is objectively terrifying […]

As a potential Rust user, what I want to do is be able to go from a given feature requirement to one high-quality implementation with an implicit long-term stability guarantee. This, folks, is what a high-quality production language looks like, and not by accident – it minimizes discovery costs for actual production users.

I think this point is essential. Julia has the potential to deliver a cohesive ecosystem with low redundancy, both because starting a new ecosystem is a chance to get it right from the beginning and because of the central role of user-defined types. One way to achieve this is to focus on interfaces (such as DataStreams), another is the GitHub organisations, and a third is the open and collaborative culture around package development.
A standard library could also be a cool thing.

I am not a package developer myself, so I don’t understand how pruning can be a difficult problem. The most complicated case that I can think of would be the discussion that people were having in the HigherPrecision.jl post that you linked, where merging the two packages would have been quite complex. On the other hand, if package A is superior to package B, then as long as all parties agree, it would be quite easy to put warnings on package B and guide users towards package A. In fact, systems for doing the above task were discussed in the HigherPrecision.jl post.

As for the actual quality control, I like what @mkborregaard says: we need to get things right the first time. We need to make sure that we have a system in place that encourages new ideas while getting rid of bad ideas and duplicated effort.

My main point is that I don’t think that this problem requires any new technology (please correct me if I am wrong about that though). Unfortunately, I think that whatever solution we come up with will actually require the most precious resource of all, developer time. I guess that the good news is that most people will agree that quality control is an important problem, so maybe we do have the resources required to tackle this.

The problem with quality control is that someone has to have an opinion. There are always good choices, and it’s never the “obvious” cases that are a problem: it’s always the edge cases. An informed opinion isn’t bad, but informed opinions will differ. And any “opinion by committee” will never be quick or decisive.

I tend to think the way forward is with “different METADATAs”, or lists. A big giant METADATA which requires the bare minimum (like what we have now, but with better pruning of dead packages) is an ideal “large pool” to choose from, but will never be something that is “fully trustworthy”. I think that small groups of experts with clear judging criteria could easily curate their own lists, and from there it’s like movies: you follow who you tend to agree with. GitHub stars somewhat follow the trajectory of the community, but they never die down, which is why some deprecated packages can still appear to be in use.
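To make the “follow who you agree with” idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the curator names, the package names, and the `recommended` helper are made up for illustration and do not correspond to METADATA or to any existing tool. It only shows that a user's view can be computed as the union of the lists they follow, restricted to packages that are still in the large pool.

```python
# Hypothetical data: curator names and package names are placeholders.
pool = {"PkgA", "PkgB", "PkgC", "PkgD"}  # the big, minimally vetted pool

curated_lists = {
    "plotting-experts": {"PkgA"},
    "optimization-experts": {"PkgB", "PkgC"},
    "audio-experts": {"PkgD", "PkgE"},  # PkgE has since been pruned from the pool
}


def recommended(followed, curated_lists, pool):
    """Union of the followed curators' picks, limited to packages still in the pool."""
    picks = set()
    for curator in followed:
        picks |= curated_lists.get(curator, set())
    return sorted(picks & pool)


print(recommended(["plotting-experts", "audio-experts"], curated_lists, pool))
```

A pruned package drops out of every view automatically, because each list is intersected with the pool instead of being trusted on its own.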

Will we then need a PATADATA to keep track of all of the accepted METADATAs?

Ok, but being serious, I do like that idea. However, isn’t it possible that multiple lists created from people with different interests and skills could lead to fracturing? We don’t want to make the same mistakes that Linux made.

Well, there’s an “opinionated pruning” problem that has to be solved. Take plotting, for example. Winston.jl is fine, but would anyone actually recommend it to someone these days? That’s the kind of thing that would be hard for a larger inclusive committee to say no to, but it’s easy for a smaller group of individuals to just go: no, I would not recommend that to newcomers because I think there are much better options. When I want to jump into a new area, say linear programming, I appreciate it when experts take the time to curate lists and let their opinions weed out the cruft. I am not sure you can really have these kinds of tiers in a single repository that is made for bringing all working packages together; it just serves a different goal. These two approaches solve separate problems.

I think communication among package authors and a deliberate philosophy of avoiding redundancy and achieving integration is the ideal process for this.

I would like to argue that, for some people, the interest in programming something isn’t motivated by practical considerations. Starting a project as a learning exercise or design experiment doesn’t imply that you end up with a toy implementation. In other cases, it may just so happen that you don’t agree with some major design decisions of an established framework and are curious whether you could pull it off differently.

The second reason why starting fresh is very appealing is that it makes a big difference in quality of life.

My personal experience over the past years has been that I often find some existing package related to my current interests where I would like to tweak or try a couple of things. Now of course these tweaks are not always a good idea, because often enough when you start out you lack the insight to judge specific design decisions that were made. But one has to start somewhere. So you start working on the code, create a PR, and realize no one is responding. Add to that the fact that for an existing package one has to focus on small incremental changes that are easy to review, because people are busy and feedback can be sparse and slow. As a consequence you always feel like an intruder who has to inconvenience people.

With the major exception of the fantastic Images ecosystem (where Tim Holy regularly makes a big effort to provide hints and guidance), it turned out to be massively more fun, convenient, educational, and productive to just start from scratch and see for yourself what works and what doesn’t.

To bring it back to what @Tamas_Papp specifically referred to, and in light of the up and coming v1, I think that newcomers would rather not be confronted with

I therefore want to highlight that there are two levels of “needs” here:

  1. new users that just want to find a package that does X.
  2. experts looking for a (more exotic) method for a specific problem

While the solution for the second need is beyond me (the metadata idea was cool), I think that the solution for the first should be some handpicked list of awesome packages. I agree that the threshold between what is exotic and what is commonplace might not be easy to find, but we could start with something and adjust it as we go. If tons of new users are asking for the standard Julia package for Analog Circuit Modeling and Emulation, then we’ll have to add ACME.jl to that curated list (I literally picked the first package from pkg.julialang.org just as an example).

There is something very comforting in using Matlab’s included toolboxes – you know it’s probably the best solution for that specific task – and it’s only when you need some extra functionality that you go looking for alternative solutions in fileexchange. I’m obviously not saying that we should include that curated list in Base, but having an initial layer of officially standard packages would be useful for newcomers. Again, the solution for the second layer might be a lot more complex.

There is, of course,

  1. “industry” users looking for long-term-support packages with a stable API.

I tend to think that’s too inclusive to be useful though, with many deprecated packages in the list.

I think people are shy about submitting PRs that remove someone else’s package from the list, but it should be done at some point.

Thanks for bringing this up. I was also thinking recently about packages and how to find them and how to make sense of them (see the thread “What can we do to make Julia grow fast?”, post #12 by PetrKryslUCSD, and the recently started “Is there any interest in creating a FEM Julia organization?”, post #12 by PetrKryslUCSD).

I believe that organizations, as groupings of packages of a similar nature, are an organic way of organizing the ecosystem. When one is interested in finding out whether there is a package that does this and that, it is effective to start with a brief list of organizations, identify the organization that deals with the subject, and then drill down the hierarchy to find out which packages are available. The current list of packages on the Julia website is not very helpful, especially given the sometimes quirky and terse descriptions…

Very high-level response here, but Pkg3 is entirely federated and supports multiple registries, including private ones. Once we’ve moved to this in Julia 0.7/1.0, the current METADATA will be automatically migrated to an “uncurated” registry and we will create a new “curated” registry that will be more carefully vetted; “uncurated” will accordingly be more liberal than METADATA has been. This is somewhat similar to Haskell’s Hackage vs Stackage arrangement. I think that one of the criteria for “curated” should probably be that there be one true version of each data type and/or algorithm. If there are multiple competing things, we’ll have to get everyone in cooperation and agreement. Fortunately, multiple dispatch makes this fairly doable in a way that single-dispatch OO fails to.
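As a rough mental model of the curated/uncurated split, here is a small language-agnostic sketch in Python. It is not Pkg3's API or its actual resolution rule: the `Registry` class, the `resolve` function, the package names, and the placeholder URLs are all hypothetical, and the “first registry that lists the package wins” precedence is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Registry:
    """A named index mapping package names to where their source lives."""
    name: str
    packages: Dict[str, str] = field(default_factory=dict)


def resolve(package: str, registries: List[Registry]) -> Optional[Tuple[str, str]]:
    """Return (registry name, source URL) from the first registry listing the package."""
    for registry in registries:
        if package in registry.packages:
            return registry.name, registry.packages[package]
    return None


# Hypothetical contents; the example.invalid URLs are placeholders.
curated = Registry("curated", {"PkgA": "https://example.invalid/PkgA.jl"})
uncurated = Registry("uncurated", {
    "PkgA": "https://example.invalid/forks/PkgA.jl",
    "PkgB": "https://example.invalid/PkgB.jl",
})
registries = [curated, uncurated]  # assumed ordering: vetted entries take precedence

print(resolve("PkgA", registries))  # found in "curated"
print(resolve("PkgB", registries))  # only listed in "uncurated"
print(resolve("PkgC", registries))  # listed nowhere
```

Under this assumed ordering, a newcomer who only enables the curated registry sees one implementation per name, while the uncurated registry remains available for everything else.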
