Fixing Package Fragmentation

My response was flippant, because I consider the proposal (especially the part about forcing merges, or making it more difficult to create new packages) absurd.

But I should have explained that in detail, instead of resorting to sarcasm. I will do that now.

Several packages covering the same functionality usually exist because they resolve the trade-offs between code complexity, speed, and generality differently. For example, to read tabular data you have DelimitedFiles.jl (recently uncoupled from Julia), CSV.jl, and a couple of other packages. The first is simple and not optimized for large data; the second is quite complex and features lazy reading and a very fast parser. They do more or less the same thing, but they do it very differently. The first should be very easy to contribute to, as the code is quite simple, while the second requires a much larger initial investment in understanding the code.
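To make the "simple" end of that trade-off concrete, here is a minimal sketch using only DelimitedFiles.jl. The in-memory data and the variable names are made up for illustration; with CSV.jl installed, the rough counterpart would be `CSV.File` on the same input, which does much more work behind the scenes.

```julia
using DelimitedFiles  # ships with Julia as a standard library

# Made-up sample data; in practice you would pass a file path instead.
raw = "x,y\n1,2.5\n3,4.5\n"

# readdlm reads everything eagerly into a dense matrix. With header = true
# it returns a (data, header) tuple.
data, header = readdlm(IOBuffer(raw), ',', Float64; header = true)

println(vec(header))  # column names taken from the first line
println(size(data))   # dimensions of the numeric matrix
```

The whole interface is essentially this one function returning a plain matrix, which is a large part of why the code base is easy to read.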

The same applies to automatic differentiation (AD) libraries. There have been more than 10 experiments so far with reverse-mode AD. Each led to knowledge that was used by the authors of the next (frequently the same people), yet not all of the earlier ones were abandoned, since we still do not have a robust fire-and-forget reverse-mode AD solution. There is no meaningful way these packages can be “merged”. Perhaps at some point one solution will emerge as dominant and the rest will fade a bit, but since existing code uses them it still makes sense to fix some issues, so they will be around for a while.

Consolidation does happen in the Julia ecosystem, but it is a slow process that usually involves a lot of work. Simply appointing a “czar” is not going to make it magically happen any faster. When you have packages that are functioning OK and each has a set of users, consolidation requires those users to refactor their code, which imposes a cost on them (unless, of course, they pin versions, but then they get no updates). Alternatively, someone has to write a unified API layer package or some similar solution. There is no free lunch here.

At the same time, code that is independently useful is frequently factored out into smaller packages, which is a good thing. This may look like “fragmentation”, but it usually makes code easier to maintain and improve. An example is LogExpFunctions.jl, which was factored out of StatsFuns.jl about two years ago and has received many high-quality PRs since.

People who think that the community should restrict the General registry in any manner (beyond the existing minimal requirements) may be missing the fact that it is very easy to start your own registry. If some users are willing to maintain a registry of curated packages, they can do so today, without any hassle.
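For the consuming side, a minimal sketch of how a user would add such a curated registry next to General, using `Pkg.Registry.add` from the standard library. The helper name and the URL in the comment are hypothetical placeholders, not an existing registry.

```julia
using Pkg

# Hypothetical helper: add an extra registry alongside General.
# `url` must point to a git repository that holds a registry.
function add_curated_registry(url::AbstractString)
    Pkg.Registry.add(Pkg.RegistrySpec(url = url))
end

# Example call (placeholder URL, left commented out on purpose):
# add_curated_registry("https://example.com/CuratedRegistry.git")
```

Once a registry is added this way, packages registered in it install with the usual `Pkg.add`, so a curated set can coexist with General without restricting it.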
