Fixing Package Fragmentation

In general, I think it’s nice to avoid polluting the registry with half-baked projects. I’ve definitely done this before when getting overexcited about something (add it to the list of things I lie awake at night thinking about). I think there’s a stronger case to be made for not registering packages that are clearly just a minimal implementation of some concept, meant to get an idea out there. Those sorts of projects tend to change within a week once people start giving input.

I’m not trying to make a push for or against anything here. Just sharing my personal experience with this.


My feeling is only against name clashes. Ideally, I would like us to be able to prefix package names by organization, for example

using SciML/DifferentialEquations

Something like that would allow organizations to gain reputation and give reputation to packages, without introducing barriers for new contributors (they can already do that, but less explicitly).

I imagine that name clashes could even be allowed when a package is registered under an organization. For instance, someone could register something that would be referred to as, for example,

using MyResearchGroup/DifferentialEquations

In the long run it is probably unavoidable that a lot, if not most, of the packages in the registry will be deprecated. It is sad that the prettier names will already be taken.


Because of UUIDs it might be possible to recycle names eventually.

I think organization prefixes might more readily belong in the environment space. I’m not sure how one would differentiate two packages with the same name but different UUIDs.

If a package is officially abandoned, one can use the same UUID and release a breaking version. That has happened already; I remember seeing one case. But it doesn’t feel like that can happen safely except in rare cases.

I meant another case. Imagine a long-deprecated package called VeryCoolPackage.jl that no one uses anymore, so we deregister it ten years after recording the last dependent.

Later someone new creates a new package, also called VeryCoolPackage.jl, with a new UUID. This should not be a problem, since older projects could still reference the old package by its original UUID if needed.
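
To make the name-recycling scenario concrete, here is a minimal sketch in Julia. It is a toy model, not how Pkg or the General registry is actually implemented: `RegistryEntry`, `lookup_by_uuid`, `lookup_by_name`, and the example.org URLs are all made up for illustration, and the UUIDs are random placeholders. The only thing taken from the discussion is that a project can reference a package by UUID while the name is just a label.

```julia
using UUIDs

# Hypothetical, simplified registry entry: the name is only a label.
struct RegistryEntry
    name::String
    repo::String
end

# Toy "registry" keyed by UUID rather than by name.
registry = Dict{UUID,RegistryEntry}()

old_uuid = uuid4()  # stands in for the long-deprecated package
new_uuid = uuid4()  # stands in for the new package reusing the name

registry[old_uuid] = RegistryEntry("VeryCoolPackage", "https://example.org/old/VeryCoolPackage.jl")
registry[new_uuid] = RegistryEntry("VeryCoolPackage", "https://example.org/new/VeryCoolPackage.jl")

# An old project that recorded the UUID still resolves unambiguously.
lookup_by_uuid(reg, id::UUID) = reg[id]

# A lookup by name alone may now return more than one candidate.
lookup_by_name(reg, name) = [id for (id, entry) in reg if entry.name == name]
```

In this sketch, `lookup_by_uuid(registry, old_uuid)` always returns the old entry, while `lookup_by_name(registry, "VeryCoolPackage")` returns both UUIDs, which is exactly the ambiguity a user typing only the name would face.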

As merely an (irregular) user, I find that package fragmentation can make it confusing to search for packages. My suspicion is that the ecosystem could be healthier overall (=larger bus factor) if there were stronger tendencies to merge efforts instead of creating separate packages.
However, given that Julia and its ecosystem are quite “academic-affiliated” in nature, I suspect the incentives are a little aligned against that – I think it’s easier to get published when creating your own separate package than for “look, I made a series of big PRs against otherpackage”. :person_shrugging:

One example that I recently encountered: I stumbled upon GitHub - SciML/GlobalSensitivity.jl: Robust, Fast, and Parallel Global Sensitivity Analysis (GSA) in Julia. It uses “GSA” in its README, which I assumed to stand for “Global Sensitivity Analysis”.
Getting curious about the package name, and searching, I found GitHub - lrennels/GlobalSensitivityAnalysis.jl: Julia implementations of global sensitivity analysis methods. That one seems to have occupied the clearer package name already, so from here it looks like the GlobalSensitivity.jl authors were probably aware of the other package. GlobalSensitivityAnalysis.jl has existed for one year longer, and has a similar number of commits and similar activity. I wondered why a separate package was created, instead of pooling efforts.

From skimming the docs, it seems the SciML one has more methods implemented. There is no indication why those could not have been added as PRs to the already existing package. I could not find any indication of an effort to join forces. The SciML one has a published paper, but its “Statement of Need” does not mention the existence of the other package.

At this point I deferred a deeper trade-off analysis until I really needed such a package. I walked away wondering if the maintenance situation (in the always maintainer-strapped OSS world) would not be better if people were joining efforts more (so yeah, this topic). If there were gentle incentives from the ecosystem/culture towards that, I guess this would not be a bad thing.
But then, the real world is complicated and full of humans. From another ecosystem and software area that is littered with small and subsequently abandoned projects, I already know that achieving de-fragmentation is very hard. Maybe orgs reaching critical mass (=>SciML?) and ending up dominating is the most promising avenue?


The SciML one has a much longer history than the other one. It started in SciMLSensitivity (DiffEqSensitivity) years before the other (2017). A few years later, when this one popped up, Vaibhav asked why (Existing implementations of GSA in DiffEqSensitivity.jl · Issue #37 · lrennels/GlobalSensitivityAnalysis.jl · GitHub) and there wasn’t really much of an answer :person_shrugging: . I don’t see any major harm in such a thing existing.

I don’t think anyone cares. The vast majority of open source packages exist without a publication and no intention to be published. You see this same thing even in other OSS communities where there is much less of an academic focus. Sometimes people just do it as a hobby, and sometimes that hobby I guess is to just make one basic algorithm.


Thanks for the context! For a passer-by it’s hard to make an accurate judgement.

You may have a point, but it’s probably not as bad as it sounds. First, as already mentioned, the same phenomenon occurs elsewhere in OSS (as said by ChrisRackauckas above). Second, it’s not so common (… but it happens). Third, I see this as an opportunity to experiment with new solutions, before maybe merging them into a more popular package. The same is done for Julia itself, where some experiments are first “externalized” before being brought into Base. So this may be more beneficial in the end.

But for the user who “just wants a package”, this is understandably quite confusing (I had a bad experience the first time I tried to plot something, a few years ago). And there I think it is the maintainer’s responsibility to mention similar packages, in the same vein as described by jlapeyre above.

Nice idea. I wonder if a simple way to do so would be to allow defining an “alias” for package names? (since the UUID is what matters)
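
A rough sketch of what such an alias table could look like, in plain Julia. This is purely hypothetical: nothing like `register_alias!` or `resolve` exists in Pkg as far as this sketch assumes, the UUID is a random placeholder rather than the UUID of any real package, and the `SciML/DifferentialEquations` spelling is just the prefix idea from earlier in the thread.

```julia
using UUIDs

# Hypothetical alias table: several human-readable names may point at one UUID.
aliases = Dict{String,UUID}()

# Add an alias, refusing to silently repoint an existing name to another package.
function register_alias!(aliases::Dict{String,UUID}, name::String, id::UUID)
    if haskey(aliases, name) && aliases[name] != id
        error("alias \"$name\" already points to a different UUID")
    end
    aliases[name] = id
    return aliases
end

# Resolve a name to a UUID, or `nothing` if the alias is unknown.
resolve(aliases::Dict{String,UUID}, name::String) = get(aliases, name, nothing)

pkg = uuid4()  # placeholder UUID, not the UUID of any real package
register_alias!(aliases, "DifferentialEquations", pkg)
register_alias!(aliases, "SciML/DifferentialEquations", pkg)
```

Both names then resolve to the same UUID, so the prefixed form would be a convenience on top of the existing identity rather than a second identity. The open question of who is allowed to claim an alias is a social one that this sketch does not address.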

More generally, I have the same sort of issue with function names in Julia, where I wish there were more use of namespaces to facilitate discoverability. This may be a psychological thing (easier to remember things when organized in a tree-like structure instead of flat namespace), or maybe it’s just me …

I don’t see why that is a problem. A package can be in multiple registries.

More generally, as others have remarked: this is not the first topic about the issue (which can be phrased as package fragmentation, quality control, deprecation of abandoned packages — these are all related). The issue is recognized, but it is also understood that any solution requires a lot of work, mostly from people who are not directly incentivized to do it.

Viable proposals need to address who would do this work and why, or better yet, actually start doing it on a small scale and demonstrate feasibility and benefits. A curated registry is probably the least-effort solution, but I am not aware of any currently maintained efforts (JuliaPro had a registry, but I cannot find it at the moment. Was it abandoned?).

I’ve not seen an official announcement but was told this on Slack, and Sebastian confirmed as much some time ago:

These seem to be looking for a technical solution where the first-order issue is social. The most important sentence to me in the Lisp Curse essay is: “Lisp is so powerful that problems which are technical issues in other programming languages are social issues in Lisp.” We don’t need separate registries, or rules for removing old packages… we just need some mechanism to coordinate on having at least one package for key problem domains that works well and is supported. Everything else can be anarchy in a way that would make Lisp hackers proud.

Right now the issue isn’t discoverability; it is that there is no baseline package which works throughout a significant chunk of the ecosystem (outside of notable exceptions such as differential equations, DSL-oriented optimization, etc.).

This is the only way to solve the problem. People who have expertise and don’t bite off more than they can chew take ownership of an ecosystem. They can only do this by applying for grants (or corporate sponsorship), as was done in the Python world. Jupyter/SciPy/NumPy had massive funding because people decided to make managing and building software their job (and in the case of many of them, they didn’t have research requirements, to my knowledge).


What are major areas without a “baseline package”? It would be useful to gather them in one place, I think.
I’ve seen graphs and keyed arrays mentioned, anything else?

I’ve seen graphs and keyed arrays mentioned, anything else?

  • Testing frameworks
  • Tools for building / viewing / linting documentation

I suppose there are somewhat “baseline” packages, but there are competitors which are nearly as popular.

Sadly, everyone would have a different answer, which is why it is so useful to build out from conquering one domain at a time. @ChrisRackauckas has outlined a lot of them in his SciML docs consolidation - but SciML can’t personally manage writing all of them, just wrapping them and linking to their docs. Some of the packages that the SciML org is linking to and wrapping for nonlinear optimization and solves for systems of equations may well be the best option in Julia right now, but need a lot of work.

I also don’t think that has to come top-down in any way. Organizations can build that organically, by gaining momentum and reputation. Having a way to identify the package with the organization (using SciML/DifferentialEquations) would help a bit, but is not absolutely necessary. One nice thing about this approach is that it would be a middle ground between having large bloated packages (NumPy-style reference packages with everything bound to them) and having too much fragmentation. An organization’s documentation could guide users to what it provides, package-wise.

Having packages spread among multiple registries won’t help. That’s a move that Linux distributions made with PPAs and so on, which only decreased the quality of the user experience. They needed that to give users more flexibility in which packages they can get, but Julia’s package manager doesn’t have that limitation.

On the contrary, PPAs greatly enhance my Linux user experience, because they give me access to packages that are either too obscure or too modern for the main package registry, while making sure they fit into the package update/dependency flow (i.e. I get updates as normal). This means I don’t have to compile things myself and/or manually update N different tools. This is generally hands-off and immensely valuable. E.g. it means I can use a properly deb-packaged Firefox from the mozillateam PPA and avoid the snap mess that Ubuntu pushes onto users.


I also used to believe that Lisp is super-powerful when I was using it. But it isn’t, really. Not when compared to Julia, and a lot of other languages. It is practically impossible to write portable, reusable, and fast code in Common Lisp the way you can routinely do in Julia.

I don’t think that the Julia package ecosystem is “fragmented” because Julia is, in some sense, too powerful, and this does not encourage cooperation. I see cooperation happening organically across the ecosystem — it is just that the process is, naturally, slow.

Many packages that seemingly address the same thing are actually very different. Take, for example, my favorite plotting package, PGFPlotsX.jl: it works through the TeX package pgfplots, with all the advantages and disadvantages that implies. The former include very close integration with (La)TeX documents, in terms of content and style. I don’t think that this package can be meaningfully merged with any other plotting package.

I am not sure what you mean here. A lot of packages work fine with a significant part of the ecosystem, because Julia’s design allows a lot of more-or-less orthogonal composition.

Part of the problem may be user expectations: people coming from other languages may expect to see a big über-package doing everything under the sun, while in Julia they will encounter cooperating ecosystems, of which Tables.jl is an excellent example (there are many others). This may be confusing, because using Tables just gives you some minimal functionality, and conversely, for most functionality you will not be using Tables.

I can easily imagine this being very confusing to an R or Python user. But the solution is just learning the ways of Julia.


Yes, true, I didn’t phrase that correctly. What I tried to convey is that PPAs are sort of a patch for the fact that the distros cannot keep up with the pace of package development, and that having to handle multiple repositories is not nice. I wouldn’t like it if people started to create custom package registries and we as users had to start handling them manually.