Fixing Package Fragmentation

I’d bet there are a lot of “import numpy, scipy, matplotlib” types of people out there - I’m one of them. From what I’m seeing, SciML is the Julia equivalent of SciPy, at least from a documentation standpoint, and it does a pretty good job of consolidating a bunch of packages into one place where they can be easily discovered.

I’m not aware of an equivalent document for the plotting ecosystem. Matplotlib is the standard in Python, so it doesn’t have this issue. Plots seems to be the de facto default plotting package, and Makie seems to be up-and-coming for that position, at least from what can be gauged from the SciML docs; it would be nice to have all the other plotting packages in one document to aid discoverability. I get why Julia Base can’t endorse Plots in its manual, but it should still be clear to a beginner that Plots (or maybe Makie) is the go-to package. Maybe the plotting section in the SciML docs could be expanded for this purpose, but I don’t know if it’s the ideal location for it.


I think it’s part and parcel of a discussion on how to “fix” fragmentation, honestly. There are many tacks that can be taken to “fix fragmentation” here. For example:

  • The General registry should have a higher bar and be more curated.
  • There should be a Cathedral/Bazaar model where there’s a curated registry alongside the general one.
  • There need to be more documentation efforts to unify ecosystems (a la numpy/scipy/SciML) or other such curated lists (perhaps in the style of awesome-X or the like).

All of them would “fix” fragmentation, but they are all very different approaches and they would all require significant work and buy-in. For example, PyPI has an even lower bar than Julia’s General registry, but they solve this with monolithic packages. SciML doesn’t use monolithic packages, but instead has monolithic documentation.

It’s worth looking at some previous discussions: How to know which Julia package to trust? or How to know if a package is good?.


This kind of feels like two separate issues:

  1. Big packages that are broken up into several small ones, with perhaps a top-level package re-exporting those methods. Most of the time a single contributor or a small team maintains all of these packages.

  2. Many packages that are trying to achieve similar functionality (plotting, data formats, interfaces, AD) that are mainly developed by different people with only somewhat overlapping user bases.
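A minimal sketch of the first pattern, assuming hypothetical package names (Reexport.jl and its @reexport macro are real; the subpackage names below are made up):

```julia
module UmbrellaPackage  # hypothetical top-level package

using Reexport  # provides the @reexport macro

# Each subpackage is a separately registered package with its own repo.
# @reexport makes everything they export available from UmbrellaPackage,
# so users only need `using UmbrellaPackage`.
@reexport using SubPackageCore
@reexport using SubPackageStats

end # module
```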

I must admit I don’t particularly like the first approach, where I have to look through many sub-packages to see where methods are defined. In that sense I prefer the monolithic approach taken by SciPy. A larger issue I have with Julia packages is that it can be much easier for me to read libraries like BLIS and SciPy quickly and understand what is going on and where things are coming from than it is for some Julia packages. For me, this creates a lot of friction when contributing to some of these packages. This could also be part of the reason for the second issue: it can be easier and more rewarding to work on your own packages. I don’t think that is a bad thing!

Now, the ecosystem has done this mainly as a means to reduce latency (pre v1.9), which has worked very well. I have considered breaking up Bessels.jl into many small sub-packages for different functions (Airy, Bessel, Gamma), but I have been resistant, as it is personally much easier for me to develop in a big mono-repo. I also find that it is easier to maintain a single documentation site, a single place to discuss issues, and a single CI setup, and to attract users as well as pool contributors. I don’t think that would be as easy if this were scattered among many different packages. It also provides some stamp of implementation quality: if an implementation is in SciPy, I am pretty sure that it has some level of quality and testing.

I think the release of v1.9 has opened many different possibilities. I of course realize Bessels.jl is pretty much the perfect example of the advantages of the new caching code. I have shown this before but I’ll give a new example with the recent stable release.

# Version 1.8.2 (2022-09-29)
julia> @time @eval using Bessels
  0.030072 seconds (45.18 k allocations: 4.904 MiB)

julia> @time @eval airyai(1.2 + 1.1im)
  0.528501 seconds (962.46 k allocations: 48.968 MiB, 2.77% gc time, 99.91% compilation time)

# Version 1.9.0 (2023-05-07)
julia> @time @eval using Bessels
  0.030401 seconds (34.75 k allocations: 3.267 MiB)

julia> @time @eval airyai(1.2 + 1.1im)
  0.000290 seconds (62 allocations: 3.125 KiB)

Pre v1.9 I was seriously considering the cost to the user of adding new functions in each version, and how it affected package load times and time to first function evaluation. Now I am really just considering how many functions I should explicitly precompile and their effect on cache file size. I believe a lot of effort on v1.10 and beyond might be on reducing these file sizes.
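As a sketch of what “explicitly precompiling” a function can look like on v1.9+, here is the PrecompileTools.jl workload pattern (the module and the stand-in airyai below are hypothetical, not the actual Bessels.jl source):

```julia
module TinyBessels  # hypothetical package

using PrecompileTools  # standard package for precompile workloads

airyai(z) = z  # stand-in for the real Airy implementation

# Code inside @compile_workload runs during precompilation, and the
# resulting native code is stored in the package cache file (v1.9+).
@setup_workload begin
    z = 1.2 + 1.1im
    @compile_workload begin
        airyai(z)
    end
end

end # module
```

Every extra call added to the workload shaves time off first use but grows the cache file, which is exactly the trade-off mentioned above.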

Now, I am also aware that users might only use one function out of hundreds, and so they are also paying for all of the other functions and features we are slowly adding. We have moved to a module-based approach (which I guess someone could split into subpackages in the future) that isolates dependencies and functions, more similar to SciPy. I guess we kind of ended up at the first approach, with many submodules that are re-exported in the end :man_shrugging:
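That module-based layout might look roughly like this (a hypothetical sketch with made-up names, not the actual Bessels.jl source):

```julia
module BesselsLike  # hypothetical package

# Each function family lives in its own submodule, which isolates
# its helper functions and any family-specific dependencies.
module AiryFunctions
export airyai
airyai(z) = z  # stand-in implementation
end

module GammaFunctions
export gammax
gammax(x) = x  # stand-in implementation
end

# Re-export the submodules' public names at the top level.
using .AiryFunctions, .GammaFunctions
export airyai, gammax

end # module
```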


I think this is not fully a symptom of monolithicness/fragmentation; the fact that many common include and using patterns make source code discoverability challenging (no matter the package size) also plays a role.


I agree with the sentiment that there are multiple solutions and problems we really want to tackle here, but the hardest one is probably the one that truly does necessitate consolidation.

Personally, the term “consolidation czar” sounds a bit silly, but perhaps its underlying meaning is more pragmatic than how it sounds. I don’t think a single person should be responsible for this across Julia, because I don’t believe a single person has the expertise across that many areas to do so. On the other hand, discussions on combining efforts come to a standstill, and having one person make a definitive statement can be useful. I’m thinking of a sort of BDFL for each area of expertise who is willing to moderate and make firm decisions at a certain point.


55 posts were split to a new topic: Fixing labeled array package fragmentation

Indeed, I actually often find this much easier to dissect and diagnose when it’s multiple separate packages strung together rather than one big package with nested includes.

This is a really great idea. I am somewhat doing that with @dilumaluthge and JuliaHealth this summer at the JuliaHealth BoF we are co-running (shameless plug, come visit!) and one of the major things will be thinking around interfaces to be commonly used within JuliaHealth while also supporting the development of other packages that could fit in the JuliaHealth umbrella.


Are these going to be broadcast or have some sort of Zoom link?


People do discover them… They just discover either all of them or a random draw from them, and most are incomplete and poorly maintained.

Consolidation isn’t required, just a signal that something is good enough to rely on - and with a commitment to testing. That is what SciML is doing; among the many talents of that organization, hardcore testing and documentation are central.


I wish these two forks could be merged back into one, but I don’t have any ideas for how to make it actually happen.


These are the sorts of decisions I absolutely hate working on. It’s mostly bike-shedding until you reach a reasonable conclusion, and then you have to spend the next months to a year explaining over and over why users are seeing deprecation warnings and why you didn’t use the special syntax they would have preferred.


I have experience both in choosing to work together and in preferring to not work together. Perhaps sharing my experience can help shed light on some ways forward.

For the former, let’s take ImplicitDifferentiation.jl (gdalle) and ForwardDiffChainRules.jl (ThummeTo) as examples. The core of these 2 packages started as features in my NonconvexUtils.jl (JuliaNonconvex), and they were not very well tested or documented yet. The current owners of the 2 packages above reached out and expressed interest in starting a package for each of those features. I supported that and started contributing to their packages, because why not? The 2 projects turned out great and ended up with more users, stars, and PRs. Everyone is happy. In this case, the key was the natural coalescing of interests and the willingness to give up a little bit of control over a package in exchange for more dev time being put into the package. Today I don’t control any of these packages and I am fine with being just a contributor.

The second example is Nonconvex.jl (JuliaNonconvex) vs Optimization.jl (SciML). Optimization.jl has more pull because it’s a SciML project and it follows the standard SciML API. Many people like that; I don’t, and that’s ok. Both Nonconvex.jl and Optimization.jl had similar development timelines, and I could have chosen to contribute to Optimization.jl and abandon Nonconvex.jl at any point, but I chose not to. My reason is that I wanted the ability to experiment with new APIs, new optimisation algorithms, and AD hacks, and I wanted full control over the package. In this case, I wasn’t willing to relinquish control, and my interest in trying new things in Nonconvex.jl did not naturally align with the Optimization.jl devs’ interest in following the SciML API. There were more differences between the 2 packages as well, so I am over-simplifying things a little.

I think if we can learn anything from these 2 experiences, it’s that when we have limited resources, getting together may end up being good for everyone. But no one should feel pressured to contribute to another package instead of starting their own. Having more channels where people of similar interests can reach out and collaborate would be a great thing. Perhaps JuliaCon should not be just a yearly thing. If there is a JuliaCon every 2 months online with different themes, I will be happy to listen in. This may be a way to break silos and get more people to reach out and collaborate.


And that’s totally fine! I think the two are moving in different directions and will complement each other.

The reason for not including Plots.jl is actually technical. There’s a redirect on its main page that sends you to the wrong site. It’s from a very early version of Documenter, and I just haven’t patched that in the gh-pages branch of Plots.jl yet.

I think in general that’s fine, but I also think that in practice people don’t really understand the effort and patience required to maintain a package over years. I do see some packages that are nice small personal projects where I’m like “please put this in an org so it’s still up to date 4 years from now”. Simple things like bumping dependencies, reporting upstream regressions, continuing benchmarks, etc. It’s not hard, but we do need to ensure that the satellite projects keep that up. I think some people don’t move something into an org because they think it will look better on a CV or in a publication to be a personal project, but… you can publish and we can still maintain it :sweat_smile:. It’s this unsexy work, continuing to maintain and improve the documentation, answering on Discourse, and keeping an FAQ that grows over time, that makes a package eventually mature, and we should be a bit more diligent about helping especially graduate students understand this longer-term process.

This is why I’m so adamant that all GSoC projects are in orgs with other maintainers. Some day, maybe 10 years from now, someone may get bored or just busy, and when that day comes, having a structure to train the next maintainer is essential.

But then again, not every great project needs a large org and a lab. A lot of projects are perfectly fine as personal projects with one professor maintaining it for years.


I understand that but an author can also just add more contributors to the project with merge rights. Worst case scenario, an org can fork a project (or copy-paste it and give credit) if a project is abandoned but is still important to an org. So there is more than one solution to this problem. Educating people about maintenance burden is definitely beneficial though. Maybe everyone who wants to register a package can be asked to watch a video or read an article explaining what it takes to maintain a package long-term.

If there are too many abandoned packages, we may consider having a formal process for unregistering packages from the General registry. Another suggestion to “solve” the fragmentation issue for users is for some people to maintain a new registry. If I maintain a “SuperHighQualityRegistry” which has only packages that I deem to be of high quality, and if people trust my opinion enough, this registry may become popular. Then I can have monolithic documentation for all of the packages in my registry. Creating a new registry in Julia today is almost as easy as creating a new crypto; more people should do it!
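For anyone curious, the registry workflow can be sketched with LocalRegistry.jl (the registry name, URL, and package name below are hypothetical; check the LocalRegistry.jl docs for the exact keyword arguments):

```julia
using LocalRegistry  # third-party package for managing custom registries

# Create the registry once, backed by a git repo you control (URL is made up).
create_registry("SuperHighQualityRegistry",
                "https://github.com/someone/SuperHighQualityRegistry";
                description = "Only packages I deem high quality")

# Register a package you maintain into that registry.
register("MyPackage"; registry = "SuperHighQualityRegistry")
```

Users then opt in with `] registry add <url>` in the Pkg REPL, after which those packages install like any other.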


My response was flippant, because I consider the proposal (especially the part about forcing merges, or making it more difficult to create new packages) absurd.

But I should have explained that in detail, instead of resorting to sarcasm. I will do that now.

Similar packages addressing the same functionality usually exist because, even if they provide similar functionality, the trade-offs between code complexity, speed, and generality are addressed differently. E.g., to read tabular data, you have DelimitedFiles.jl, recently decoupled from Julia, and CSV.jl, and a couple of other packages. The first is simple and not optimized for large data; the second is quite complex and features lazy reading and a very fast parser. They do the same thing, more or less, but they do it very differently. The first should be very easy to contribute to, as the code is quite simple, but the second would require a much larger initial investment in understanding the code.

The same applies to AD libraries. There have been more than 10 experiments so far with reverse-mode AD. Each led to knowledge that was used by the authors of the next (frequently the same people), yet not all of them were abandoned, since we still do not have a robust fire-and-forget reverse-mode AD solution. There is no meaningful way these packages can be “merged”; perhaps at some point one solution will emerge as dominant and the rest will fade a bit, but since existing code uses them, it still makes sense to fix some issues, so they will be around for a while.

Consolidation does happen in the Julia ecosystem, but it is a slow process, and usually involves a lot of work. Simply appointing a “czar” is not going to make this magically happen any faster. When you have packages that are functioning OK, each with its own set of users, consolidation requires that their code be refactored, imposing a cost on them (unless, of course, they pin versions, but then no updates). Or a unified API layer package, or some other similar solution, is needed. There is no free lunch here.

At the same time, code that is independently useful is frequently factored out to smaller packages, which is a good thing. This may look like “fragmentation”, but it usually makes code easier to maintain and improve. An example is LogExpFunctions.jl, which was factored out from StatsFuns.jl about two years ago, and received many high-quality PRs since.

People who think that the community should restrict the General registry in any manner (beyond the existing minimal requirements) may be missing the fact that it is very easy to start your own registry. If some users are willing to maintain a registry of curated packages, they can do so today, without any hassle.


That’s really a GREAT idea!

But that relies on luck, which I’m saying a (well run) org can help limit. Organizations like SciML can take stock of what’s going on and try to be proactive rather than reactive. For example, one of the big issues in the last year was that we had lost contact with Kirill, the maintainer of NeuralPDE, due to world political events. Because of this and the interest in the library, I gave the library a bit of a refresh, moved a bit of CZI funds to get a new maintainer on there, and focused 4 of the GSoC projects toward this library in hopes of training the next batch of maintainers. This kind of targeted action doesn’t tend to happen without some kind of structure behind it. I guess a professor can specifically look for a new PhD student whose interests align with a library that has been left behind, but it’s much easier to have a larger pool of resources.


There is 0% luck involved in forking a project; it always works :sweat_smile: Of course, I see the value in being affiliated with a well-funded org, but I am just saying that no one should feel pressured to join an org. A GitHub org is just a group of 1 or more contributors. So it’s not the org that matters; it’s the people and money behind the org that make the difference.


And it takes a lot of knowledge. I don’t think it’s even possible in general. This is why I chose to do SciML, and I’m pretty adamant about not extending that to “normal ML” or data science. Those are outside my field of expertise; I won’t do as well if I try to do that. I’m not saying Julia shouldn’t have someone do something similar for deep learning or data science tooling, I’m saying that there’s more than enough on my plate already (there are more than a few libraries in SciML I am not happy with) and someone else should take those domains.

R has CRAN Task Views of this form, with domain-specific maintainers for each topic.

I think there’s a lot to learn from that.