Fixing Package Fragmentation

The SciML one has a much longer history than the other one. It started in SciMLSensitivity (DiffEqSensitivity) years before the other (2017). A few years later when this one popped up Vaibhav asked why (Existing implementations of GSA in DiffEqSensitivity.jl · Issue #37 · lrennels/GlobalSensitivityAnalysis.jl · GitHub) and there wasn’t really much of an answer :person_shrugging: . I don’t see any major harm in such a thing existing.

I don’t think anyone cares. The vast majority of open source packages exist without a publication and with no intention of being published. You see the same thing in other OSS communities where there is much less of an academic focus. Sometimes people just do it as a hobby, and sometimes that hobby, I guess, is just to implement one basic algorithm.

Thanks for the context! For a passer-by it’s hard to make an accurate judgement.

You may have a point, but it’s probably not as bad as it sounds. First, as ChrisRackauckas mentioned above, the same phenomenon occurs in other OSS communities. Second, it’s not that common (though it does happen). And third, I see this as an opportunity to experiment with new solutions before maybe merging them into a more popular package. The same is done for Julia itself, where some experiments are first “externalized” before being brought into Base. So this may be more beneficial in the end.

But for users who “just want a package”, this is understandably quite confusing (I had a bad experience the first time I tried to plot something, a few years ago). And there I think it is the maintainer’s responsibility to mention similar packages, in the same vein as suggested by jlapeyre above.

Nice idea. I wonder if a simple way to do so would be to allow defining “aliases” for package names? (since the UUID is what matters)

More generally, I have the same sort of issue with function names in Julia, where I wish there were more use of namespaces to facilitate discoverability. This may be a psychological thing (it is easier to remember things when they are organized in a tree-like structure instead of a flat namespace), or maybe it’s just me …

I don’t see why that is a problem. A package can be in multiple registries.

More generally, as others have remarked: this is not the first topic about the issue (which can be phrased as package fragmentation, quality control, deprecation of abandoned packages — these are all related). The issue is recognized, but it is also understood that any solution requires a lot of work, mostly from people who are not directly incentivized to do it.

Viable proposals need to address who would do this work and why, or better yet, actually start doing it on a small scale and demonstrate feasibility and benefits. A curated registry is probably the least-effort solution, but I am not aware of any currently maintained efforts (JuliaPro had a registry, but I cannot find it at the moment; was it abandoned?).

I’ve not seen an official announcement but was told this on Slack, and Sebastian confirmed as much some time ago:

These seem to be looking for a technical solution where the first-order issue is social. The most important sentence to me in the Lisp Curse essay is: “Lisp is so powerful that problems which are technical issues in other programming languages are social issues in Lisp.” We don’t need separate registries, or rules for removing old packages… we just need some mechanism to coordinate on having at least one package for each key problem domain that works well and is supported. Everything else can be anarchy in a way that would make Lisp hackers proud.

Right now the issue isn’t discoverability; it is that there is no baseline package which works throughout a significant chunk of the ecosystem (outside of notable exceptions such as differential equations, DSL-oriented optimization, etc.).

This is the only way to solve the problem. People who have expertise and don’t bite off more than they can chew take ownership of an ecosystem. They can only do this by applying for grants (or corporate sponsorship), as was done in the Python world. Jupyter/scipy/numpy had massive funding because people decided to make managing and building software their job (and many of them, to my knowledge, didn’t have research requirements).

What are major areas without a “baseline package”? It would be useful to gather them in one place, I think.
I’ve seen graphs and keyed arrays mentioned, anything else?

I’ve seen graphs and keyed arrays mentioned, anything else?

  • Testing frameworks
  • Tools for building / viewing / linting documentation

I suppose there are “baseline” packages of a sort, but they have competitors which are nearly as popular.

Sadly, everyone would have a different answer, which is why it is so useful to build out from conquering one domain at a time. @ChrisRackauckas has outlined a lot of them in his SciML docs consolidation, but SciML can’t personally manage writing all of them, just wrapping them and linking to their docs. Some of the packages that the SciML org is linking to and wrapping for nonlinear optimization and for solving systems of equations may well be the best option in Julia right now, but need a lot of work.

I also don’t think that has to come top-down in any way. Organizations can build that organically, by gaining momentum and reputation. Having a way to identify the package with the organization (using SciML/DifferentialEquations) would help a bit, but is not absolutely necessary. One nice thing about this approach is that it would sit midway between having large, bloated packages (numpy-style reference packages with everything bound to them) and having too much fragmentation. The organization’s documentation can guide users to what it provides, package-wise.

Having packages spread among multiple registries won’t help. That’s a move that Linux distributions made with PPAs and so on, which only decreased the quality of the user experience. They needed that to give users more flexible access to packages, but Julia’s package manager doesn’t have that limitation.

On the contrary, PPAs greatly enhance my Linux user experience, because they give me access to packages that are either too obscure or too modern for the main package registry, while making sure they fit into the package update/dependency flow (i.e. I get updates as normal). This means I don’t have to compile myself and/or manually update N different tools. This is generally hands-off and immensely valuable. E.g. it means I can use a properly deb-packaged Firefox from the mozillateam PPA and avoid the snap mess that Ubuntu pushes onto users.

I also used to believe that Lisp is super-powerful when I was using it. But it isn’t, really. Not when compared to Julia, and a lot of other languages. It is practically impossible to write portable, reusable, and fast code in Common Lisp the way you can routinely do in Julia.

I don’t think that the Julia package ecosystem is “fragmented” because Julia is, in some sense, too powerful, and this does not encourage cooperation. I see cooperation happening organically across the ecosystem — it is just that the process is, naturally, slow.

Many packages that seemingly address the same thing are actually very different. Take, for example, my favorite plotting package, PGFPlotsX.jl: it works through the TeX package pgfplots, with all the advantages and disadvantages that implies. The former include very close integration with (La)TeX documents, in terms of content and style. I don’t think that this package can be meaningfully merged with any other plotting package.

I am not sure what you mean here. A lot of packages work fine with a significant part of the ecosystem, because Julia’s design allows a lot of more-or-less orthogonal composition.

Part of the problem may be user expectations: people coming from other languages may expect to see a big über-package doing everything under the sun, while in Julia they will encounter cooperating ecosystems, of which Tables.jl is an excellent example (there are many others). This may be confusing, because using Tables just gives you some minimal functionality, and conversely, for most functionality you will not be using Tables.

I can easily imagine this being very confusing to an R or Python user. But the solution is just learning the ways of Julia.

Yes, true, I didn’t phrase that correctly. What I tried to convey is that PPAs are sort of a patch for the fact that the distros cannot keep up with the pace of package development, and that having to handle multiple repositories is not nice. I wouldn’t like it if people started to create custom package registries and we as users had to handle them manually.

Could the discovery problem be mitigated if the Project.toml file included a section listing functionality (based on agreed academic/industry terms) and a web page like juliapackages included a feature comparison like the one mesamatrix has?

It would help users to choose the most suitable package for their application and developers to find similar packages so as not to duplicate efforts.
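
As a rough sketch of what such a section might look like: the `[features]` table and every key in it are hypothetical, as are the package name and the placeholder UUID. Nothing here is part of the Project.toml format that Pkg defines today, and the term vocabulary would still have to be agreed on by the community.

```toml
# Hypothetical package, for illustration only.
name = "ExamplePlots"
uuid = "00000000-0000-0000-0000-000000000000"  # placeholder, not a real UUID
version = "0.1.0"

# Hypothetical section: Pkg does not define these keys.
# A site like juliapackages could read them to build a comparison table.
[features]
domain = ["plotting"]
output-formats = ["pdf", "svg"]
interactive = false
latex-integration = true
```

A comparison page would then only need to collect these tables across packages in the same domain and show them side by side.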

I’ve read many papers that said “this algorithm is available as a part of X OSS library”, in areas such as computational geometry (many papers citing CGAL come to mind) or finite elements, where massive libraries tend to build up over time with many collaborators.

I agree. Actually, in Python you need uber packages to do everything because doing it in the base language is extremely slow. In Julia, we can do a lot by just using Base – for example, we absolutely don’t need something like numpy. I also believe it would be superfluous to have something like SciPy – which unifies interpolation, curve fitting, FFTs, etc. – and that it only exists in Python for historical reasons.
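
To make the “just using Base” point concrete, here is a minimal sketch of the kind of vectorized numerics that would typically go through numpy in Python. The variable names are only for illustration; everything used is in Base, with no packages loaded.

```julia
# Plain Base Julia: no array package is needed for element-wise numerics.
x = range(0, 1; length = 5)      # lazy range of 5 evenly spaced points

# `@.` turns every call and operator into a broadcast; the whole
# expression is fused into a single loop that allocates one result array.
y = @. sin(2π * x) + x^2

# Reduction with a function argument: sums abs2(v) over y without
# building an intermediate array of squares.
total = sum(abs2, y)
```

The same pattern extends to user-defined functions and types, which is a large part of why a separate array layer is not needed.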

As for the state of plotting packages, I believe that with the improvements in TTFP in Julia 1.9 and the fact that Makie has funding (crucial for such an ambitious package), it should become the de facto standard before long. It already has way more features than Plots.jl; it only has a little more latency and is a little harder to use, but let’s just give it a year or two.

So where is the package fragmentation problem exactly? AD? It would be good to know.

I do wonder if people are mostly looking for (a) package recommendations and (b) recommended packages being highly mature. I haven’t seen as many strong attempts at (a).

Indeed. Perhaps an explanation of a concrete problem would be easier to act on than the abstract diagnosis of “package fragmentation”.

The current state of Julia AD is like having several vehicles in your backyard, including bicycles, tugboats, trucks, airplanes and experimental space rockets. Yes, each solves the problem of transportation, broadly, but they are hardly replacements for each other.

Practical, performant reverse or mixed mode AD is a difficult problem. The more powerful and complex the language, the more difficult. And Julia is not a simple language by any means.

I am an old-school Ada programmer and decided to test out the Julia waters when I saw what ChatGPT could do in jump-starting my evaluation. Reading this thread, I appreciate the fragmentation issue, as ChatGPT would generate code for different Julia libraries given slightly different context in the prompt. Still, I was amazed at how well it worked in doing neural nets (with Flux, and DataFrame for data) and the plotting library, and I was eventually only stopped when I tried to generate code for 3D articulation animation. I have to get back to it but I think it finally hit some deeper package conflict.

I have to teach in Python, and using numpy/scipy as an example of lack of fragmentation is really funny to me. You mean the package that splits the entire language into two halves with different idioms and syntax?
