But that relies on luck, which I’m saying a (well run) org can help limit. Organizations like SciML can take stock of what’s going on and try to be proactive rather than reactive. For example, one of the big issues in the last year was that we had lost contact with Kirill, maintainer of NeuralPDE, due to world political events. Because of this and the interest in the library, I gave the library a bit of a refresh, moved a bit of CZI funds to get a new maintainer on there, and focused 4 of the GSoC projects towards this library in hopes of training the next batch of maintainers. This kind of targeted action doesn’t tend to happen without some kind of structure behind it. I guess a professor can specifically look for a new PhD student whose interests align with a library that has been left behind, but it’s much easier to have a larger pool of resources.
There is 0% luck involved in forking a project; it always works. Of course, I see the value in being affiliated with a well-funded org, but I am just saying that no one should feel pressured to join an org. A GitHub org is just a group of one or more contributors. So it’s not the org that matters; it’s the people and money behind the org that make the difference.
And it takes a lot of knowledge. I don’t think it’s even possible in general. This is why I chose to do SciML, and I’m pretty adamant about not extending that to “normal ML” or data science. Those are outside my field of expertise. I won’t do as well if I try to do that. I’m not saying Julia shouldn’t have someone do something similar for deep learning or data science tooling; I’m saying that there’s more than enough on my plate already (there are more than a few libraries in SciML I am not happy with) and someone else should take those domains.
R has CRAN Task Views of this form with domain-specific authors:
I think there’s a lot to learn from that.
We’ve had this discussion (more focused on statistics) already:
Most of this has been touched on before, but that doesn’t mean it’s not worth revisiting with renewed energy and direction. I’m just not sure how helpful it is to try to address all of these problems, which require different solutions, in a single thread.
If it makes anyone feel any better, this did kick me into spending all night digging through my old code so I could fix some of my newbie mistakes made in ArrayInterface.jl and support very simple dimnames and index_labels methods, like what you see in DataAPI.jl, in the future.
Unfortunately, it’s a bit backward. Most of the packages that we’d like in our curated registry are already in the general registry. What would be more useful would be a copy of General with all the packages removed that have no README and are only there to share a repo among members of a lab, none of whom know how to maintain a local registry… OTOH, who knows, maybe maintaining a registry of what one considers relevant packages is worth trying despite the duplication.
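Just to make the “filtered copy of General” idea concrete, here’s a rough sketch that prunes the Registry.toml of a local clone given a blocklist of UUIDs. The path and the UUID are hypothetical, and a real version would also need to delete the per-package directories:

```julia
using TOML

# Registry.toml of a local clone of General -- hypothetical path.
reg_path = "General/Registry.toml"
reg = TOML.parsefile(reg_path)

# UUIDs of packages to drop (no README, lab-internal, etc.) -- made-up placeholder.
blocklist = Set(["00000000-1111-2222-3333-444444444444"])

# The [packages] table maps UUID strings to name/path entries.
filter!(((uuid, _),) -> uuid ∉ blocklist, reg["packages"])

open(reg_path, "w") do io
    TOML.print(io, reg)
end
```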
Here are a couple of local, low-energy things you can do.
To avoid putting things in the general registry prematurely:
I maintain a public registry (this is super easy with LocalRegistry.jl) for packages that don’t belong in, or are not ready for, General. I sometimes have packages that depend on these, so the README says you have to install this registry. I haven’t seen anyone else do this (I’m sure someone has). I don’t know how it would work out if it were more common to see smaller registries in use publicly.
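For concreteness, the basic LocalRegistry.jl workflow looks roughly like this (the registry name, package name, and URL are made up):

```julia
using LocalRegistry

# One-time setup: create the registry, backed by a Git repo you host -- made-up URL.
create_registry("MyRegistry", "https://github.com/myuser/MyRegistry.git";
                description = "Packages not ready for General")

# Register a package available in your current environment.
register("MyPackage"; registry = "MyRegistry")
```

Users then install the registry once, after which Pkg resolves those packages normally:

```julia
using Pkg
Pkg.Registry.add(RegistrySpec(url = "https://github.com/myuser/MyRegistry.git"))
```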
This is obvious and well known, but I’ll repeat it for this thread. To aid discoverability and help in evaluating overlap and consolidation possibilities:
Like in a science paper, you can put a few sentences in the README placing the package in context within the larger ecosystem. If you don’t want to spend the time, you can at least add a list of related packages. Or if you don’t want to spend even that time, you can do what I did with an enums package that I put in the general registry: add a link to one package and say something like “See EnumsX.jl and packages referenced in its README”. The last one is still kind of negligent, especially for the fourth or fifth enums package, but much better than saying nothing.
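Even the minimal version of this costs only a couple of lines of README markdown (reusing the enums example above):

```
## Related packages

See EnumsX.jl and the packages referenced in its README.
```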
What’s the benefit of maintaining a personal registry for packages that are already in the general registry?
What are the externalities of registering packages in the general registry? Should I feel bad about the dozens of packages I already registered?
In general, I think it’s nice to avoid polluting the registry with half-baked projects. I’ve definitely done this before when getting overexcited about something (add it to the list of things I lie awake at night thinking about). I think there’s a stronger case to be made for avoiding registering packages that are clearly just a minimal implementation of some concept to get an idea out there. Those sorts of projects tend to change within a week once people start giving input.
I’m not trying to make a push for or against anything here. Just sharing my personal experience with this.
My feeling is only against name clashes. Ideally I would like it if we could prefix package names with the organization, for example:
using SciML/DifferentialEquations
Something like that would allow organizations to build reputation and lend it to their packages, without introducing barriers for new contributors (they can already do that, but less explicitly).
I am imagining that name clashes could even be allowed when the package is registered bound to an organization. For instance, someone could register something that would be referred to as, for example,
using MyResearchGroup/DifferentialEquations
In the long run it is probably unavoidable that a lot, if not most, of the packages in the registry will be deprecated. It is sad that the prettier names will be taken.
Because of UUIDs it might be possible to recycle names eventually.
I think organization prefixes might more readily belong in the environment space. I’m not sure how Pkg would differentiate two packages with the same name but different UUIDs.
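For what it’s worth, within an environment you can already disambiguate same-named packages from different registries by giving the UUID explicitly (the UUID below is a made-up placeholder, not a real package’s):

```julia
using Pkg

# When two registered packages share a name, the UUID selects one explicitly.
Pkg.add(PackageSpec(name = "DifferentialEquations",
                    uuid = "00000000-1111-2222-3333-444444444444"))
```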
If a package is officially abandoned, one can use the same UUID and release a breaking version. That has happened already; I remember seeing one case. But it doesn’t feel like that can happen safely except in rare cases.
I meant another case. Imagine a long-deprecated package that no one uses anymore, called VeryCoolPackage.jl, so we deregister it ten years later, after recording the last dependent.
Later, someone new creates a new package with the same name but a new UUID. This should not be a problem, since older projects could still reference the old package by the original UUID if needed.
As merely an (irregular) user, package fragmentation can make it confusing to search for packages. My suspicion is that the ecosystem could be healthier overall (= larger bus factor) if there were stronger tendencies to merge efforts instead of creating separate packages.
However, given that Julia and its ecosystem are quite “academic-affiliated” in nature, I suspect the incentives are a little aligned against that: I think it’s easier to get published by creating your own separate package than by saying “look, I made a series of big PRs against OtherPackage”.
One example that I recently encountered: I stumbled upon GitHub - SciML/GlobalSensitivity.jl: Robust, Fast, and Parallel Global Sensitivity Analysis (GSA) in Julia. It uses the abbreviation “GSA” in its README, which I assumed to stand for “Global Sensitivity Analysis”.
Getting curious about the package name and searching, I found GitHub - lrennels/GlobalSensitivityAnalysis.jl: Julia implementations of global sensitivity analysis methods. That one seems to have occupied the clearer package name already, so from here it looks like the GlobalSensitivity.jl authors were probably aware of the other package. GlobalSensitivityAnalysis.jl has existed for one year longer and has a similar amount of commits and activity. I wondered why a separate package was created instead of pooling efforts.
From skimming the docs, it seems the SciML one has more methods implemented. There is no indication why those could not have been added as PRs to the already existing package. I could not find any indication of an effort to join forces. The SciML one has a published paper, but its “Statement of Need” does not mention the existence of the other package.
At this point I deferred a deeper trade-off analysis until I really need such a package. I walked away wondering if the maintenance situation (in the always maintainer-strapped OSS world) would not be better if people joined efforts more (so yeah, this topic). If there were gentle incentives from the ecosystem/culture toward that, I guess this would not be a bad thing.
But then, the real world is complicated and full of humans. From another ecosystem and software area that is littered with small and subsequently abandoned projects, I already know that achieving de-fragmentation is very hard. Maybe orgs reaching critical mass (=> SciML?) and ending up dominating is the most promising avenue?
The SciML one has a much longer history than the other one. It started in SciMLSensitivity (DiffEqSensitivity) years before the other (2017). A few years later, when this one popped up, Vaibhav asked why (Existing implementations of GSA in DiffEqSensitivity.jl · Issue #37 · lrennels/GlobalSensitivityAnalysis.jl · GitHub) and there wasn’t really much of an answer. I don’t see any major harm in such a thing existing.
I don’t think anyone cares. The vast majority of open source packages exist without a publication and with no intention to be published. You see the same thing even in other OSS communities where there is much less of an academic focus. Sometimes people just do it as a hobby, and sometimes that hobby, I guess, is just to make one basic algorithm.
Thanks for the context! For a passer-by it’s hard to make an accurate judgement.
You may have a point, but it’s probably not as bad as it sounds. First, as ChrisRackauckas said above, the same phenomenon occurs in other OSS communities. Second, it’s not that common (… but it happens). But also, I see this as an opportunity to experiment with new solutions before maybe merging them into a more popular package. The same is done for Julia itself, where some experiments are first “externalized” before being brought into Base. So this may be more beneficial in the end.
But for the user who “just wants a package”, this is understandably quite confusing (I had a bad experience the first time I tried to plot something, a few years ago). And there I think it is the maintainer’s responsibility to mention similar packages, in the same vein as jlapeyre described above.
Nice idea. I wonder if a simple solution would be the possibility to define “aliases” for package names (as the UUID is what matters)?
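At the code level, Julia 1.6+ already allows a local alias at import time. This doesn’t touch the registry level, where the UUID lives, but it covers the in-code naming side (using one of the packages mentioned above purely as an example):

```julia
# Bind a shorter local name to a package at import time (Julia 1.6+).
import GlobalSensitivityAnalysis as GSA

# GSA now refers to GlobalSensitivityAnalysis within this module.
```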
More generally, I have the same sort of issue with function names in Julia, where I wish there were more use of namespaces to facilitate discoverability. This may be a psychological thing (it’s easier to remember things when they’re organized in a tree-like structure instead of a flat namespace), or maybe it’s just me…
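A small sketch of the tree-like organization I have in mind, using nothing but plain nested modules (all names here are invented):

```julia
module Stats

module Robust

export trimmed_mean

# Trimmed mean: drop a fraction p of the smallest and largest values, then average.
function trimmed_mean(x::AbstractVector, p::Real = 0.1)
    n = length(x)
    k = floor(Int, p * n)
    s = sort(x)
    return sum(@view s[k+1:n-k]) / (n - 2k)
end

end # module Robust

end # module Stats

# Tree-like access instead of one flat namespace:
Stats.Robust.trimmed_mean(randn(100))
```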
I don’t see why that is a problem. A package can be in multiple registries.
More generally, as others have remarked: this is not the first topic about the issue (which can be phrased as package fragmentation, quality control, deprecation of abandoned packages — these are all related). The issue is recognized, but it is also understood that any solution requires a lot of work, mostly from people who are not directly incentivized to do it.
Viable proposals need to address who would do this work and why, or better yet, actually start doing it on a small scale and demonstrate feasibility and benefits. A curated registry is probably the least-effort solution, but I am not aware of any currently maintained efforts (JuliaPro had a registry, but I cannot find it at the moment; was it abandoned?).
I’ve not seen an official announcement but was told this on Slack, and Sebastian confirmed as much some time ago: