Fixing Package Fragmentation

I also used to believe that Lisp is super-powerful when I was using it. But it isn’t, really. Not when compared to Julia, and a lot of other languages. It is practically impossible to write portable, reusable, and fast code in Common Lisp the way you can routinely do in Julia.

I don’t think that the Julia package ecosystem is “fragmented” because Julia is, in some sense, too powerful, and this does not encourage cooperation. I see cooperation happening organically across the ecosystem — it is just that the process is, naturally, slow.

Many packages that seemingly address the same thing are actually very different. Take, for example, my favorite plotting package, PGFPlotsX.jl: it works through the TeX package pgfplots, with all the advantages and disadvantages that implies. The former include very close integration with (La)TeX documents, in terms of content and style. I don’t think that this package can be meaningfully merged with any other plotting package.

I am not sure what you mean here. A lot of packages work fine with a significant part of the ecosystem, because Julia’s design allows a lot of more-or-less orthogonal composition.

Part of the problem may be user expectations: people coming from other languages may expect to see a big über-package doing everything under the sun, while in Julia they will encounter cooperating ecosystems, of which Tables.jl is an excellent example (there are many others). This may be confusing, because using Tables just gives you some minimal functionality, and conversely, for most functionality you will not be using Tables.

I can easily imagine this being very confusing to an R or Python user. But the solution is just learning the ways of Julia.
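
To make the Tables.jl point concrete, here is a minimal sketch, assuming Tables.jl is installed. The helper `column_mean` is a hypothetical name made up for illustration; it is not part of Tables.jl or any other package. The point is only that a consumer written against the interface works with any table source.

```julia
using Tables  # assumes Tables.jl is installed in the active environment

# A NamedTuple of vectors already satisfies the Tables.jl interface.
tbl = (name = ["a", "b", "c"], value = [1.0, 2.0, 3.0])

# Hypothetical consumer that relies only on the Tables.jl interface,
# so it never needs to know which package produced the table.
function column_mean(table, col::Symbol)
    cols = Tables.columns(table)
    v = Tables.getcolumn(cols, col)
    return sum(v) / length(v)
end

Tables.istable(tbl)       # checks that the object is a valid table
column_mean(tbl, :value)  # mean of the :value column
```

Tables.jl itself gives you very little beyond this, which is exactly why so many unrelated packages can agree on it.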

Yes, true, I didn't phrase that correctly. What I tried to convey is that PPAs are a sort of patch for the fact that distros cannot keep up with the pace of package development, and that having to handle multiple repositories is not nice. I wouldn’t like it if people started creating custom package registries and we as users had to handle them manually.

Could the discovery problem be mitigated if the Project.toml file included a section listing functionality (based on agreed academic/industry terms) and a web page like juliapackages included a feature comparison like the one mesamatrix has?

It would help users to choose the most suitable package for their application and developers to find similar packages so as not to duplicate efforts.
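
A rough sketch of what such a functionality section might look like, and how a site could read it with the TOML standard library. The `[functionality]` section, its keys, and the package name are all assumptions made up for illustration; as far as I know, Pkg does not define or read anything like it today.

```julia
using TOML  # standard library

# Hypothetical Project.toml content. The [functionality] section and its
# keys are illustrative assumptions, not an existing Pkg convention.
project_text = """
name = "ExamplePlots"
version = "0.1.0"

[functionality]
domain = "plotting"
features = ["2d-line", "scatter", "latex-export"]
"""

project = TOML.parse(project_text)

# Fall back to an empty list when a package declares nothing.
features = get(get(project, "functionality", Dict()), "features", String[])

# A comparison page could tabulate this per package, feature by feature.
supports(feature) = feature in features
supports("latex-export")
```

The hard part would not be the parsing but agreeing on the vocabulary of feature names.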

I’ve read many papers that said “this algorithm is available as a part of X OSS library”, in areas such as computational geometry (many papers citing CGAL come to mind) or finite elements, where massive libraries tend to build up over time with many collaborators.

I agree. Actually, in Python you need über-packages to do everything because doing it in the base language is extremely slow. In Julia, we can do a lot by just using Base: for example, we absolutely don’t need something like numpy. I also believe it would be superfluous to have something like SciPy, which unifies interpolation, curve fitting, FFTs, etc., and which only exists in Python for historical reasons.
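
As a small illustration of the "just Base" point, the sketch below loads no packages at all. The function name `sumsq` is made up for the example.

```julia
# Only Base is used here; no array package is loaded.
x = range(0, 2π; length = 5)        # evenly spaced grid
y = sin.(x) .^ 2 .+ cos.(x) .^ 2    # elementwise operations via broadcasting

# A plain loop is compiled too, so hand-written kernels do not
# need to be expressed as vectorized library calls.
function sumsq(v)
    s = zero(eltype(v))
    for vi in v
        s += vi^2
    end
    return s
end

sumsq([1.0, 2.0, 3.0])
```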

As for the state of plotting packages, I believe that with the improvements in TTFP in Julia 1.9 and the fact that Makie has funding (crucial for such an ambitious package), it should become the de facto standard before long. It already has far more features than Plots.jl; it only has a little more latency and is a little harder to use, but let’s just give it a year or two.

So where is the package fragmentation problem exactly? AD? It would be good to know.

I do wonder if people are mostly looking for (a) package recommendations and (b) recommended packages being highly mature. I haven’t seen as many strong attempts at (a).

Indeed. Perhaps an explanation of a concrete problem would be easier to act on than the abstract diagnosis of “package fragmentation”.

The current state of Julia AD is like having several vehicles in your backyard, including bicycles, tugboats, trucks, airplanes and experimental space rockets. Yes, each solves the problem of transportation, broadly, but they are hardly replacements for each other.

Practical, performant reverse or mixed mode AD is a difficult problem. The more powerful and complex the language, the more difficult. And Julia is not a simple language by any means.

I am an old-school Ada programmer and decided to test out the Julia waters when I saw what ChatGPT could do in jump-starting my evaluation. Reading this thread, I appreciate the fragmentation issue, as ChatGPT would generate code for different Julia libraries given slightly different context in the prompt. Still, I was amazed at how well it worked for neural nets (with Flux, and DataFrames for data) and for the plotting library, and I was only stopped when I tried to generate code for 3D articulation animation. I have to get back to it, but I think it finally hit some deeper package conflict.

I have to teach in Python, and using numpy/scipy as an example of lack of fragmentation is really funny to me. You mean the package that splits the entire language into two halves with different idioms and syntax?

How could the Julia community help developers avoid duplication of efforts in future packages?

In case some developers want to write a package similar to a current one, how could the Julia community help them to make this package as compatible as possible with current packages?

That does sometimes happen upon registering new packages. Some folks do keep tabs on the new packages being registered and ask questions like these. Check out the new packages label on the General registry, and see, for example, ones where someone has talked about something being “different” (often appearing as “how is this different from X?”).

Could the Julia community foster a culture of explaining the usefulness of packages when someone plans to do it rather than when someone plans to register it?

For example, the forum could have a section or thread to discuss ideas for new packages so that developers can get guidance from more knowledgeable developers before starting the package.

Users already ask what is available to solve their particular problem here in this forum. That also goes for package developers, but in that case they have often done a thorough search for existing functionality at that point.

I personally think that there is no problem with package fragmentation: If you’d like to write your own, why don’t you? Just make sure that potential users understand the tradeoffs between your package and its competition. (I.e., write a README…)

In the middle of this discussion I also think that we are forgetting the personal factor of open source development. Two people who don’t get along may have different opinions and may choose to spend their time implementing opinionated versions of the same package. What’s wrong with that? In open source, you can’t tell people what to spend time on unless you pay them. So we have to keep this in mind. Also a GitHub org is not the answer to everything. Some orgs have a BDFL. Not everyone is comfortable making someone else a BDFL of their work. Moving a package to an org is a kind of irreversible decision to make someone else have more say over your code than you even if they choose not to exercise this power out of courtesy. So any “policy” we come up with should respect this personal side of open source development.

Wouldn’t it be better to split this thread into several: one for how to avoid further fragmentation with future packages and another for each possible merge of similar packages?

It is a complex subject with many aspects, and each deserves its own thread.

We don’t need a thread collecting packages that can be merged. If you identify two packages you think can be merged, raise that discussion.

There should be super packages which promise the best ways to do a certain set of related things. A good super package can be almost like a Linux distribution, curated and providing support for interworking of its components.

This Discourse site is the most active place in the Julia community, but it lacks a dashboard feature that shows the state of packages (such as those collected on JuliaHub). If visitors to the most active Julia site could tell at a glance how often packages are updated, how many dependents they have, which packages are related or similar, and where in the classification hierarchy (or hierarchies) each package falls, i.e. things JuliaHub might have tried to do but has not, then I believe there would be a natural consolidation over time, instead of attention being spent on unused, un-updated packages.

Giving users tools to find and choose packages, make better use of a package’s API, organize them, etc., is fundamental. This can help someone avoid starting a new package and instead choose an existing, working package that needs only a minor intervention to do whatever is needed; in the same way, it enables more effective use in general through documentation.

On this matter, I have been working on the idea of creating full-text search indexes for package descriptions and docs.

For instance, several package managers support package searching and some categorization, but most do not support full-text search.

I kept working on the proof of concept. The new version uses web scraping instead of the GitHub API and includes support for packages hosted on GitLab and possibly others.

https://github.com/sadit/Search-Julia-Packages/blob/main/search-pkgs.ipynb

It can search package names (tolerating spelling errors) and README content (also topics and descriptions if the package is on GitHub). It supports changing the weights given to the name and content scores.
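
To illustrate what error-tolerant name search means, here is a minimal sketch using a plain edit distance. This is not how the linked notebook is implemented; the package list, the function names, and the `maxdist` threshold are assumptions chosen only for the example.

```julia
# Classic dynamic-programming edit distance (Levenshtein) between two strings.
function levenshtein(a::AbstractString, b::AbstractString)
    ca, cb = collect(a), collect(b)
    prev = collect(0:length(cb))          # distances from the empty prefix of `a`
    for i in eachindex(ca)
        curr = similar(prev)
        curr[1] = i
        for j in eachindex(cb)
            cost = ca[i] == cb[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,  # deletion
                              curr[j] + 1,      # insertion
                              prev[j] + cost)   # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

# Hypothetical in-memory list of package names, for illustration only.
pkg_names = ["DataFrames", "Makie", "Plots", "PGFPlotsX", "Tables", "Flux"]

# Return the names within `maxdist` edits of the query, closest first.
function search_names(query, candidates; maxdist = 2)
    q = lowercase(query)
    hits = [(name, levenshtein(q, lowercase(name))) for name in candidates]
    filter!(h -> h[2] <= maxdist, hits)
    sort!(hits; by = last)
    return first.(hits)
end

search_names("Dataframs", pkg_names)
```

A real index would combine a score like this with the README and description matches, which is where the adjustable weights come in.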

It also adds support for full-text search on module documentation based on Base.Docs and uses the Lunr search index that Documenter produces (searching without installing).

https://github.com/sadit/Search-Julia-Packages/blob/main/search-docs.ipynb

It is pretty similar to the package search, but for documentation. It is at an early stage and requires a lot of work, but it already works.

In addition, a visualization and a clustering are generated:
https://github.com/sadit/Search-Julia-Packages/blob/main/clustering.ipynb
These are based on package README files. This can help get a view of similar packages as a whole.

A small web API using the Oxygen package was created to show how to use the API. Additionally, the package database is slow to load per query, so if it is used someday, it should run as a web server or something similar.

There are some sites for searching packages (JuliaHub included), but most of them need a browser or require installing other kinds of tools. The proposal is to create tools that work in the REPL or any Julia process.

So, the idea is to gather people interested in going in this direction and produce some tools that can help the community using the information already on the packages. I am open to discussing and collaborating on these ideas (negative ideas that make me stop this effort are also welcome).

I tried to use ChatGPT for a simple DataFrame groupby. It generated what looked like complete code, so I was pretty excited, but it turned out to have totally misunderstood the API. The helpful part is that it got me started looking at which functions to use; starting with a prompt was fun and mysterious compared to starting by dredging through documentation.

Some of it is mysterious. I thought the code was incomplete when I saw some parameters followed by “…”. Oh yeah, the var arg list constructor (splatting), used to convert a vector to a list of args.
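
For anyone else tripped up by it, a tiny sketch of the two uses of `...` in Julia. The function name `count_args` is made up for the example.

```julia
# At a call site, `...` splats a collection into separate arguments.
args = [3, 1, 2]
largest = max(args...)      # same as max(3, 1, 2)

# In a definition, `...` collects any number of arguments into a tuple.
count_args(xs...) = length(xs)
n = count_args(1, "a", :b)
```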

I don’t see why. If the license is FOSS, you can always fork the project. The worst-case scenario is having to pick a different name.

Sure, forks are uncommon in the Julia community, but I would love to see more of them. A lot of useful packages just sit there semi-abandoned (or “community maintained”) when an interested developer could just fork them and see what happens.
