Fixing Package Fragmentation

How could the Julia community help developers avoid duplication of efforts in future packages?

In case some developers want to write a package similar to a current one, how could the Julia community help them to make this package as compatible as possible with current packages?

That does sometimes happen when new packages are registered. Some folks keep tabs on new registrations and ask exactly these questions. Check out the new packages label on the General registry and look, for example, at the ones where someone has asked about a package being “different” (often phrased as “how is this different from X?”).

Could the Julia community foster a culture of explaining the usefulness of a package when someone plans to write it, rather than when someone plans to register it?

For example, the forum could have a section or thread to discuss ideas for new packages so that developers can get guidance from more knowledgeable developers before starting the package.

Users already ask here in this forum what is available to solve their particular problem. That also goes for package developers, though by that point they have often already done a thorough search for existing functionality.

I personally think that there is no problem with package fragmentation: If you’d like to write your own, why don’t you? Just make sure that potential users understand the tradeoffs between your package and its competition. (I.e., write a README…)

In the middle of this discussion I also think that we are forgetting the personal factor of open source development. Two people who don’t get along may have different opinions and may choose to spend their time implementing opinionated versions of the same package. What’s wrong with that? In open source, you can’t tell people what to spend time on unless you pay them. So we have to keep this in mind. Also, a GitHub org is not the answer to everything. Some orgs have a BDFL. Not everyone is comfortable making someone else a BDFL of their work. Moving a package to an org is a kind of irreversible decision to give someone else more say over your code than you have, even if they choose not to exercise this power out of courtesy. So any “policy” we come up with should respect this personal side of open source development.

Wouldn’t it be better to split this thread into several: one for how to avoid further fragmentation with future packages and another for each possible merge of similar packages?

It is a complex subject with many aspects, and each deserves its own thread.

We don’t need a thread collecting packages that can be merged. If you identify two packages you think can be merged, raise that discussion.

There should be super packages that promise the best ways to do a certain set of related things. A good super package can be almost like a Linux distribution: curated, and providing support for the interworking of its components.

This Discourse site is the most active place in the Julia community, but it lacks a dashboard that shows the state of packages (such as those collected on JuliaHub). Suppose visitors to the most active Julia site could tell at a glance how often packages are updated, how many dependents they have, which packages are related or similar, and where in the classification hierarchy (or hierarchies) each package falls, i.e. the things JuliaHub might have tried to do but has not. Then I believe there would be a natural consolidation over time, instead of attention being spent on unused, un-updated packages.

Giving users tools to find and choose packages, make better use of a package’s API, organize packages, etc., is fundamental. Such tools can keep someone from starting a new package when an existing, working package needs only a minor intervention to do whatever is needed and, in the same way, enable more effective use in general through documentation.

On this matter, I have been working on the idea of creating full-text search indexes for package descriptions and docs.

For instance, several package managers support package searching and some categorization, but most do not support full-text search.

I kept working on the proof of concept. The new version uses web scraping instead of the GitHub API and includes support for packages hosted on GitLab and possibly others.

https://github.com/sadit/Search-Julia-Packages/blob/main/search-pkgs.ipynb

It can search on package names (tolerating spelling errors) and on README content (also topics and descriptions if the package is on GitHub). It supports changing the weights given to name and content scores.
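To make the idea of error-tolerant name matching combined with weighted content matching concrete, here is a minimal sketch. It is not the code from the notebook above: the tiny `PACKAGES` table, the function names, the edit-distance scoring, and the default weights are all assumptions made up for illustration.

```julia
# Hypothetical mini "package database": name => README text (invented data).
const PACKAGES = Dict(
    "DataFrames" => "tabular data manipulation with groupby and joins",
    "CSV" => "fast reading and writing of delimited text files",
    "Plots" => "plotting and visualization with multiple backends",
)

# Classic dynamic-programming Levenshtein edit distance.
function levenshtein(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))          # distances from "" to prefixes of t
    for i in eachindex(s)
        curr = similar(prev)
        curr[1] = i                      # distance from s[1:i] to ""
        for j in eachindex(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # substitution or match
        end
        prev = curr
    end
    return prev[end]
end

# Name similarity in [0, 1]; 1.0 means identical (case-insensitive).
function name_score(query, name)
    q, n = lowercase(query), lowercase(name)
    longest = max(length(q), length(n))
    longest == 0 && return 1.0
    return 1 - levenshtein(q, n) / longest
end

# Fraction of query words that occur in the README text.
function content_score(query, readme)
    words = split(lowercase(query))
    isempty(words) && return 0.0
    doc = Set(split(lowercase(readme)))
    return count(w -> w in doc, words) / length(words)
end

# Rank all packages by a weighted sum of name and content scores.
function search(query; name_weight = 0.5, content_weight = 0.5, db = PACKAGES)
    scored = [(name = n,
               score = name_weight * name_score(query, n) +
                       content_weight * content_score(query, r))
              for (n, r) in db]
    return sort(scored; by = x -> x.score, rev = true)
end

search("DataFrame")                                        # misspelled name
search("groupby"; name_weight = 0.1, content_weight = 0.9) # content-heavy query
```

Raising `content_weight` favors packages whose README mentions the query terms, while raising `name_weight` favors near-matches on the package name. A real index would need tokenization, stemming, and a faster data structure than a linear scan.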

It also adds support for full-text search on module documentation based on Base.Docs, and it uses the Lunr search index that Documenter produces (searching without installing).

https://github.com/sadit/Search-Julia-Packages/blob/main/search-docs.ipynb

It is pretty similar to the package search, but for documentation. It is at an early stage and requires a lot of work, but it already works.

In addition, a visualization and a clustering are generated:
https://github.com/sadit/Search-Julia-Packages/blob/main/clustering.ipynb
These are based on package README files. This can help give a view of similar packages as a whole.

A small web API using the Oxygen package was created to see how to use the API. Additionally, the package database is slow to load per query, so if it is used someday, it should run as a web server or something like that.

There are some sites to search for packages (JuliaHub included), but most of them need a browser or require installing other kinds of tools. The proposal is to create tools that work in the REPL or any Julia process.

So, the idea is to gather people interested in going in this direction and produce some tools that can help the community by using the information already in the packages. I am open to discussing and collaborating on these ideas (negative feedback that would make me stop this effort is also welcome).

I tried to use ChatGPT for a simple DataFrame groupby. It generated what looked like complete code, so I was pretty excited, but it turned out to have totally misunderstood the API. The helpful part is that it got me started looking at which functions to use. Starting with a prompt was fun and mysterious compared to starting by dredging through documentation.

Some of it is mysterious. I thought the code was incomplete when I saw some parameters followed by “…”. Oh yeah, that is the varargs (splat) syntax, used to convert a vector into a list of arguments.
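For anyone who hits the same confusion, here is a minimal sketch of what a trailing `...` does in Julia. The function names `add3` and `count_args` are made up for illustration and have nothing to do with the generated DataFrames code.

```julia
# In a call, `...` after a collection "splats" its elements into
# separate positional arguments.
add3(a, b, c) = a + b + c

v = [1, 2, 3]
total = add3(v...)        # equivalent to add3(1, 2, 3)

# In a definition, `...` does the reverse: it collects any number of
# trailing arguments into a tuple (a varargs parameter).
count_args(args...) = length(args)

n = count_args(v...)      # the three elements arrive as three arguments
m = count_args(v)         # without the splat, the vector is one argument
```

So code that ends an argument list with `...` is complete as written; the dots are part of the syntax.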

I don’t see why. If the license is FOSS, you can always fork the project. The worst-case scenario is having to pick a different name.

Sure, forks are uncommon in the Julia community, but I would love to see more of them. A lot of useful packages just sit there semi-abandoned (or “community maintained”) when an interested developer could just fork them and see what happens.
