How can we create a leaner ecosystem for Julia?

To add to the discussion on GitHub organizations: evidence indicates that they do not always improve the discoverability of packages. I personally never search for packages by organization; I use Google or the search at https://pkg.julialang.org. However, these organizations may boost the credibility of packages found via other methods, and thus create a collaboration hub. They may also increase the willingness of others to contribute code.

1 Like

And another link on how to organize contributions via the concept of contributorship, which is becoming popular in journals: https://www.mdpi.com/2304-6775/7/3/48/htm

GitHub organizations cannot be followed on GitHub.

It often bothers me that I can’t follow a GitHub organization to see when new packages are created.

16 Likes

I think we really need to clarify what problem you are trying to solve.

Open source needs to be free. If you want to have a centralised community you need to have mandatory requirements and binding reviews by domain experts (the selection process would deserve a separate and long discussion).

In my view, we are choosing between two different scenarios:

  • current conditions: open to organised and independent development (via orgs and individuals). This makes duplicate packages more likely. However, users have more options (individually, each package does not contribute much, but they contribute collectively).

  • centralised community: open to organised development, mostly closed to independent packages. Fewer duplicate packages. Users have fewer options.

In both cases users need to be able to filter the internet and find the right package for their needs. However, in the second case we will probably see the creation of some sort of alternative registry for independent developers (or a broad range of non-registered packages). This could make the whole process more complicated.

Furthermore, in an organised community developers will run into a typical bounded rationality problem, since they will have to decide what goes into a large package while trying to interpret what users want.

If you believe that we have too many packages doing the same thing because people want ownership of their code, then I do not see how a notification system could fix that, since it can simply be ignored.

If the discussion is about incentives and the use of GitHub orgs then it might be best to split the post or change the title.

4 Likes

Fortunately, I don’t think that tightening review/acceptance requirements for the General registry is being considered. My understanding is that any kind of review/curation/quality control is encouraged to take place in other registries, maintained by organizations etc.

If someone believes that a centralized process for this can/should be organized, they should just go ahead and create a registry for that, and then we can see how it works.

2 Likes

I agree with what’s said here. I’d like to add an extra benefit of joining an organization for developers: you get other people to think and comment about problems you care about and have thought about. This can be very rewarding and instructive. I, at least, learned a lot along the way.

2 Likes

New redundant packages appear because it is usually hard to get pull requests merged into existing repositories. Also, people don’t reuse code, for the reasons listed below.

In my personal experience, a few patterns recur on this topic:

  • When a package’s development is stopped or not worth it:
    E.g.: PackageCompiler and PackageCompilerX: the development of PackageCompiler is very slow or has stopped, and there is that JuliaCon talk on YouTube about how to rewrite PackageCompiler. That is why PackageCompilerX has emerged.
    Many other examples fit under this.

  • When developers of a package don’t want extra features:
    E.g.: Julia-Client and Juno-Plus: I submitted a PR to Julia-Client, but its developers thought that Juno-Plus had too many extra features that should not be inside Julia-Client, and so the second package emerged.

  • When a package’s developer wants to minimize dependencies.
    E.g.: PrettyTables and AcuteML: despite all of the benefits of using another advanced package, the developer thought that using AcuteML as the HTML backend would increase the overall loading time of the package.

  • It is hard to become a member of an organization:
    Requires contacting the main owner.

  • Lack of package tagging:
    Many mentioned this already in this thread.

  • Lack of documentation:
    Static Linter, Language Server are examples of packages with zero documentation. That is why no one knows how to use them anywhere else other than in VSCode.

  • Not allowing overriding a method:
    Overriding a method in Julia will result in an incremental precompilation warning. If someone wants to use some code but override a method, they have to either write the code from scratch or fork the original package and edit the function there. However, forks are not supported in Project.toml, and Manifest.toml files aren’t as nice as Project.toml files.

  • When people want to make a portfolio:
    In this case, people usually tend to write everything from scratch. And those who have written a package on their own tend to resist merging other people’s PRs.

A solution to some of these problems is to allow multiple packages in a repository:


or to allow loading a subset of a package

1 Like

I’m the author of ARMAProcesses.jl (and BlockDiagonal.jl, which is pretty similar in the sense that it’s a one-file “package” that provides basic functionality that I wanted and couldn’t find in the ecosystem when I wanted it).

I like the idea of tags, but I think what would be more helpful actually are tags to indicate the scope or ambition of a package. I wrote code to do something that I needed at the time and published it in case somebody else found it useful. I don’t really think of ARMAProcesses.jl as a “time series package”, but I recognize that from its name alone one might reasonably expect that it is more feature-rich than it is.

I don’t use github and so I don’t think I can officially register any code that I release anyway, but some tag that conveys the idea that this is a very simple piece of code that relates to time series and should not be considered a time series package the way that TSAnalysis.jl is (for example) could be helpful. Tags like active-development or feature-expansion or feature-complete or something might at least be useful as filters for people who are looking to make a nontrivial commitment to choosing a package for some other project they’re working on or choosing a project to make contributions to.

3 Likes

This is an interesting thread. When I first joined this community, I realized that the best way to figure out which packages I should use is to just ask the community :slight_smile: Usually, someone drops in immediately and tells me where to start :nerd_face: Let me provide some of my thoughts and opinions below.

Discoverability

Do I want to just read a curated document somewhere? Maybe, but only if the document is up to date and answers the exact question I have. I have seen and tried to use R’s Task Views before, although I didn’t really like them that much. They only serve as a starting point and give no guidance about which packages are more mature or more maintainable than others, which are some of the important factors that I consider.

To me, adding tags only serves as a starting point as well. So, if we are going to provide more meta-information about packages, then let’s put more context around each package. I would propose:

  • Life cycle: experimental, growing, mature, deprecated
  • Maintenance: none, occasional, active
  • Test coverage: none, low, medium, high
  • Performance: not considered, basic tuning, highly optimized
  • Security: not considered, basic assessments, highly scrutinized
  • Production Readiness: none, few deployments, widely deployed
  • etc.

Package authors should be able to assess their own packages based on some community-driven guideline.
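To make the proposal above a bit more concrete, here is a minimal sketch (in Python, purely for illustration) of how such self-assessed labels could be stored and used to filter a package list. Everything in it is an assumption: the type names, the `find_packages` helper, and the catalog entries are hypothetical, and only three of the proposed dimensions are modelled.

```python
from dataclasses import dataclass
from enum import Enum


class LifeCycle(Enum):
    EXPERIMENTAL = "experimental"
    GROWING = "growing"
    MATURE = "mature"
    DEPRECATED = "deprecated"


class Maintenance(Enum):
    NONE = "none"
    OCCASIONAL = "occasional"
    ACTIVE = "active"


class Coverage(Enum):
    # Ordered values so that a minimum level can be requested.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class PackageInfo:
    name: str
    life_cycle: LifeCycle
    maintenance: Maintenance
    coverage: Coverage


def find_packages(packages, life_cycle=None, maintenance=None,
                  min_coverage=Coverage.NONE):
    """Return the names of packages whose labels match the given filters."""
    return [
        p.name
        for p in packages
        if (life_cycle is None or p.life_cycle is life_cycle)
        and (maintenance is None or p.maintenance is maintenance)
        and p.coverage.value >= min_coverage.value
    ]


# Hypothetical entries; the names and labels are invented for illustration.
CATALOG = [
    PackageInfo("ExampleTimeSeriesA", LifeCycle.MATURE,
                Maintenance.ACTIVE, Coverage.HIGH),
    PackageInfo("ExampleTimeSeriesB", LifeCycle.EXPERIMENTAL,
                Maintenance.NONE, Coverage.LOW),
    PackageInfo("ExampleTimeSeriesC", LifeCycle.GROWING,
                Maintenance.ACTIVE, Coverage.MEDIUM),
]

# Actively maintained packages with at least medium test coverage.
shortlist = find_packages(CATALOG, maintenance=Maintenance.ACTIVE,
                          min_coverage=Coverage.MEDIUM)
```

The point of ordered levels rather than free-form tags is that a user can ask for "at least" a given level, which is harder to do with plain text tags.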

Package Guides

I think another thing that can really help users is domain-specific guides to Julia packages.

Taking time series analysis as an example, it would be nice if there were a web page that talks about the various packages, how they relate to each other (if at all), and how they differ. Package authors can probably come together and contribute to the same guide.

How many topics? I don’t know. But this could be community/user-driven. When there’s a need (more questions asked) about a topic then one of the package authors can step up and initiate a new guide.

Redundancy

Personally, I don’t think redundancy is a bad thing. Nothing is going to be perfectly aligned unless we start consolidating packages together, which is going to be a very expensive proposition in terms of time and effort.

As a user, if I can get feature X from package A and feature Y from package B, I would not care too much if they both implement several other common features.

As package authors, we may build something redundant not only because we want to do so, but so that we can innovate and try to do the same thing better or address a problem more comprehensively.

I would think consolidation will just happen organically. Up until a point where packages become popular, package authors can just come to JuliaCon and hack something out together! :wink:

10 Likes

I think this is a great idea. Further, I don’t even think we would need to particularly centralize it.
E.g. a personal blog post doing it is pretty solid (and I have a few like that).
It has an advantage over anything centralized: there is less need to coordinate, it’s clear to everyone that it’s just my opinion (people get antsy about being left off anything “official”), and it is correct as of the date of posting, so while it goes out of date there is less expectation of it being updated.
Potentially it could also live in certain packages, or on GitHub orgs’ web pages.

4 Likes

You are sharing the point of view of the user, right? It is a valid point of view to consider. Understand that some points of view shared above come from the perspective of a package developer trying to unify work with other packages in the ecosystem.

Agree. There is a clear need for a central hub where people discuss a domain.

The sentence is perfectly qualified here: “As a user”. As a developer, however, I do care if there is a lot of overlap and time wasted on the same problems. Open source benefits the most when multiple eyes are taking care of a shared set of features. The concern of many in this thread is the fact that we have multiple overlapping packages with low test coverage, few use cases, broken design, etc.

Fair point, especially if there is an orthogonal design that is too difficult to accommodate in current packages. I don’t think this is the usual case though. Evidence shows that people introduce redundancy just for fun (that is fine), or because they don’t have time to learn the alternatives out there (that is not fine).

1 Like

tk3369, Great contribution, you were a little faster than me! :wink:

The topic is discussed in detail with all its advantages and disadvantages, and sometimes even digresses into concrete packages. So far, so good! But I have the feeling that the actual topic takes a back seat, so I will try to present my view on the issue:

For all packages listed on the Julia documentation page, the following points should be considered:

User rating system regarding

  • Comprehensibility and completeness of the documentation
  • Stable use of the included functions
  • Redundancy to other packages (and therefore usefulness)
  • Helpful error messages

Developer evaluation system regarding

  • Code Structure
  • Comprehensibility of the commentary
  • Timeliness and maintenance cycles

Anyone can provide what they want via Github. The user therefore bears the risk. But if a package is listed on the Julia documentation page, it should meet certain requirements.

That’s my 2 cents for that and maybe we can discuss this in Lisbon over a beer or wine? :grinning: :beers: :wine_glass:

1 Like

I began this w/ 2 questions:
Q1: how can we improve Discoverability in Julia?
Q2: how can we improve Cohesion (reduce unnecessary redundancy/dependency) in Julia?
Here I will only discuss Q1.

I’ve studied all posts here related to improving Discoverability in Julia.
I like @Gunter_Faes’s idea.

  1. Many (but not all) R packages are on CRAN (perhaps similar to Julia docs).
  2. CRAN has Task Views, which give guidance about relevant packages for different domains
    (Bayesian, Differential Equations, Optimization etc).
    I’ve found Task Views helpful for TS & ML.
  3. The domains are maintained by volunteers w/ contact info, who regularly keep it up to date.
    Maintainers are not meant to endorse the “best” packages.
  4. Some CRAN packages are in multiple Task Views (BART). Some are not in any views, but should be (L0Learn). Some Task Views have links to R packages outside CRAN.

I believe it may help to include something like Task Views in https://pkg.julialang.org/docs/. Some users (@juliohm) have graciously offered to volunteer for certain domains.
To be clear, I don’t support copying Task Views from R, only the idea & best features from R & other places.

What do you think is the optimal way to do this?
E.g. I’d like a comment section at the bottom of each domain where users can post links to their own codes, desired features, etc
Also, following suggestions by @Gunter_Faes, @tk3369 & others we can list info for each package such as number of stars on GitHub, last time updated, etc

2 Likes

I think that the biggest problem is that everyone wants to build their own package, starts working from scratch, implements half the features and then drops it. Then someone else decides to write their own package because the first one is missing a feature they need. They start from scratch again and implement another set of features, but not all the features the first one had.
Then comes a third user who needs features X and Y, but A only has X and B only has Y. No problem, I’ll just use both packages. But then you find out that A and B, albeit similar, are slightly off in their implementations, and trying to make X work with Y brings so many problems that you decide to start yet again from scratch and build a third package with both features X and Y.

Probably one of the root causes of this problem is that this language attracts hordes of highly intelligent people (researchers, scientists, developers, students, et al.), all eager to exploit the wonders of Julia but without the drive to go the extra mile. This creates a plethora of half-finished packages, not necessarily bad or useless, but which create noise.

And this is the beauty of open source, but it is also its downside.

There are many packages that are well developed, but also many more that are obsolete, abandoned, untested, incompatible with the existing ecosystem, etc.

So maybe trying to discourage uploading unfinished packages to the General registry could be an option.
Or, instead of having one big ‘general’ registry, it could be separated into ‘stable’ and ‘development’ registries, where packages have to go through some kind of review (similar to a pull request) to move from development to stable. This would merely be a check on the quality of the package, not its contents. Does it have documentation? Does it have enough test coverage? (@tk3369 already gave an extensive list of good quality checks.)
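As a rough illustration of a check on quality rather than contents, the two questions above could be expressed as a small gate like the one sketched below (in Python, for illustration only). This does not describe how any existing registry works: the `Submission` type, the `promotion_blockers` function, and the 80% coverage threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical summary of a package asking to move from
    # a 'development' registry to a 'stable' one.
    name: str
    has_docs: bool
    coverage_percent: float


def promotion_blockers(sub, min_coverage=80.0):
    """List the quality checks a package fails.

    An empty list means nothing blocks promotion. The default
    threshold of 80% is an arbitrary placeholder, not a recommendation.
    """
    blockers = []
    if not sub.has_docs:
        blockers.append("missing documentation")
    if sub.coverage_percent < min_coverage:
        blockers.append(
            f"test coverage {sub.coverage_percent:.0f}% "
            f"is below {min_coverage:.0f}%"
        )
    return blockers


# Hypothetical submissions for illustration.
ready = promotion_blockers(Submission("ExampleA", True, 92.0))
not_ready = promotion_blockers(Submission("ExampleB", False, 40.0))
```

Returning the list of failed checks, instead of a plain yes/no, gives the author something actionable, which fits the idea that the review is about quality and not about what the package does.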

Having more quality and well documented code would mean an easier inter-cooperation between packages.

5 Likes

No, the general registry will stay as is. As mentioned by @Tamas_Papp, a new stricter registry can be created separately as an experiment:

2 Likes

Sorry, to me that sounds like: the system is good and does not need to be discussed and improved! Is that really what you mean? :thinking:

No, that’s not what I mean. What I meant is that such a change would start by creating a new experimental registry, as mentioned by @Tamas_Papp in his post also. It would not start with changing the general registry, as @cshen mentioned.

6 Likes

I don’t think that anyone considers the ecosystem for any language at any point ideal. It is always possible to improve something, usually a lot. This applies to Julia, too.

But while proposals for a more centralized, gatekeeper-style registry could help, there are a lot of details that need to be worked out, and I think that people are just arguing that this experiment should not take place in the General registry.

5 Likes

Slow down please. I don’t think we’re ready to discuss changing the registry.
I think the best first step is to create a Julia platform (like Task Views) where we can see all existing packages/other code for each domain neatly organized by volunteers.

@Tamas_Papp & @fipelle check out Time Series maintained by Rob Hyndman. I found this very helpful & am thankful he did this.

If we did this for Julia I bet we’d find a few things:

  1. Some domains in Julia (DiffEq & Optimization) are very lean & from the beginning borrowed the best features from the rest of the world.
    Compare them to R’s (DiffEq & Optimization) & note the best DE package in R calls Julia (thanks @ChrisRackauckas).
  2. Other domains like ML & TS are still far from lean.

This platform will help users (new & potential) find the things they are looking for.
This platform will help developers & scouts like @Datseris write better packages.

3 Likes

Reading some of the posts above I thought this was the main proposal: a concentrated registry managed by domain experts and volunteers. For that, I am not particularly enthusiastic.

I like this better - even though I am not sure how much time I could dedicate to it. However, I think it would be best to have a dedicated wiki page. The latter could be more approachable by new independent developers.

4 Likes