How can we create a leaner ecosystem for Julia?

I’m the author of ARMAProcesses.jl (and BlockDiagonal.jl, which is pretty similar in the sense that it’s a one-file “package” that provides basic functionality that I wanted and couldn’t find in the ecosystem when I wanted it).

I like the idea of tags, but I think what would be more helpful actually are tags to indicate the scope or ambition of a package. I wrote code to do something that I needed at the time and published it in case somebody else found it useful. I don’t really think of ARMAProcesses.jl as a “time series package”, but I recognize that from its name alone one might reasonably expect that it is more feature-rich than it is.

I don’t use GitHub, so I don’t think I can officially register any code that I release anyway. Still, a tag could be helpful if it conveyed that this is a very simple piece of code that relates to time series and should not be considered a time series package the way that TSAnalysis.jl is (for example). Tags like active-development, feature-expansion, or feature-complete might at least be useful as filters for people who are about to make a nontrivial commitment, whether choosing a package for some other project they’re working on or choosing a project to contribute to.

3 Likes

This is an interesting thread. When I first joined this community, I realized that the best way to figure out which packages I should use is to just ask the community :slight_smile: Usually, someone drops in immediately and tells me where to start :nerd_face: Let me provide some of my thoughts and opinions below.

Discoverability

Do I want to just read a curated document somewhere? Maybe, but only if the document is up to date and answers the exact question I have. I have seen and tried to use R’s Task Views before, although I didn’t really like them that much. A Task View only serves as a starting point; it gives no guidance about which packages are more mature or more maintainable than others, and those are some of the important factors that I consider.

To me, adding tags only serves as a beginning point as well. So, if we are going to provide more meta-information about packages, then let’s put more context around each package. I would propose:

  • Life cycle: experimental, growing, mature, deprecated
  • Maintenance: none, occasional, active
  • Test coverage: none, low, medium, high
  • Performance: not considered, basic tuning, highly optimized
  • Security: not considered, basic assessments, highly scrutinized
  • Production Readiness: none, few deployments, widely deployed
  • etc.

Package authors should be able to assess their own packages against some community-driven guideline.
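To make the proposal concrete, here is a minimal sketch of how such metadata could be recorded and used as a filter. Everything in it is an assumption for illustration: the names `PackageInfo` and `shortlist`, the three fields chosen from the list above, and the example packages and their ratings are all hypothetical. It is written in Python only to keep the example short; nothing like this exists in any registry today.

```python
from dataclasses import dataclass
from enum import IntEnum


class LifeCycle(IntEnum):
    EXPERIMENTAL = 0
    GROWING = 1
    MATURE = 2
    DEPRECATED = 3


class Maintenance(IntEnum):
    NONE = 0
    OCCASIONAL = 1
    ACTIVE = 2


class Coverage(IntEnum):
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class PackageInfo:
    """Self-assessed metadata for one package (hypothetical schema)."""
    name: str
    life_cycle: LifeCycle
    maintenance: Maintenance
    coverage: Coverage


def shortlist(packages, min_maintenance=Maintenance.OCCASIONAL,
              min_coverage=Coverage.MEDIUM):
    """Return names of non-deprecated packages that meet the minimum levels."""
    return [
        p.name
        for p in packages
        if p.life_cycle is not LifeCycle.DEPRECATED
        and p.maintenance >= min_maintenance
        and p.coverage >= min_coverage
    ]


# Hypothetical entries: these ratings are made up for illustration.
packages = [
    PackageInfo("PackageA", LifeCycle.MATURE, Maintenance.ACTIVE, Coverage.HIGH),
    PackageInfo("PackageB", LifeCycle.EXPERIMENTAL, Maintenance.NONE, Coverage.LOW),
    PackageInfo("PackageC", LifeCycle.DEPRECATED, Maintenance.OCCASIONAL, Coverage.HIGH),
]

print(shortlist(packages))  # ['PackageA']
```

Ordered levels make "at least this mature" queries a plain comparison, which is the kind of filtering a user making a long-term commitment to a package would want.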

Package Guides

I think another thing that can really help users is domain-specific guides to Julia packages.

Taking time series analysis as an example, it would be nice if there were a web page that talks about the various packages, how they relate to each other (if at all), and how they differ. Package authors could come together and contribute to the same guide.

How many topics? I don’t know. But this could be community/user-driven. When there’s a need (more questions asked) about a topic then one of the package authors can step up and initiate a new guide.

Redundancy

Personally, I don’t think redundancy is a bad thing. Nothing is going to be perfectly aligned unless we start consolidating packages together, which is going to be a very expensive proposition in terms of time and effort.

As a user, if I can get feature X from package A and feature Y from package B, I would not care too much if they both implement several other common features.

As package authors, we may build something redundant not only because we want to, but also so that we can innovate and try to do the same thing better or address a problem more comprehensively.

I would think consolidation will just happen organically. Up until a point where packages become popular, package authors can just come to JuliaCon and hack something out together! :wink:

13 Likes

I think this is a great idea. Further, I don’t even think we would need to particularly centralize it.
E.g. a personal blog post doing it is pretty solid (and I have a few like that).
It has advantages over anything centralized: there is less need to coordinate, it’s clear to everyone that it’s just my opinion (people get antsy about being left off anything “official”), and it’s clear that it is correct as of the date of posting, so while it goes out of date, there is less assumption that it is being updated.
Potentially, it could also live in certain packages’ docs, or on GitHub orgs’ web pages.

4 Likes

You are sharing the point of view of the user, right? It is a valid point of view to consider. Understand that some points of view shared above come from the perspective of a package developer trying to unify work with other packages in the ecosystem.

Agree. There is a clear need for a central hub where people discuss a domain.

The sentence is perfectly qualified here: “As a user”. As a developer, however, I do care if there is a lot of overlap and time wasted on the same problems. Open source benefits the most when multiple eyes are taking care of a shared set of features. The concern of many in this thread is the fact that we have multiple overlapping packages with low test coverage, few use cases, broken design, etc.

Fair point, especially if there is an orthogonal design that is too difficult to accommodate in current packages. I don’t think this is the usual case though. Evidence shows that people introduce redundancy just for fun (that is fine), or because they don’t have time to learn the alternatives out there (that is not fine).

1 Like

tk3369, Great contribution, you were a little faster than me! :wink:

The topic has been discussed in detail with all its advantages and disadvantages, and sometimes it even digresses into concrete packages. So far, so good! But I have the feeling that the actual topic is taking a back seat, so I will try to present my view on the issue:

For all packages listed on the Julia documentation page, the following points should be considered:

User rating system regarding

  • Comprehensibility and completeness of the documentation
  • Stable use of the included functions
  • Redundancy to other packages (and therefore usefulness)
  • Helpful error messages

Developer evaluation system regarding

  • Code Structure
  • Comprehensibility of the commentary
  • Timeliness and maintenance cycles

Anyone can provide what they want via GitHub. The user therefore bears the risk. But if a package is listed on the Julia documentation page, it should meet certain requirements.

That’s my 2 cents for that and maybe we can discuss this in Lisbon over a beer or wine? :grinning: :beers: :wine_glass:

1 Like

I began this w/ 2 questions:
Q1: how can we improve Discoverability in Julia?
Q2: how can we improve Cohesion (reduce unnecessary redundancy/dependency) in Julia?
Here I will only discuss Q1.

I’ve studied all posts here related to improving Discoverability in Julia.
I like @Gunter_Faes’s idea.

  1. Many (but not all) R packages are on CRAN (perhaps similar to Julia docs).
  2. CRAN has Task Views, which give guidance about relevant packages for different domains
    (Bayesian, Differential Equations, Optimization etc).
    I’ve found Task Views helpful for TS & ML.
  3. The domains are maintained by volunteers w/ contact info, who regularly keep them up to date.
    Maintainers are not meant to endorse the “best” packages.
  4. Some CRAN packages are in multiple Task Views (BART). Some are not in any views, but should be (L0Learn). Some Task Views have links to R packages outside CRAN.

I believe it may help to include something like Task Views in https://pkg.julialang.org/docs/. Some users (@juliohm) have graciously offered to volunteer for certain domains.
To be clear, I don’t support copying Task Views from R, only the idea & best features from R & other places.

What do you think is the optimal way to do this?
E.g. I’d like a comment section at the bottom of each domain where users can post links to their own code, desired features, etc.
Also, following suggestions by @Gunter_Faes, @tk3369 & others, we can list info for each package such as number of stars on GitHub, last time updated, etc.

2 Likes

I think that the biggest problem is that everyone wants to build their own package: they start from scratch, implement half the features, and then drop it. Then someone else decides to write their own package because the first one lacks a feature they need. They start from scratch again and implement another set of features, but not all the features the first one had.
Then along comes a third user who needs features X and Y, but A only has X and B only has Y. No problem, I’ll just use both packages. But then you find out that A and B, albeit similar, are slightly off in their implementations, and trying to make X work with Y brings so many problems that you decide to start from scratch yet again and build a third package that has both X and Y.

Probably one of the root causes of this problem is that this language attracts hordes of highly intelligent people (researchers, scientists, developers, students, et al.), all eager to exploit the wonders of Julia but without the drive to go the extra mile. This creates a plethora of half packages, not necessarily bad or useless, but they create noise.

And this is the beauty of open source, but it is also its downside.

There are many packages that are well developed, but also many, many more that are obsolete, abandoned, not tested, not compatible with the existing ecosystem, etc.

So maybe trying to discourage uploading unfinished packages to the general registry could be an option.
Alternatively, instead of having one big ‘general’ registry, we could separate it into two registries, ‘stable’ and ‘development’, where packages have to go through some kind of review (similar to a pull request) to go from dev to stable. This would merely be a check on the quality of the package, not the contents. Does it have documentation? Does it have enough test coverage? (@tk3369 already provided an extensive list of good quality checks.)
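As a rough sketch of what such a dev-to-stable check might look like, assuming only the two criteria named above (documentation and test coverage): the `Submission` type, the `promotion_blockers` function, and the 80% threshold are all hypothetical, and no such tooling exists in the registry. Python is used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A package proposed for promotion from 'development' to 'stable'."""
    name: str
    has_docs: bool
    coverage: float  # fraction of lines covered by tests, 0.0 to 1.0


def promotion_blockers(sub, min_coverage=0.8):
    """List the reasons a submission cannot be promoted (empty if none).

    The checks look only at quality signals, never at what the package does.
    The 0.8 default is an arbitrary placeholder, not an agreed threshold.
    """
    blockers = []
    if not sub.has_docs:
        blockers.append("missing documentation")
    if sub.coverage < min_coverage:
        blockers.append(
            f"test coverage {sub.coverage:.0%} below {min_coverage:.0%}"
        )
    return blockers


print(promotion_blockers(Submission("PackageA", True, 0.92)))   # []
print(promotion_blockers(Submission("PackageB", False, 0.45)))
```

Returning the list of blockers, rather than a bare yes/no, gives the author something actionable to fix before resubmitting.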

Having higher-quality, well-documented code would make cooperation between packages easier.

5 Likes

No, the general registry will stay as is. As mentioned by @Tamas_Papp, a new stricter registry can be created separately as an experiment:

2 Likes

Sorry, but to me it sounds like this: the system is good and does not need to be discussed and improved! Is that really what you mean? :thinking:

No, that’s not what I mean. What I meant is that such a change would start by creating a new experimental registry, as mentioned by @Tamas_Papp in his post also. It would not start with changing the general registry, as @cshen mentioned.

6 Likes

I don’t think that anyone considers the ecosystem for any language at any point ideal. It is always possible to improve something, usually a lot. This applies to Julia, too.

But while proposals for a more centralized, gatekeeper-style registry could help, there are a lot of details that need to be worked out, and I think that people are just arguing that this experiment should not take place in the General registry.

6 Likes

Slow down please. I don’t think we’re ready to discuss changing the registry.
I think the best first step is to create a Julia platform (like Task Views) where we can see all existing packages/other code for each domain neatly organized by volunteers.

@Tamas_Papp & @fipelle check out the Time Series Task View maintained by Rob Hyndman. I found this very helpful & am thankful he did this.

If we did this for Julia I bet we’d find a few things:

  1. Some domains in Julia (DiffEq & Optimization) are very lean & from the beginning borrowed the best features from the rest of the world.
    Compare them to R’s (DiffEq & Optimization) & note the best DE package in R calls Julia (thanks @ChrisRackauckas).
  2. Other domains like ML & TS are still far from lean.

This platform will help users (new & potential) find the things they are looking for.
This platform will help developers & scouts like @Datseris write better packages.

3 Likes

Reading some of the posts above I thought this was the main proposal: a concentrated registry managed by domain experts and volunteers. For that, I am not particularly enthusiastic.

I like this better - even though I am not sure how much time I could dedicate to it. However, I think it would be best to have a dedicated wiki page. The latter could be more approachable by new independent developers.

4 Likes

I haven’t been following this discussion more than vaguely, but I would like to put in that GitHub organizations are astonishingly effective at helping to coalesce a community around a set of packages. An organization doesn’t necessarily bring more total contributors, but it makes it much easier to discover related packages, and it means that the maintainers of one package by default have the authority to maintain other packages in the organization.

5 Likes

Not sure if the people in this discussion are also aware of this resource maintained by @svaksha

3 Likes

I just discovered CRAN Task Views, and I think that is an excellent idea. It would be great if we had something like that on pkg.julialang.org itself. It is probably more useful for users than for coordinating development, but that seems useful too :slight_smile:

3 Likes

My two cents based on anecdotal evidence:

A few years ago we had a similar discussion about astrodynamics packages in Python. We were three people with three coexisting packages (I would not say competing because none of us had a significant number of users). After meeting at a conference we got together and started a new package which was supposed to take the best ideas from all the others and reduce fragmentation.
In the end, we got bogged down in endless design discussions and bikeshedding and accomplished NOTHING :man_shrugging:
In hindsight, I believe that we tried to collapse the design space too early, when we did not have real experience with actual code but only clashing design sensibilities. One of us lost interest, I was turned to the dark side (Julia), and my buddy Juan steadily continued with his original package poliastro, which is now quite successful.
TL;DR: I believe that an evolutionary approach is best. May the fittest package survive and become a standard somewhere along the road.

Because of this I am not bothered by the fact that we are taking things slow at JuliaSpace. @Ronis_BR and I are approaching software design in different ways and that is fine even though it means that there is some duplication of effort in the meantime (he is also very productive and I am not :joy:). I am certain that this will lead to much better code in the long run.

16 Likes

My 2c re discovery: It is really helpful when the readme of one package links to other related packages, ideally with a sentence explaining how they differ. (And unrelated packages with confusable names! And Python/MATLAB packages which people may search for.) That’s much easier than trying to maintain this centrally somewhere, and argue about what the categories should be, etc.

7 Likes

Based on the discussion around GitHub organizations, I’ve migrated most of my packages to a new organization JuliaEarth for applied statistics in Earth sciences. Even though the statistical theories there are not restricted to these sciences, they were motivated by practical problems in hydrogeology, mining, environmental sciences, etc.

5 Likes

I just can’t agree more with @helgee!

Many, many people who are starting to learn Julia decide to code their own package based on their area of expertise, even though a package already exists. I remember when I started to use Julia back in 2012: being a MATLAB programmer, the Julia way of coding seemed like another language to me. My eyes used to hurt looking at those ::Int :slight_smile: It took time to get used to. I decided to learn Julia by coding algorithms related to satellite applications, which eventually became SatelliteToolbox.jl. At that time, if there had been a similar package in the ecosystem, I would have had zero expertise to contribute to it, because I was not even able to fully understand the existing code!

Today, SatelliteToolbox.jl is being developed inside the GitHub organization JuliaSpace that @helgee mentioned and has been successfully used on many occasions at the Brazilian National Institute for Space Research (INPE).

My point is: let the packages come! Yes, there will be a lot of duplication and there will be a lot of dead projects. However, some of those packages will have very nice functions that will last. Some will be top notch, like the DifferentialEquations project. The only important thing is to have a way to clean the registry of dead projects (which is already done, I think). In fact, even SatelliteToolbox.jl duplicates some algorithms from @helgee’s excellent work. This is known, and eventually we will merge things when everything has settled down.

Being able to register a package lets new programmers know that Julia is a great community and that they are welcome! This is why I just love the new approval system implemented in GitHub for new packages and for updates.

9 Likes