How can we create a leaner ecosystem for Julia?

@xiaodai This comment is unhelpful and rude. It could literally be the answer to any question about Julia. It is my gentle request to modify it to be positive and helpful.

4 Likes

I feel that you need a core of people driving something. I don’t think one is the right number, since it is very easy to get demotivated or go down the wrong path. In my opinion, 2-3 people working together usually produce high-quality packages. But yes, you certainly can’t get high-quality packages starting with a crowd.

At the same time, it is not easy to centrally plan this. We do not know which particular person or set of people is going to get it right, stick with it, and build the community. Thus, multiple efforts are sometimes valuable for exploring the design space.

In this particular case, I can imagine that someone trying and writing a blog post about the state of various time series packages might be a good start.

-viral

17 Likes

I’m not super experienced as a developer. But, at work when this sort of stuff started happening someone voiced concerns like this, and then someone with some skills/leadership qualities started looking at the “mess” and seeing patterns that could unify the efforts.

I think with some of these more jumbled spaces we could start to see this.

This is the battle of scaling. I have to admit, though, that @ChrisRackauckas is right: it’s usually one person fearlessly/obsessively cutting a path. It does take a small team, or a single hardheaded person who thinks they can handle it themselves, to make rapid and stable progress. There are downsides to this, though: sometimes code bases become illegible or tainted by confusing design patterns, and then no one can contribute. So solving the problem in 2019 becomes a dead end in 2021 (or, more optimistically, 2025), when new things arrive.

I’m planning an experiment with a different type of open source workflow… But, if it falls down to a solodev effort that’s okay too :). I think Julia is the right modality for it, but we’ll see.

I also agree: it doesn’t take money. It takes necessity, interest, and passion. Julia allows for extreme modularity, and we can leverage this by suggesting successful design practices. That’s why Julia has such an amazing backbone: those things alone.

A good start is making documentation available, and the efforts of others as available as humanly possible. The rest will happen naturally over time.

3 Likes

Thank you for referencing TSAnalysis.jl.

I think you highlighted some interesting points. My view is the following.

As far as the time-series field is concerned, the number of registered packages that are compatible with Julia 1 and have unit tests is rather small (a subset of the above).

I am toward the end of my doctorate (one and a half years left :crossed_fingers:) and my research is mostly on time series. TSAnalysis.jl is still preliminary, but I will consistently add new features (see this link for more details). I plan to cooperate with other developers (co-authors and externals) and I am trying to avoid overlaps whenever possible. However, I also want to have control over the most primitive part of my package. It might be a personal limitation, but I aim to:

  1. have enough security to be confident in using TSAnalysis.jl as a basis to write academic papers;
  2. define a solid and modular layout that allows for regular updates.

I suspect that different maintainers might have similar perspectives. Of course, this might create overlaps. That said, users generally tend to concentrate around the most efficient and friendly packages. Git often pushes the most used.

4 Likes

It is understood that users would prefer this. But, at the same time, it is very likely that this would happen gradually and organically, and there is very little we can do to speed up the process, other than contributing.

The primary reason for this may be that Julia is a very new language with an unprecedented combination of features (notably parametric types, multiple dispatch, and just-in-time (JIT) compilation). Providing some functionality with a performant and well-designed interface is usually more involved than simply porting equivalent libraries from other languages.

Consequently, a lot of packages are experimental, exploring the design space. They may turn into a polished library, get merged when the time is ripe, or be abandoned when the author(s) lose interest.

Navigating this situation is not easy. It is not uncommon that one has to look at multiple packages before finding an ideal solution. This is how I usually do it:

Search these forums, GitHub, and GitLab, possibly with multiple combinations of keywords, then make a shortlist of 1-3 packages, and evaluate them based on

  1. recent activity (especially for issues and pull requests: are the authors responsive?)

  2. documentation quality (ideally, there is some documentation, or at least docstrings)

  3. the source code and unit tests: if they are organized, tidy, and well-documented, the package is more likely to be something the authors intend to maintain. Also, well-maintained code makes it easier to contribute, or potentially continue working on the package if the original authors don’t have time.
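
As a rough illustration only, the checklist above could be turned into a simple score for a shortlist. This is a minimal sketch in Python (chosen just to keep it language-neutral); the `PackageInfo` fields, the one-year staleness threshold, and the equal weighting of criteria are all assumptions made for this example, and the values would have to be filled in by hand after looking at each repository.

```python
from dataclasses import dataclass


@dataclass
class PackageInfo:
    """Hand-collected notes about one candidate package (all fields hypothetical)."""
    name: str
    days_since_last_commit: int
    maintainers_respond_to_issues: bool
    has_documentation: bool  # a manual, or at least docstrings
    has_unit_tests: bool
    code_is_tidy: bool


def score(pkg: PackageInfo, stale_after_days: int = 365) -> int:
    """Count how many of the five checklist criteria a package meets (0 to 5)."""
    checks = [
        pkg.days_since_last_commit <= stale_after_days,  # recent activity
        pkg.maintainers_respond_to_issues,               # responsive authors
        pkg.has_documentation,                           # documentation quality
        pkg.has_unit_tests,                              # tests exist
        pkg.code_is_tidy,                                # organized source
    ]
    return sum(checks)


def rank(shortlist: list[PackageInfo]) -> list[PackageInfo]:
    """Return the shortlist ordered from most to least promising."""
    return sorted(shortlist, key=score, reverse=True)
```

A score like this is only a first filter: it says nothing about whether the package actually does what you need, so reading the code and asking on the forum still matter.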

20 Likes

Thank you for your comments! I’ll discuss the first (easier) question:
Q1: if I’m using/developing Julia, what is an easy way to find all code in my domain?

1 From (here) & (here) it is clear that Julia users are having trouble finding code in their domain.
I wouldn’t have found many of the 20+ links I posted searching GitHub or Google. I knew about Financial Risk Forecasting, QuantEcon, Creel, and Soderlind before I heard of Julia.

2 @Roger-luo suggests standard keywords in the Pkg registry like in pip.

3 May I suggest a pinned post @ the top of each domain in Discourse?
For example, in the Statistics category description:
(3.A) we can post links to time-series code (including the 20+ links above), and let users comment which packages are missing. When someone has an announcement about a new package, they can add their package to the list as well.
(3.B) users can post desired functionalities for that domain

Similarly for Data, Finance/Economics, Astro/Space etc.

I’m personally interested in all ML packages for Julia & it’s harder to find things than you’d expect.
This resource would be great for developers working on ML interfaces such as MLJ.

4 Individuals who have attempted to track Julia packages in various domains have incomplete lists & rarely update those lists (here, here, here).
Hence, this resource is more likely to be maintained in an official location (such as Discourse).
Perhaps put someone in charge of overseeing each list & they can pass the baton to someone else after a year?
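
To make the keyword idea in point 2 a bit more concrete, here is a minimal sketch of how declared keywords could be indexed and searched. It is written in Python purely as an illustration; the function names and the input format (package name mapped to a list of keywords) are assumptions for this example and do not describe how the Pkg registry or pip actually store metadata.

```python
from collections import defaultdict


def build_keyword_index(packages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lowercased keyword to the set of package names that declare it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, keywords in packages.items():
        for kw in keywords:
            index[kw.lower()].add(name)
    return dict(index)


def find_packages(index: dict[str, set[str]], *keywords: str) -> list[str]:
    """Return the packages that declare all of the given keywords, sorted by name."""
    matches = [index.get(kw.lower(), set()) for kw in keywords]
    if not matches:
        return []  # no keywords given, so nothing to search for
    return sorted(set.intersection(*matches))
```

With hypothetical entries such as `{"PkgA": ["time series", "forecasting"], "PkgB": ["Time Series"]}`, calling `find_packages(index, "time series")` would list both packages, while adding `"forecasting"` narrows the result to `PkgA`. The value of such a scheme depends entirely on authors agreeing on a standard vocabulary.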

1 Like

Why not use pkg.julialang.org?

This statement makes a fundamental mistake in understanding open source software.

The reason why so many packages exist is not because of a meme like “need more packages”. It is because somebody wanted to do some programming and create something for whatever reason, and they decided to share the result of their work online for free. That person who made that piece of code does not have the responsibility of organizing it into a bigger framework.

The way I see it, there are people who make things available for free, and either you like that code they made or you don’t. By saying that you’d prefer a leaner ecosystem, you are implying that you’d rather not have people share their code for free. It is not the responsibility of people making code available for free to make it available in such a way that makes you happy.

It would certainly be nice to make a more unified and coherent package ecosystem, but it is not anyone’s responsibility to do that unless there is some kind of incentive to motivate it for those people.

There wasn’t necessarily any incentive for the developers of those packages to organize their efforts. They may have been able to accomplish their own goals without satisfying your goal. One way to overcome this would be either to do the organizational work yourself or to hire the developers to do it.

People working for free should not have any expected responsibility for package cohesion, unless they have some sort of incentive or motivation based on their own work or interest or funding.

Of course, it would be good to encourage a more cohesive package ecosystem, but one cannot create an expectation that random strangers on the internet (who are working for free) do this for you.

5 Likes

I feel like while this is technically true, this observation doesn’t fundamentally change as much as it might first appear.

And the reason it doesn’t change much is as you say in:

People making open source packages do so from some motivation.
And I believe that in most (but not all) cases that same motivation also favors cohesion.
As a second statement to that: as a rule the majority of people who do not have an interest in promoting cohesion are not reading this thread.

  • I maintain DataStructures because I feel it’s good for the world, and similarly even today I was working through making changes to help make it more coherent with other packages (in today’s case the standard library). Because I think that is good for the world.
  • I make PRs to StatsBase and other statistical packages because when they are out of agreement it breaks my things. I also make PRs to the same packages because they are broken and them being broken breaks my things.
  • I have released tools that follow an API, and made PRs to help follow that API because I want to use that API in many places.
  • I generally release open source packages, and discuss broader picture ecosystem improvements because I want this community to grow.

So my motivations for open sourcing and for promoting cohesion come from the same place.
And I don’t think I am atypical in this.

There are exceptions to this.
E.g. some people have made it abundantly clear that open sourcing their code is as far as they are willing to go without payment, and will refuse all requests to make it work well with other packages without payment.
Which is reasonable and their right.
Similarly, some people have made it clear that supporting code as part of a cohesion effort is something they are not willing to take into their package (and thus their maintenance burden) until it is proven and widely adopted. Again, reasonable and their right.
But I think this is not the overall standard.

In general, saying that you can’t ask volunteers to do something because they are working for free is a nonstarter of an argument.
I personally, and I believe others also, would be happy to take issues on my packages if someone said there was a way I could change them to allow for a more coherent package ecosystem.
And I absolutely have put time into both creating packages and working out logistics and such to allow for coherence.
It’s hard, and a lot of it is ongoing, but hard things are worth doing sometimes.

We definitely do now have some great meta-packages, like JuMP, MLJ, Plots.jl, etc.
And to get there we have to try.

17 Likes

I am in agreement and would also be interested in a more cohesive environment and I would be open to look at those issues also… provided it aligns with my interests and availability.

The purpose of my post was to point out the flaw in the original statement, and why in general it is a flawed statement, even if there are people who are willing to work on these things for free at times.

@ChrisRackauckas I wouldn’t be able to find many of the links w/ TS code there.

1 I’m writing from the perspective of someone who wants to see Julia thrive. Else, I wouldn’t spend time collecting TS packages & posting this note (which I happily do for free).

2 I’ve noticed community culture is important & contagious. People on this forum have been very helpful to me & consequently I’m motivated to help others if I can.

3 From my links to code above we see there is a huge amount of redundancy for time-series.
From my links to posts on Discourse we see at least some of this redundancy is from a lack of awareness about other packages.

4 As a user, I’d prefer one TS package w/ many of the functions from the other 20+ packages.
That’s my preference, I don’t expect anything from anyone.

5 I’m writing a program to train an ML model.
As a developer, I want to write a program that is easy for others to use & nicely fits into interfaces. I will consider adding my model to a larger existing package, b/c I believe it’s better to have more algorithms in fewer packages.
There are many judgement calls for me to make & I wish there was more centralized organization in the Julia community.
I don’t care if we all drive on the left side or the right side, as long as we drive on the same side.

@ChrisRackauckas hit the nail on the head when he wrote:

It would be nice if other domains, such as ML and TS, were similarly organized.

4 Likes

I think the one-size-fits-all package idea is generally sub-par. There is always a problem of where you draw the line on what belongs in the package and what does not. In the time series context, what about state space models, Bayesian analysis, bootstrap resampling, etc.? All are very frequently used alongside time series methods, yet also frequently used outside of them. I like that these are separate pieces.

A lot of the packages listed in the OP are probably not created with the intent of becoming “THE” time series analysis package in Julia. So the fact that they are perhaps hard to find is not an issue. I don’t mind if some guy’s replication files for his thesis don’t show up on page 1 of my search results for “julia time series”.

I would say one issue is that we do not have an easy way to signal the “quality” of the package. @Tamas_Papp has the right idea but as he says it is not easy. Maybe it could be made easier?

3 Likes

I am also a fan of modular, nicely interoperating packages that do one thing well (forming a flexible “toolbox”), as opposed to a single umbrella library (“suite” or “toolkit”) that tries to do everything. People coming from other languages where such packages are the norm often find the Julia ecosystem too confusingly diverse, but it works rather well. Cf

9 Likes

Would it make sense to use the global registry to appoint “coordinators” per big domain area (broadly corresponding to the subcategories on this forum)? People submitting new packages would be expected to select keywords (possibly even in the manifest?). The coordinators would be people who agree to look at the package registrations in their area, to ensure things run smoothly, that conflicts between packages get resolved amicably, to orient new contributors, to make the usual “hey, this package looks great! Can you clarify what it adds wrt X, Y and Z? Perhaps this feature could be merged into that package?” comments, etc. These people should not necessarily be core contributors to key packages, but rather people who have some time to devote to this and know the community. To sweeten the deal, these people could get free admission to JuliaCon or something.

4 Likes

I don’t understand what kind of conflicts you are talking about here, can you please clarify?

I am not sure we should demand this as part of registration. People should feel free to experiment with new approaches without worrying about a justification.

Of course, if people feel like they have the time and resources to start some kind of curated registry, they should go ahead, it is pretty easy these days. Personally I think that what you are proposing is a lot of thankless work and it may not be possible to incentivize it easily, especially for people who have the necessary skills.

I think that we should just improve discoverability instead. FWIW, I think that asking on this forum is generally the best approach for selecting a package.

5 Likes

Conflicts in terms of different method names for similar things, that usually get resolved with DomainBase type packages.

Of course we should not demand it, but we should certainly suggest it. It’s very often the case that people are not aware of what others are working on (perhaps because it’s WIP, perhaps because it’s niche, etc.), and such comments are welcome.

Re thankless work: it’s work some people are already doing voluntarily right now. Being “blessed” in some organizational way provides a sense of ownership that incentivizes this further.

2 Likes

For example, I’d like my Grassmann.jl package to replace various obsolete functionality from the GeometryTypes package, such as the Point and Simplex types. However, I know that if I started proposing such things, it probably wouldn’t gain much approval from other developers because it would require a bunch of work to change things. Overall, transitioning to Grassmann, which will provide differential geometric algebra functionality, would help with generalizing much functionality in the Julia ecosystem. It would make me very sad, though, if I were expected to live off only my $17 a month while laboriously making contributions that would enable the work of lots of people who are paid full salaries.

This is why I am going to continue working solo and making my own ecosystem. However, I would be open to start the discussion for replacing GeometryTypes with the more versatile Grassmann algebra. There is much else that could benefit from geometric algebra, which is a unification of various ideas. However, I will probably just continue working on my designs alone and build up my own personal ecosystem around it, since working alone allows me to think of much more advanced concepts for generalization without needing to explain and justify myself to other people (for free).

It would be nice to have my programming work integrated into the Julia ecosystem, but I have hesitated to propose such things because it is not worth the effort to explain my ideas to people. When I am working for free and living off of $17/month, it is better for me to just use my time to design my own ecosystem instead of getting into huge discussions about design with people who are paid full salaries but don’t understand my design choices.

GitHub organizations (or GitLab groups, Bitbucket teams, etc.) are a nice tool to attract individual projects, and drive their development under a common umbrella to reduce redundancies and improve consistency between them.

I want to share my positive experience in this regard: I started to develop RecurrenceAnalysis as one of those “hobby projects”. Some time later, I was invited to join the JuliaDynamics organization and move my package there. This started a fruitful collaboration that helped to improve the package, providing it with much better documentation (consistent with all the other packages of the organisation), a narrower focus (moving some features to other packages, e.g. DelayEmbeddings), and better performance.

Regarding the question of “more packages” vs. “better packages”, I think that JuliaDynamics is again an example of this not being a real issue: that organization has a set of packages dedicated to particular tasks, and also the DynamicalSystems library that installs them together.

As others have already said, this does not happen magically; it takes one person (or a few) to lead and coordinate the collaboration. In this case it is @Datseris, whose excellent work I want to praise. In my opinion this is a success story that may be used as an example of how to move towards such a leaner ecosystem. Perhaps George might share his experience, and tell us what resources he has needed to achieve this, good practices, etc.

10 Likes

Thanks a lot for pointing this out, @heliosdrm. I’ll try to add my side, although there isn’t much more to add other than what you said. I’ll try to outline how the JuliaDynamics story became a success story. In principle, I think I can summarize my thought process in trying to make JuliaDynamics a “successful” org (even though this is totally subjective) in the following points:

  1. As you pointed out, someone (or a group of people) should have the drive (and the willingness to spend some extra time) to bring a high-quality organization for a specific “genre” into Julia. This group will help the org by coordinating things. My “genre” was dynamical systems and nonlinear dynamics, and for JuliaDynamics I do most of the coordination.
  2. Documentation is extremely important, and it attracts users as well as developers to join the org. Everyone involved should be spending time building quality documentation: for the community, well-documented, high-quality functionality is better than just having one more function implemented.
  3. Avoid redundancy and duplication: as pointed out in the very first post of this thread, there are dozens of packages that do similar things. To this, a community should collectively say “NO”. There should be as few packages as possible, the best ones, that do all (or most) things the best way and just work flawlessly with each other (this is exactly what happens in JuliaDynamics, for example). As @heliosdrm pointed out, this was yet one more benefit of joining the org: we compared the same features we had in two different packages, and kept the best implementations. I am very aware of how “wrong” this statement sounds to many here, as the claim is that “this is not how open source works, everyone makes what they want”, etc. Well, in my eyes the best thing for a language is to have the MINIMUM AMOUNT of packages that are the BEST in what they do, while REUSING as much code as possible. It’s up to the community of the genre under discussion to think about what they want…
  4. The members of this org should actively “scout” relevant packages, and invite people to join. Having things together in an org is better for the community. And when you do find and invite some package, be good about it: help them join, help them get better documentation, help them get better code, help them get hooked up into the ecosystem. I think this was really successful with RecurrenceAnalysis.jl and Agents.jl, where I believe both packages saw real code improvements as well as an increase in popularity after they joined. But @heliosdrm and @Ali_Vahdati should speak for themselves; they were the original owners.
  5. An important thing that I’ve only recently come to realize, and that is necessary for packages to join an organization and improve both themselves and the organization, is the following: developers should care more about getting a better ecosystem for Julia than about having their “name as the name of the owner of a package”. (You might think that this point is “ridiculous”, yet I claim that it is one of the driving reasons that there are 2 dozen packages that do the same thing.)

Ultimately, and this has been discussed in this post many times, this takes not only time, but also willingness to collaborate and also drop parts of your “status” regarding a repo. It is up to the community at hand whether there is someone that is willing to spend some extra time or not.

As far as my life goes, I can say it is totally worth it to spend extra time trying to coordinate JuliaDynamics: everyone involved, including myself, just becomes happier because things improve collectively and collaboratively, and that is just cool!

17 Likes