How can we create a leaner ecosystem for Julia?

I’m not really worried about that kind of thing happening to me, but it needs to be acknowledged that moving to an org dilutes attribution to some degree. That could get complicated when a project has been funded by a private organization that needs to win modelling contracts in the future, or when academic publishing is involved.

I think we need more discussion of this: people have real reasons for keeping their packages out of organizations, even though we know that it’s detrimental overall. It’s a classic common property dilemma, and such dilemmas often require clear rules and standards to solve.

7 Likes

Frankly, from rereading this topic I am not quite sure what exactly is being discussed or proposed here.

The desired outcome is pretty clear:

  1. there should be more high-quality, well-maintained Julia packages that
  2. interoperate nicely, and
  3. it should be easier to find them.

Generally this is accepted as a reasonable goal for all languages.

What is not clear is that this should come from a “centralized” effort. As usual, the devil is in the details: suppose that, as you are proposing,

So let’s say that the coordinators for domain X identify Y as a missing feature. What happens then?

If they just implement it, why do we need the extra organizational layers for this, instead of someone simply making a PR? If they don’t implement it, what is the value added, given that most package maintainers are well aware of missing features (and if not, someone can just open an issue)?

Similarly, I am not sure that those who propose that

have thought about the details of this process. Reviewing someone else’s code is a rather significant investment of time, but let’s suppose you find people to do this.

How is this different from just reviewing code and making a PR at an arbitrary point in the package lifetime? If the suggestions are ignored for a while (because the author is busy), does this hold up registration? What if they are rejected outright because the original author has a different programming style or vision for the package?

I would suggest that instead of proposing some centralized scheme, people interested in things like these should just experiment with the idea on a voluntary basis. To echo the point made by @fipelle: there is nothing here that requires concerted effort from the whole Julia community.

9 Likes

@fipelle I don’t see how it is the opposite of open source. Having an organized community doesn’t imply that we can’t be open source; that would be like assuming open source needs to be disorganized. I understand that everyone should have the freedom to do whatever they want online, open source their work, and so on. But just as journal papers are reviewed by people who have worked in a domain for years, people who have worked with Julia for years in a specific domain could at least be notified of software published on the General registry. Notification is not review, however; it is just a means by which we can organize things and offer suggestions to authors about where their work fits relative to prior art.

After identifying a missing feature, coordinators can aggregate this information in a single place for potential contributors to be aware of and submit PRs. If someone ends up publishing a package that addresses the limitation, then coordinators can check off the item from the list of missing features and discuss with the package author how to improve integration with the existing stack.

I know this well. As a reviewer of JOSS papers for Julia packages, I understand the commitment. It is like reviewing papers for scientific journals, as some of us do.

I don’t know if I understood the question. It seems disconnected from the rest of the discussion. We are not discussing code review in the strict sense you describe. We are discussing the organization of domain efforts, and higher-level things that always happen in scientific communities. Keep in mind that the Julia community is unusual in that many of us have PhDs or started using the language during a PhD after seeing the value it added. I think we should take advantage of this somehow when publishing software. It is a differentiating feature of the community that is rarely discussed.

Overall, and coming back to the original question in the thread, I do believe that we would benefit from organizing the community into domains of expertise for better integration of the work that is being done out there. GitHub organizations partially solve this issue, but more important than that is a method to be notified of all efforts in a given domain. Two methods have been proposed so far that I think are interesting: notifications for coordinators on Discourse and on the General registry whenever a package is tagged with domain keywords. To pursue these methods, however, one needs an additional method to bootstrap a minimal set of coordinators for each domain.

To create an initial set of coordinators, we could perhaps use the GitHub organizations that already exist. People showing interest could add their name to a shared document in a GitHub repository, which some script in the General registry would read in order to notify them, if that is possible.
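As a rough illustration of the notification idea, here is a minimal sketch in Python. Everything in it is hypothetical: the `COORDINATORS` mapping stands in for the shared sign-up document, the keywords and handles are made up, and `coordinators_to_notify` is not part of any existing registry tooling.

```python
from collections import defaultdict

# Hypothetical sign-up list: domain keyword -> handles of volunteer coordinators.
# In practice this could be parsed from the shared document in the repository.
COORDINATORS = {
    "geostatistics": ["@alice"],
    "dynamical-systems": ["@bob", "@carol"],
}


def coordinators_to_notify(package_tags, coordinators=COORDINATORS):
    """Map each coordinator handle to the (lowercased) tags that matched.

    `package_tags` are the domain keywords attached to a newly registered
    package. Tags nobody signed up for are silently skipped.
    """
    matches = defaultdict(set)
    for tag in package_tags:
        key = tag.lower()
        for handle in coordinators.get(key, []):
            matches[handle].add(key)
    return {handle: sorted(tags) for handle, tags in matches.items()}


if __name__ == "__main__":
    # A made-up package tagged with one known and one unknown keyword.
    for handle, tags in coordinators_to_notify(["Dynamical-Systems", "plotting"]).items():
        print(f"notify {handle}: new package tagged {', '.join(tags)}")
```

Note that a script like this only produces a list of people to ping; it does not block or review a registration, which matches the "notification is not review" idea above.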

I need to address this, because the way you present this is just not true. First of all, GitHub has a very, very clear presentation of exactly who has contributed to a repository, and by how much: it is the “Contributors” page. There, no one can alter or take credit from you, regardless of where the package is located. And in my eyes, this should be the number one way a community measures credit regarding a repo.

Since you are clearly the main contributor to some repo DynamicGrids.jl, you can clearly show this to your company, even if your repo is located elsewhere.

Now, regarding papers and authors: both RecurrenceAnalysis.jl and Agents.jl, packages that joined the JuliaDynamics org, have papers associated with them. The authors of these papers are the original developers of the packages. I am a scientist, so deciding who is an author of a paper is part of my daily life. By now it is generally clear to me who must be an author, who could be, and who has no reason to be.

3 Likes

Adding to this, in Julia packages there is room to acknowledge authorship in the Project.toml file, and some also include a CITATION.bib where you can define how you want the package to be cited. Repository owners (whether they are the authors, coordinators of an organization, or just people who forked or copied code from others) may/should use those resources to give proper credit.

Those are resources that are set up manually, not an automatic tool like the “Contributors” tab. Each approach has its benefits and limitations, but they complement each other.

2 Likes

OK, those are good points. But… ownership is, in practice, really different from the commit record in many ways.

  • Ownership is much easier to demonstrate to lay observers than git commits. Seriously, who understands what the commit record means?
  • Ownership gives veto power over new pull requests and changes, which you lose in an organization. This is especially important if you are paid to make something work a certain way.
  • When a package is on a private organization’s page, that organization is quite clearly credited as its creator.
  • If the package moves to a Julia organization, the money and time an organization put in is no longer apparent - it will just look like I the developer, personally, did all the contribution based on commits. But I actually have collaborators who don’t write much code.

To be clear, I agree with you mostly, but I think you are wrong to believe that things mostly working out OK is enough for a functional collective commons. Lack of clarity over rewards disincentivises contribution from some proportion of the population, and we need to understand how that works to minimize it. People interested should read a few of Elinor Ostrom’s books on common property governance.

(Also, to be clear, I actively work on packages in multiple Julia orgs.)

15 Likes

I see. These, and especially the last one, are fair points that I hadn’t considered, mostly because I’ve never worked in corporate settings.

6 Likes

I actually work in science research (ecology) in a small org that needs to win contracts. Recognition can be useful for that.

1 Like

Normally yes, but it does not need to be like that. We are talking about free software, so you are allowed to create a repo on your private page with code that was written entirely by another person. The repository name is not useful for acknowledging authorship in that case.

C’mon, we are talking about the actual registered Julia package, with GitHub stars, forks, etc. It has multiple references to establish legitimacy, and it does matter.

1 Like

I think there’s been some good discussion here, but it’s difficult to make any progress when so many different problems are being tackled at once. If people have identified a single thing that they care about, it may be helpful to start a dedicated thread or raise an issue in the appropriate repository/organization.

5 Likes

Linking another example of a package that could possibly have been merged into existing efforts had the authors agreed on a generalized interface: Diagonalizations.jl: a package for multivariate linear filtering

1 Like

How contributions are acknowledged matters in different ways for different people. If you are trying to build a personal career that demonstrates competency in creating algorithms, it’s not enough to be coauthor on many papers. You need to be first author on some in a completely unambiguous way. If you are trying to demonstrate that your lab/workgroup is responsible for making serious advancements in a field, then attributing individual contributions may not translate into credit for that workgroup. I’m sure that there are many other nuances I’m not considering, but I just want to make the point that we can’t assume the same tools solve the same problems for everyone.

6 Likes

To add to the discussion on GitHub organizations, evidence indicates that they do not always improve discoverability of packages. I personally never search for packages based on organizations. I do a Google search or use the https://pkg.julialang.org search. However, these organizations may boost the credibility of packages found via other methods, and thus create a collaboration hub. They may also increase the willingness of others to contribute code.

1 Like

And another link on how to organize contributions via the concept of contributorship, which is becoming popular in journals: https://www.mdpi.com/2304-6775/7/3/48/htm

GitHub organizations cannot be followed on GitHub.

It often bothers me that I can’t follow a GitHub organization to see when new packages are created.

18 Likes

I think we really need to clarify what problem you are trying to solve.

Open source needs to be free. If you want to have a centralised community you need to have mandatory requirements and binding reviews by domain experts (the selection process would deserve a separate and long discussion).

In my view, we are choosing between two different scenarios:

  • current conditions: open to organised and independent development (via orgs and individuals). This could more easily result in duplicate packages. However, users have more options (individually, each package does not contribute much, but they contribute collectively).

  • centralised community: open to organised development, mostly closed to independent packages. Fewer duplicate packages. Users have fewer options.

In both cases users need to be able to filter the internet and find the right package for their needs. However, in the second case we will probably see the creation of some sort of alternative registry for independent developers (or a broad range of unregistered packages). This could make the whole process more complicated.

Furthermore, in an organised community developers will run into a typical bounded-rationality problem, since they will have to decide what goes into a large package while trying to interpret what users want.

If you believe that we have too many packages doing the same thing because people want to have ownership of their code, then I do not see how a notification system could fix it, since it can simply be ignored.

If the discussion is about incentives and the use of GitHub orgs then it might be best to split the post or change the title.

4 Likes

Fortunately, I don’t think that tightening review/acceptance requirements for the General registry is being considered. My understanding is that any kind of review/curation/quality control is encouraged to take place in other registries, maintained by organizations etc.

If someone believes that a centralized process for this can/should be organized, they should just go ahead and create a registry for that, and then we can see how it works.

2 Likes

I agree with what’s said here. I’d like to add an extra benefit of joining an organization for developers: you get other people to think and comment about problems you care about and have thought about. This can be very rewarding and instructive. I, at least, learned a lot along the way.

2 Likes

New redundant packages appear because it is usually hard to get pull requests merged into existing repositories. Also, people don’t reuse code for the reasons I list below.

In my experience, a few patterns keep repeating on this topic:

  • When a package’s development has stopped or is not worth continuing:
    E.g.: PackageCompiler and PackageCompilerX: the development of PackageCompiler is very slow or has stopped, and there is that JuliaCon talk on YouTube about how to rewrite PackageCompiler. That is why PackageCompilerX emerged.
    Many other examples fit under this.

  • When developers of a package don’t want extra features:
    E.g.: Julia-Client and Juno-Plus: I submitted a PR to Julia-Client, but the developers thought that Juno-Plus had too many extra features that should not be inside Julia-Client, so the second one emerged as a separate package.

  • When a package’s developer wants to minimize dependencies.
    E.g.: PrettyTables and AcuteML: despite all the benefits of using another advanced package, the developer thought that using AcuteML as the HTML backend would increase the overall loading time of the package.

  • It is hard to become a member of an organization:
    Requires contacting the main owner.

  • Lack of package tagging:
    Many mentioned this already in this thread.

  • Lack of documentation:
    Static Linter, Language Server are examples of packages with zero documentation. That is why no one knows how to use them anywhere else other than in VSCode.

  • Not allowing overriding a method:
    Overriding a method in Julia will result in an incremental precompilation warning. If someone wants to use some code but needs to override a method, they must either write the code from scratch or fork the original package and edit the function there. However, forks are not supported in Project.toml, and Manifest.toml files aren’t as nice as Project.toml files.

  • When people want to make a portfolio:
    In this case, people usually tend to write everything from scratch. And those who have written a package on their own tend to resist merging other people’s PRs.

A solution to some of these problems is allowing multiple packages in a repository:

or allowing a subset of a package to be loaded

1 Like