I wonder if this situation could be alleviated if Julia’s newsletter included a section called “Call for Participation”, as Rust’s newsletter has, where important projects in need can ask for help.
https://this-week-in-rust.org/blog/2024/01/31/this-week-in-rust-532/#call-for-participation-projects-and-speakers
I can answer that and add to the discussion of how nice it is to explore some data with very basic Julia tools. There’s a GitHub repo that has data on unique contributors to each language on GitHub (website), which shows the following:
julia> using Downloads, CSV, DataFrames
julia> df = CSV.read(Downloads.download("https://raw.githubusercontent.com/github/innovationgraph/main/data/languages.csv"), DataFrame)
93221×6 DataFrame
Row │ num_pushers language language_type iso2_code year quarter
│ Int64 String31 String15 String3 Int64 Int64
───────┼─────────────────────────────────────────────────────────────────────────
1 │ 2066 HTML markup AE 2020 1
2 │ 1627 CSS markup AE 2020 1
3 │ 288 Jupyter Notebook markup AE 2020 1
4 │ 108 Vue markup AE 2020 1
(...)
julia> langs = sort(combine(groupby(df[df.year .== 2023 .&& df.quarter .== 3, :], :language), :num_pushers => sum => :contributors), :contributors; rev = true)
305×2 DataFrame
Row │ language contributors
│ String31 Int64
─────┼─────────────────────────────
1 │ JavaScript 2844963
2 │ Python 1693151
3 │ Shell 1180231
4 │ TypeScript 891746
5 │ Java 814264
6 │ Dockerfile 738608
7 │ C 574814
8 │ C++ 573689
9 │ Makefile 493359
10 │ PHP 409361
11 │ C# 378008
12 │ Ruby 319163
13 │ CMake 280403
14 │ Batchfile 273940
15 │ Kotlin 230452
16 │ Go 218716
17 │ Objective-C 194070
18 │ Swift 180700
19 │ PowerShell 150834
20 │ Rust 135205
21 │ Lua 126559
22 │ Dart 118698
(...)
julia> julia_rank = findfirst(==("Julia"), langs.language)
69
julia> langs[julia_rank-10:julia_rank+10, :]
21×2 DataFrame
Row │ language contributors
│ String31 Int64
─────┼────────────────────────────
1 │ Meson 15925
2 │ Pascal 15885
3 │ GDB 15549
4 │ Scheme 15316
5 │ PLSQL 14926
6 │ Clojure 14709
7 │ Mathematica 14608
8 │ Smalltalk 14565
9 │ Raku 14119
10 │ NASL 14068
11 │ Julia 14010
12 │ ANTLR 13628
13 │ NSIS 13625
14 │ FreeMarker 12717
15 │ Verilog 12650
16 │ Forth 11508
17 │ CoffeeScript 11336
18 │ GAP 10981
19 │ Elixir 10927
20 │ SourcePawn 10883
21 │ Prolog 10376
There was a discussion on this on Slack. Looking at the data, it seems a bit odd to me (there are some weird jumps over time), and Julia is showing pretty good growth from 2020 Q1 to 2023 Q3 (the period spanned by the available data), but overall I think one has to say that Julia’s position is weaker than one might have expected.
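To look at the growth and the jumps over time, one could sum `num_pushers` per quarter for a single language and compare consecutive quarters. The following is only a sketch in plain Julia: the rows are made-up values in the shape of `languages.csv`, and the helper names `quarterly_totals` and `growth_ratios` are hypothetical, not part of any package.

```julia
# Hypothetical rows in the shape of languages.csv (values are made up for illustration).
rows = [
    (language = "Julia", year = 2020, quarter = 1, num_pushers = 100),
    (language = "Julia", year = 2020, quarter = 1, num_pushers = 50),
    (language = "Julia", year = 2020, quarter = 2, num_pushers = 180),
    (language = "Rust",  year = 2020, quarter = 1, num_pushers = 900),
    (language = "Julia", year = 2020, quarter = 3, num_pushers = 360),
]

# Sum num_pushers over all countries for one language, keyed by (year, quarter).
function quarterly_totals(rows, lang)
    totals = Dict{Tuple{Int,Int},Int}()
    for r in rows
        r.language == lang || continue
        key = (r.year, r.quarter)
        totals[key] = get(totals, key, 0) + r.num_pushers
    end
    return sort(collect(totals); by = first)  # chronological order
end

# Ratio of each quarter's total to the previous quarter's total.
function growth_ratios(series)
    return [series[i].second / series[i-1].second for i in 2:length(series)]
end

julia_series = quarterly_totals(rows, "Julia")
ratios = growth_ratios(julia_series)
```

With the real data, passing `eachrow(df)` instead of `rows` should work the same way, assuming the column names shown in the REPL output above. A ratio far from its neighbours would point at one of the odd jumps.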
NB: This is certainly interesting data and I can see there’s potentially a lot to say about this, but I would encourage people wanting to do so to open a new thread so as not to derail this (already derailing-prone) thread.
That is a good suggestion. Ideally, initiatives like this would be part of a major plan to increase contributorship across established ecosystems. People know we need more maintainers, but the habit will only be created after we define triggers and rewards:
We are not doing a great job promoting existing Julia projects in social media, helping new users with tutorials, joining discussions on community channels, etc.
Another weakness that I see is the lack of financial support from enterprises that sell Julia. At Arpeggeo®, we are doing the best we can to devote our resources to open source Julia projects, i.e., to convert money and time into bug fixes and improvements to the GeoStats.jl framework and its dozens of dependencies. But we are small compared to big players like JuliaHub, which certainly have more resources available to improve the open source ecosystem they depend on. I am certain that projects like Dagger.jl, Pluto.jl, Makie.jl are crucial to the success of the language, yet I don’t see explicit enterprise sponsorship.
So, if you don’t feel that you have the skills to contribute to your favorite project, consider sponsoring the maintainers. It is a simple button that you press, and a small amount that you pay every month to sustain the projects that you like so much. If you have a full-time job with guaranteed salary, there is no excuse not to sponsor the packages you use every day.
Documentation could be improved. Illustrating creative and excellent uses of Julia keywords could help attract more users to Julia.
For me, it is interoperability with other high-performance computing languages, mainly C++.
There was a paper a while back (I forget which one) that showed that, in comparison, some of the Julia packages have a rather high bus factor. It’s quite common, for example, for a Python package to have a bus factor of no more than 2, with famous examples being pandas and numba. We need more contributors and maintainers, that’s always going to be true, but our ratio of contributors to users seems to be rather high in comparison to other scientific ecosystems, at least the last time the data was checked.
It would be better to look at bus factor instead and somehow filter to only the registered packages. There are lots of homework-problem repos on GitHub with one maintainer; each of those counts as a unique person, and they are probably 99% of the JavaScript number, given what I’ve seen in the average job application.
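As a rough sketch of what such a comparison could compute: the code below assumes one common definition of bus factor (the smallest number of contributors whose commits together cover more than half of all commits). The package names, commit counts, and the `registered` set are hypothetical placeholders, not real data.

```julia
# Bus factor under one common (assumed) definition: the smallest number of
# contributors whose commits together cover more than `threshold` of all commits.
function bus_factor(commit_counts::AbstractVector{<:Integer}; threshold = 0.5)
    total = sum(commit_counts)
    total == 0 && return 0
    covered = 0
    for (n, c) in enumerate(sort(commit_counts; rev = true))
        covered += c
        covered > threshold * total && return n
    end
    return length(commit_counts)
end

# Hypothetical commit counts per contributor for made-up repositories.
packages = Dict(
    "PkgA"     => [500, 20, 10, 5],         # one dominant maintainer
    "PkgB"     => [120, 110, 100, 90, 80],  # work spread across several people
    "homework" => [12],                     # single-author repo, not registered
)

# Hypothetical stand-in for "registered packages only".
registered = Set(["PkgA", "PkgB"])

# Bus factor per package, skipping anything that is not registered.
factors = Dict(name => bus_factor(counts)
               for (name, counts) in packages if name in registered)
```

Filtering on a registry list before computing the statistic keeps throwaway single-author repos out of the comparison, which is the concern raised above.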
I think it is not fair to compare the bus factor in this case. The number of people who can understand the internals of Numba or Pandas (low-level stuff) is tiny.
What I meant in the comments above is that even projects with readable Julia code (code that is very accessible to beginners) don’t have enough occasional maintainers: users who submit issues, PRs, etc.
Wouldn’t you say it’s fair to compare pandas to, say, DataFrames.jl? The fact that pandas internals are hard to understand is not really a very good excuse
I’m not sure pandas is one of those projects with a small number of people committing; they seem to have rather many recent contributors: Contributors to pandas-dev/pandas · GitHub
I don’t know if I understand the argument. Are you saying that the complexity of the code and entrance level shouldn’t affect the bus factor?
My comments above are about native Julia packages. Try to compare them with native Python packages. I am assuming that the comparison will be more reasonable.
Someone with time can do the actual statistical work to compare these indices.
My shortlist of Julia’s weaknesses would probably be:
- Latency kills its use case for a huge number of applications - CLI tools, being called from other languages, anything real-time etc. Imagine if ripgrep was a Julia utility.
- It’s difficult to actually run Julia code - either as an executable script, or as a binary. Like, if you were to create a tool in Julia and publish it for people to run who don’t know Julia and don’t care. There are tonnes of pretty bad options to do it, and you always get the feeling that Julia was supposed to be used at top-level with a human with knowledge about Julia in the loop to manually intervene.
- Julia’s static analysis is quite poor - I would say significantly worse than even Python in practice, not to speak of TypeScript or static languages.
- The Julia VSCode extension is much less helpful than my extensions for Python or Rust. It doesn’t understand the types of my code, there is no go to definition, the linter has a ton of false positives and so on. This is probably because Julia is extremely hard to analyse statically.
- The “lack of interfaces”. More generally speaking, the language provides zero help with building abstractions. No checked abstract types, no help with traits, none of that.
- A generally lax culture of correctness where it feels like there are way too many bugs, even in core Julia, and several parts of Julia are subtly incorrect, inconsistent or hard/impossible to reason about. Things are expected to “just work” through heuristics instead of adhering to learnable rules that can be reasoned about (not to mention statically checked). It’s very hard to know what is supposed to work. Too many things are fraying at the edges.
- It’s too hard to write Julia code that doesn’t need maintenance because it spontaneously breaks, for several small reasons. The above point is one, but it’s also too easy to accidentally rely on internals, and the lack of interfaces makes it impossible to programmatically check for breaking changes
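To make the “lack of interfaces” point concrete: today the closest thing to a check is something hand-rolled at runtime. The following is a minimal sketch using `hasmethod` from Base; the types `AbstractShape`, `Square`, `Blob` and the helper `missing_methods` are hypothetical names invented for illustration.

```julia
# A hypothetical "interface": the functions a shape type is expected to implement.
abstract type AbstractShape end
function area end
function perimeter end

struct Square <: AbstractShape
    side::Float64
end
area(s::Square) = s.side^2
perimeter(s::Square) = 4 * s.side

# This type forgets to implement `perimeter`; nothing complains at definition time.
struct Blob <: AbstractShape end
area(::Blob) = 1.0

# Return the required functions that have no one-argument method for type T.
function missing_methods(::Type{T}, required = (area, perimeter)) where {T}
    return [f for f in required if !hasmethod(f, Tuple{T})]
end

square_gaps = missing_methods(Square)  # nothing missing
blob_gaps = missing_methods(Blob)      # `perimeter` is missing
```

A check like this only confirms that a method with a matching signature exists. It says nothing about return types or behavior, and it has to be run explicitly (for example in a test suite), which is the gap the list above describes.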
Edit: Whoops, this post wasn’t supposed to be a reply to @ChrisRackauckas, but a general post in this topic.
This doesn’t really work, since very many important Python packages aren’t actually implemented in Python, whereas in Julia they are implemented in Julia. This is a real problem with Python in general. If we consider it a problem that a package has a small number of contributors, the reason for the small number doesn’t change the fact that it is a problem.
I think you are missing the point of my argument. Focusing on a small fraction of super popular packages in Python doesn’t address the maintenance issue I raised.
The issue we have goes all the way up to end-user packages working in specific silos. Biology, geology, … we have high-level packages for these things, but only 1 or 2 active contributors despite hundreds of stars on GitHub. We could try to find examples, but that is not the point.
I kind of see this as all the same issue. It’s just interfaces. Interfaces are missing and could help a lot.
I’d boil the missing things down to:
- Interfaces and enforceable static behavior
- Better memory management (i.e. escape analysis to hit the GC less, faster frees, and better manual management tooling)
- Better binaries.
At least at this point, I think the “big 3” aren’t too controversial and most people seem to agree what needs to be done about them.
But that’s a separate thread. The question of this thread is “At present, in what aspects is Julia still relatively weak compared to other mainstream programming languages?”. The point that is being made is that what you’re describing is a problem with open source in general, not something in which Julia is necessarily weaker than other mainstream programming languages. And in fact, Julia’s community does quite well in terms of bus factor. This was highlighted in Heather Miller’s keynote in 2019:
So I don’t disagree we need more maintainers. But the idea that Julia is unique in this aspect, or way worse, is something that has repeatedly not been found to be the case in the empirical studies that have been done or published on it. JuliaLang/julia is doing pretty well, lots of SciML has a good number of contributors for each part of the code, we have a good number of folks in DataFrames.jl and such, all the while you have Wes McKinney famous for saying pandas is a central package to Python but only has 2 people who ever touched it (that has somewhat changed after that quote went viral in like 2019, I think it’s 3 now). So no, I don’t think we are comparatively weak there: about even or maybe a tad bit better, but definitely not loads worse than what’s common in open source. That doesn’t mean Julia is perfect, but I would need to see some good hard evidence that things have changed if you’re going to unseat the data on bus factors that people have published.
(I’m trying to find the per language data that someone put onto Slack, I think it was from 2021? That was the most comprehensive I’ve seen… was that @giordano?)
Disagree, but respect your opinion.
Outside of SciML and DataFrames.jl the situation is very different, at least that is how it feels to me. It would be great if we could nail down more examples of packages that died over the years because of lack of maintainers.
I will point out three somewhat recent major losses:
- Transducers.jl (Takafumi Arakaki disappeared and the framework is gone)
- LoopVectorization.jl (Chris Elrod was the main active maintainer)
- DelaunayTriangulation.jl (Daniel VandenHeuvel was the only maintainer)
People leave the community or decide to work on something else for valid reasons, and the packages die. Community members jump in to help replace the packages with new packages, but that is only remediation; it doesn’t address the issue at its core.
Seeing this over the years, I really think that people should consider sponsoring projects they use daily if they cannot become a maintainer due to lack of skills.
I don’t know anything about DelaunayTriangulation, but at least for the other two examples I don’t think they fit your characterization of
Both packages are very complex and afaik hard to maintain because they’re doing complicated things that sometimes have to reach into Julia internals (and for Transducers there’s some hope in JuliaFolds2 · GitHub).
Agree. These are not ideal examples, but I understand they are native Julia. If they are not, then please ignore them.
The dynamic at play here is that Julia is a small community, and Python/R absorb people in their full-time jobs. If there is no financial incentive or other incentive to increase and retain the number of maintainers, we will never break out of this loop.