At present, in what aspects is Julia still relatively weak compared to other mainstream programming languages?

I wonder if this situation could be alleviated if Julia’s newsletter included a section called “Call for Participation”, as Rust’s newsletter has, where important projects in need can ask for help.
https://this-week-in-rust.org/blog/2024/01/31/this-week-in-rust-532/#call-for-participation-projects-and-speakers

12 Likes

I can answer that and add to the discussion of how nice it is to explore some data with very basic Julia tools. There’s a GitHub repo that has data on unique contributors to each language on GitHub (website), which shows the following:

julia> using Downloads, CSV, DataFrames

julia> df = CSV.read(Downloads.download("https://raw.githubusercontent.com/github/innovationgraph/main/data/languages.csv"), DataFrame)
93221×6 DataFrame
   Row │ num_pushers  language          language_type  iso2_code  year   quarter
       │ Int64        String31          String15       String3    Int64  Int64
───────┼─────────────────────────────────────────────────────────────────────────
     1 │        2066  HTML              markup         AE          2020        1
     2 │        1627  CSS               markup         AE          2020        1
     3 │         288  Jupyter Notebook  markup         AE          2020        1
     4 │         108  Vue               markup         AE          2020        1
(...)

julia> langs = sort(combine(groupby(df[df.year .== 2023 .&& df.quarter .== 3, :], :language), :num_pushers => sum => :contributors), :contributors; rev = true)
305×2 DataFrame
 Row │ language       contributors
     │ String31       Int64
─────┼─────────────────────────────
   1 │ JavaScript          2844963
   2 │ Python              1693151
   3 │ Shell               1180231
   4 │ TypeScript           891746
   5 │ Java                 814264
   6 │ Dockerfile           738608
   7 │ C                    574814
   8 │ C++                  573689
   9 │ Makefile             493359
  10 │ PHP                  409361
  11 │ C#                   378008
  12 │ Ruby                 319163
  13 │ CMake                280403
  14 │ Batchfile            273940
  15 │ Kotlin               230452
  16 │ Go                   218716
  17 │ Objective-C          194070
  18 │ Swift                180700
  19 │ PowerShell           150834
  20 │ Rust                 135205
  21 │ Lua                  126559
  22 │ Dart                 118698
(...)

julia> julia_rank = findfirst(==("Julia"), langs.language)
69

julia> langs[julia_rank-10:julia_rank+10, :]
21×2 DataFrame
 Row │ language      contributors
     │ String31      Int64
─────┼────────────────────────────
   1 │ Meson                15925
   2 │ Pascal               15885
   3 │ GDB                  15549
   4 │ Scheme               15316
   5 │ PLSQL                14926
   6 │ Clojure              14709
   7 │ Mathematica          14608
   8 │ Smalltalk            14565
   9 │ Raku                 14119
  10 │ NASL                 14068
  11 │ Julia                14010
  12 │ ANTLR                13628
  13 │ NSIS                 13625
  14 │ FreeMarker           12717
  15 │ Verilog              12650
  16 │ Forth                11508
  17 │ CoffeeScript         11336
  18 │ GAP                  10981
  19 │ Elixir               10927
  20 │ SourcePawn           10883
  21 │ Prolog               10376

There was a discussion of this on Slack. Looking at the data, it seems a bit odd to me (there are some weird jumps over time), and Julia shows pretty good growth from 2020 Q1 to 2023 Q3 (the period spanned by the available data), but I think overall one has to say that Julia’s position is weaker than one might have expected.
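If anyone wants to look at the growth over time themselves, here is a minimal sketch, assuming the same column names as in the table above (`language`, `year`, `quarter`, `num_pushers`). The helper name `language_trend` is made up for illustration, and I make no claim about what numbers it prints:

```julia
using DataFrames

# Hypothetical helper: total pushers per quarter for one language,
# summed over all economies (the `iso2_code` column).
function language_trend(df::DataFrame, language::AbstractString)
    rows = df[df.language .== language, :]
    trend = combine(groupby(rows, [:year, :quarter]),
                    :num_pushers => sum => :contributors)
    return sort!(trend, [:year, :quarter])
end

# Usage with the `df` loaded in the session above:
# language_trend(df, "Julia")
```

Summing across economies mirrors the ranking query above, so the same person pushing from two economies would presumably be counted twice here as well.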

NB: This is certainly interesting data and I can see there’s potentially a lot to say about this, but I would encourage people wanting to do so to open a new thread so as not to derail this (already derailing-prone) thread.

5 Likes

That is a good suggestion. Ideally, initiatives like this would be part of a major plan to increase contributorship across established ecosystems. People know we need more maintainers, but the habit will only be created after we define triggers and rewards:

[Image: The Power of Habit]

We are not doing a great job of promoting existing Julia projects on social media, helping new users with tutorials, joining discussions on community channels, etc.

Another weakness that I see is the lack of financial support from enterprises that sell Julia. At Arpeggeo®, we are doing the best we can to devote our resources to open source Julia projects, i.e., to convert money and time into bug fixes and improvements to the GeoStats.jl framework and its dozens of dependencies. But we are small compared to big players like JuliaHub, which certainly have more resources available to improve the open source ecosystem they depend on. I am certain that projects like Dagger.jl, Pluto.jl, Makie.jl are crucial to the success of the language, yet I don’t see explicit enterprise sponsorship.

So, if you don’t feel that you have the skills to contribute to your favorite project, consider sponsoring the maintainers. It is a simple button that you press, and a small amount that you pay every month to sustain the projects that you like so much. If you have a full-time job with a guaranteed salary, there is no excuse not to sponsor the packages you use every day.

5 Likes

Documentation could be improved. Showcasing creative and excellent uses of Julia keywords could help attract more users to Julia.

1 Like

For me, interoperability with other high-performance computing languages, mainly C++.

There was a paper a while back (I forget which one) that showed that, in comparison, some of the Julia packages have a rather high bus factor. It’s quite common, for example, for a Python package to have a bus factor of no more than 2, with famous examples being things like pandas and numba. We need more contributors and maintainers, that’s always going to be true, but our ratio of contributors to users seems to be rather high in comparison to other scientific ecosystems, at least the last time the data was checked.
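For anyone unfamiliar with the term, here is a rough sketch of one simplified way to compute a bus factor from per-contributor commit counts: the smallest number of top contributors who together account for more than half of the commits. This is an assumption on my part, not the definition used in the paper, and published studies use more elaborate measures. The function name `bus_factor` and the `threshold` keyword are made up for illustration.

```julia
# Simplified bus factor: the smallest number of top contributors whose
# commits together exceed `threshold` (default 50%) of all commits.
function bus_factor(commit_counts::AbstractVector{<:Integer}; threshold::Real = 0.5)
    total = sum(commit_counts)
    total == 0 && return 0
    covered = 0
    for (k, c) in enumerate(sort(commit_counts; rev = true))
        covered += c
        covered > threshold * total && return k
    end
    return length(commit_counts)
end

bus_factor([900, 50, 30, 20])      # 1: one person dominates
bus_factor([40, 35, 30, 25, 20])   # 3: work is spread more evenly
```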

7 Likes

It would be better to look at bus factor instead, and somehow filter to only the registered packages. There are lots of homework-problem repos on GitHub with one maintainer; each of those counts as a unique person, and they probably make up 99% of the JavaScript number, given what I’ve seen in the average job application :sweat_smile:

7 Likes

I think it is not fair to compare the bus factor in this case. The number of people who can understand the internals of Numba or Pandas (low-level stuff) is tiny.

What I meant in the comments above is that even projects with readable Julia code (code that is very accessible to beginners) don’t have enough occasional maintainers: users who submit issues, PRs, etc.

1 Like

Wouldn’t you say it’s fair to compare pandas to, say, DataFrames.jl? The fact that pandas internals are hard to understand is not really a very good excuse :confused:

I’m not sure pandas is one of those projects with a small number of people committing; they seem to have rather many recent contributors (see “Contributors to pandas-dev/pandas” on GitHub).

1 Like

I don’t know if I understand the argument. Are you saying that the complexity of the code and entrance level shouldn’t affect the bus factor?

My comments above are about native Julia packages. Try to compare them with native Python packages. I am assuming that the comparison will be more reasonable.

Someone with time can do the actual statistical work to compare these indices.

My shortlist of Julia’s weaknesses would probably be:

  • Latency kills its use case for a huge number of applications - CLI tools, being called from other languages, anything real-time etc. Imagine if ripgrep was a Julia utility.
  • It’s difficult to actually run Julia code - either as an executable script, or as a binary. Like, if you were to create a tool in Julia and publish it for people to run who don’t know Julia and don’t care. There are tonnes of pretty bad options to do it, and you always get the feeling that Julia was supposed to be used at top-level with a human with knowledge about Julia in the loop to manually intervene.
  • Julia’s static analysis is quite poor - I would say significantly worse than even Python in practice, not to speak of TypeScript or static languages.
  • The Julia VSCode extension is much less helpful than my extensions for Python or Rust. It doesn’t understand the types of my code, there is no go to definition, the linter has a ton of false positives and so on. This is probably because Julia is extremely hard to analyse statically.
  • The “lack of interfaces”. More generally speaking, the language provides zero help with building abstractions. No checked abstract types, no help with traits, none of that.
  • A generally lax culture of correctness, where it feels like there are way too many bugs, even in core Julia, and several parts of Julia are subtly incorrect, inconsistent or hard/impossible to reason about. Things are expected to “just work” through heuristics instead of adhering to learnable rules that can be reasoned about (not to mention statically checked). It’s very hard to know what is supposed to work. Too many things are fraying at the edges.
  • It’s too hard to write Julia code that doesn’t need maintenance because it spontaneously breaks, for several small reasons. The above point is one, but it’s also too easy to accidentally rely on internals, and the lack of interfaces makes it impossible to programmatically check for breaking changes.
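To make the “lack of interfaces” point concrete, here is a minimal sketch. All names (`Shape`, `Circle`, `Square`, `area`, `total_area`) are made up for illustration. The abstract type carries an informal contract that every subtype defines `area`, but nothing checks it when `Square` is defined, and the gap only surfaces as a `MethodError` at run time:

```julia
abstract type Shape end   # informal contract: subtypes should define `area`

struct Circle <: Shape
    r::Float64
end
area(c::Circle) = pi * c.r^2

struct Square <: Shape
    side::Float64
end
# `area(::Square)` was forgotten, yet the definition above is accepted as-is.

total_area(shapes) = sum(area, shapes)

total_area([Circle(1.0)])                     # works
# total_area([Circle(1.0), Square(2.0)])      # throws MethodError at run time

# The best the language offers out of the box is a manual check like this:
hasmethod(area, Tuple{Circle})   # true
hasmethod(area, Tuple{Square})   # false
```

A `hasmethod` check like this has to be written and maintained by hand (typically in a test suite), which is the kind of help the language itself does not give you.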

Edit: Whoops, this post wasn’t supposed to be a reply to @ChrisRackauckas, but a general post in this topic.

25 Likes

This doesn’t really work, since very many important Python packages aren’t actually implemented in Python, whereas in Julia they are implemented in Julia. This is a real problem with Python in general. If we consider it a problem that a package has a small number of contributors, the reason for the small number doesn’t change the fact that it is a problem?

1 Like

I think you are missing the point of my argument. Focusing on a small fraction of super popular packages in Python doesn’t address the maintenance issue I raised.

The issue we have goes all the way up to end-user packages working in specific silos. Biology, geology, … we have high-level packages for these things, but only 1 or 2 active contributors despite hundreds of stars on GitHub. We could try to find examples, but that is not the point.

I kind of see this as all the same issue. It’s just interfaces. Interfaces are missing and could help a lot.

I’d boil the missing things down to:

  1. Interfaces and enforceable static behavior
  2. Better memory management (i.e. escape analysis to hit the GC less, faster frees, and better manual management tooling)
  3. Better binaries.

At least at this point, I think the “big 3” aren’t too controversial and most people seem to agree what needs to be done about them.

28 Likes

But that’s a separate thread. The question of this thread is "At present, in what aspects is Julia still relatively weak compared to other mainstream programming languages?". The point being made is that what you’re describing is a problem with open source in general, not an area where Julia is necessarily weaker than other mainstream programming languages. And in fact, Julia’s community does quite well in terms of bus factor. This was highlighted in Heather Miller’s keynote in 2019:

So I don’t disagree that we need more maintainers. But the idea that Julia is unique in this respect, or way worse, has repeatedly not been borne out by the empirical studies that have been done or published on it. JuliaLang/julia is doing pretty well, much of SciML has a good number of contributors for each part of the code, and we have a good number of folks in DataFrames.jl and such, all the while you have Wes McKinney famous for saying pandas is a central package to Python but only has 2 people who ever touched it (that has somewhat changed after that quote went viral in like 2019, I think it’s 3 now :sweat_smile:). So no, I don’t think we are comparatively weak there: about even, or maybe a tad better, but definitely not loads worse than what’s common in open source. That doesn’t mean Julia is perfect, but I would need to see some good hard evidence that things have changed if you’re going to unseat the data on bus factors that people have published.

(I’m trying to find the per language data that someone put onto Slack, I think it was from 2021? That was the most comprehensive I’ve seen… was that @giordano?)

4 Likes

Disagree, but respect your opinion.

Outside of SciML and DataFrames.jl the situation is very different, at least that is how it feels to me. It would be great if we could nail down more examples of packages that died over the years because of a lack of maintainers.

I will point out three somewhat recent major losses:

  • Transducers.jl (Takafumi Arakaki disappeared and the framework is gone)
  • LoopVectorization.jl (Chris Elrod was the main active maintainer)
  • DelaunayTriangulation.jl (Daniel VandenHeuvel was the only maintainer)

People leave the community or decide to work on something else for valid reasons, and the packages die. Community members jump in to help replace the packages with new ones, but that is only remediation; it doesn’t address the issue at its core.

Seeing this over the years, I really think that people should consider sponsoring projects they use daily if they cannot become a maintainer due to lack of skills.

3 Likes

I don’t know anything about DelaunayTriangulation but at least for the other two examples I don’t think they fit your characterization of

Both packages are very complex and, as far as I know, hard to maintain because they’re doing complicated things that sometimes have to reach into Julia internals (and for Transducers there’s some hope in JuliaFolds2 on GitHub).

3 Likes

Agree. These are not ideal examples, but I understand they are native Julia. If they are not, then please ignore them.

The dynamic at play here is that Julia is a small community, and Python/R absorb people in their full-time jobs. If there is no financial incentive, or other incentives, to increase and retain the number of maintainers, we will never break out of this loop.

1 Like