There was a pretty good critical article written about Julia recently called “Why I no longer recommend Julia” by Yuri Vishnevsky
There is an ongoing discussion about the article that has largely been constructive on Slack.
Hoping to capture some of these thoughts here to provide actionable points for the community.
Feel free to comment or continue streams of thought here about the article and what actions could be done next.
Thanks and remember: this is meant to be a cordial and constructive conversation about the points raised in this article!
Unfortunately, I think the author is right. I only have experience with Python, Julia and Rust (and a little Perl), but I’ve never encountered anywhere near as many bugs in other languages as I do in Julia. Both in packages and in Base.
The question is why.
I think it’s the combination of:
A) An extremely generic language where everything is built upon shared abstractions and generic, extendible functions, and
B) No way of specifying or checking abstract interfaces, so no-one really knows what those shared abstractions are, or whether they violate them.
C) No clear way of specifying what is internal behaviour of a package/function/struct and what is stable, meaning it’s way too easy to rely on internal behaviour.
I think we, as a community, should take this problem seriously, and think carefully about a potential solution.
My suggestion for these issues of correctness and my suggestion for issues of performance/static compilation are basically the same: there should be a Julia “core” (both in Base and in other libraries) that is not at all generic and is essentially C code written in Julia. This code wouldn’t need to cover an infinite set of types (so rigorous testing is feasible), and it could be trivially statically compiled if the compiler got better, because exactly which methods exist and which are called could be fully determined statically.
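To make the idea concrete, here is a minimal sketch of what I mean; the names are made up for illustration and the conversion strategy is just one option:

```julia
# Non-generic "core": only ever sees Vector{Float64}, so its behaviour is closed,
# exhaustive testing is feasible, and it could be statically compiled.
function _sum_core(v::Vector{Float64})
    s = 0.0
    for i in 1:length(v)      # safe here: Vector is always 1-based
        s += v[i]
    end
    return s
end

# Thin generic shim: normalizes whatever it receives into the concrete type the core expects.
function my_sum(v::AbstractVector{<:Real})
    w = Vector{Float64}(undef, length(v))
    for (i, x) in enumerate(v)   # enumerate counts from 1 regardless of v's axes
        w[i] = x
    end
    return _sum_core(w)
end
```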
In general, I fear the ever-growing push for more generic code makes both performance and correctness worse with no end in sight. And I don’t think most people actually need that much generic code.
I’ve opened an issue about adding a pass for detecting for i=1:length(v) to StaticLint. That particular issue at least seems like it might be easily addressed with tooling.
I believe the sentiment of much of this is correct and must be taken seriously, but a disproportionate number of the examples are about OffsetArray, as far as I can tell? I understand he is using this to explain how composability has failed, but it seems reasonably niche. Maybe that loosely-defined generic interface isn’t really as composable as one would like, but people can get by with 1-based indices if necessary.
But as I said, I think the spirit of this must be taken very seriously. Not to mention discussing Zygote vs. Pytorch/JAX, which opens up a new can of worms.
It’s not about the length. The code assumes that a and b have indices 1:length(a), which is not the case for all AbstractArrays, e.g., OffsetArrays.
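For anyone who hasn’t hit this before, a small demonstration (assuming OffsetArrays.jl is installed):

```julia
using OffsetArrays

a = OffsetArray([10, 20, 30], -1:1)   # same data as [10, 20, 30], but its indices are -1:1

length(a)                          # 3
firstindex(a), lastindex(a)        # (-1, 1)
sum(a[i] for i in eachindex(a))    # 60: generic code should use the array's own indices
a[3]                               # BoundsError: iterating 1:length(a) walks straight off the end
```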
That’s one way of dealing with the issue. Another way is finding the abstractions that could “tame” the composability a bit, as Mike put it:
I think it’s absolutely right to celebrate Julia’s approach to composition. I also hope new research (in Julia or elsewhere) will help us figure out how to tame it a bit.
Which is in spirit what I advocate for here. But I’m not sure if it’s feasible at this late point.
The flexibility vs. structure issue goes beyond composability, into the correctness of compiler transforms and the predictability of Julia’s performance model: two concerns made more acute by AD, GPU, and the other things we now ask of Julia’s semantics. The Zygote issue references this, but it goes further. Julia’s full dynamism needs a touch of restriction, even if opt-in, if it’s ever going to reach the promised full-language differentiable programming + composability + GPU. It’s currently trying to do that in ad hoc ways like immutable arrays and pure DL frameworks… but if you have purity without handling effects, that’s just JAX (except without TPUs, linalg optimizations, and in-place update copy elision (for now?)).
And JAX is already really good. I say this with love and a bit of disappointment, but I think for Julia to succeed it needs work on the structure side, not just the compiler side.
Dex is a good example of a language that attempts to strike a balance (again, see my post State of machine learning in Julia - #25 by Akatz and the Dex issue about ad hoc polymorphism): purity, but with effect handlers. Maybe Julia has another local optimum? But it’s a hard design problem.
I think it’s a fundamental existential problem for the language but unfortunately Yuri is correct that I don’t see it acknowledged widely. I’ve witnessed some explicit dismissal of alternative approaches (like language level traits and more fundamental approaches to handling mutation) and I think Julia is doing that at its own peril.
Though how much Enzyme can help remains to be seen… but even if it works, it ties Julia to LLVM (so no compilation to XLA and TPUs), and it’s unclear how well it will do with high-level branching code.
I agree that it would be helpful to have an accessible system for interface tests (i.e. registering tests for AbstractArrays that the author of a custom array type can easily find and run against their own type). Invenia published a package and/or a workflow for interface testing, but it is not well known and it requires a fair amount of ceremony to get right. This could establish whether e.g. addition must be commutative for some abstract type, or important facts about how iteration should behave.
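As a sketch of what “registering tests” could look like in practice (the function name and the specific checks below are hypothetical, not an existing API):

```julia
using Test

# Hypothetical helper that an interface-owning package could export.
function test_abstractarray_interface(A::AbstractArray)
    @testset "AbstractArray interface for $(typeof(A))" begin
        @test size(A) == map(length, axes(A))
        @test length(A) == prod(size(A))
        # every index produced by eachindex must actually be in bounds
        @test all(i -> checkbounds(Bool, A, i), eachindex(A))
        # iteration must visit exactly length(A) elements
        @test count(_ -> true, A) == length(A)
    end
end

# An author of a custom array type would then just run it against their type:
# test_abstractarray_interface(MyCustomArray(rand(3, 3)))
```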
For the issue of packages using the interface of a value incorrectly, I think tooling and greater awareness of the issues could help. An important part might be some way of discovering what the interfaces actually are. That would be handy both for people implementing new types and for people using interfaces.
Simple tools might be handy, like an easy way to run a package’s tests with regular arrays replaced by “star wars” arrays and with @inbounds turned off. Linters and property testing have been successful for other languages, I think.
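For example (a sketch; “SomePackage” is just a placeholder name), forcing bounds checks during Pkg.test already catches one class of @inbounds bugs, and existing checks can be re-run on offset axes:

```julia
using Pkg
# --check-bounds=yes overrides every @inbounds annotation in the tested code,
# so out-of-bounds accesses surface as errors instead of silent corruption.
Pkg.test("SomePackage"; julia_args=["--check-bounds=yes"])

# Inside a test suite: re-run the same checks with the data wrapped in offset axes.
using OffsetArrays, Test
v  = rand(10)
ov = OffsetArray(v, -4:5)     # same values, shifted indices
@test sum(ov) ≈ sum(v)        # any generic reduction should give the same answer
```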
I don’t see any quick way around the correctness issues with our statistics libraries. Perhaps someone could write a bunch more tests, or copy tests from R or Python libraries that do similar work?
This is the most tractable way to address this at the moment. We need testing tools to see if common interfaces such as the AbstractArray interface are being correctly used. These testing tools become the de facto definition of the interface.
Clearly there was a lot of effort put into this pull request. How can we help @Lilith get this pull request merged? It looks like there are some outstanding issues still from @nalimilan .
Also what can we suggest in general about pull requests to increase the merge rate?
I do not think the article is entirely fair, though it is of much greater quality than the average article criticizing Julia.
The comparison is done against older and more mainstream libraries. While this is a legitimate viewpoint for choosing what to use for your work right now, it is not fair in the more general sense: the comparison would need to be between ecosystems/languages at the same level of maturity. The more interesting question to me is whether Julia will still have these problems when it reaches the age and popularity that other languages/frameworks have now.
While I do understand the systemic argument (and, in fact, I apply it to subjects like racism and discrimination in general), the examples are a little lacking. Checking for aliasing does not seem to me to be a responsibility of most functions (i.e., you should not assume you can pass the same object as two distinct arguments unless stated otherwise), and the bounds problem is also more nuanced: the code may have been right for the Julia version it was written for, but inadvertently kept unchanged for newer versions. I think my disagreement is rooted in a different perspective, in which I accept that the extra flexibility of generality has the cost of having me check whether the pieces actually work well together, instead of assuming they will work flawlessly. The element of unfairness in this comparison, to me, is that we would need to compare against an equally flexible language. Python is fair (it just had a lot more time to mature); other languages do not allow the generality that Julia allows, so the bugs cannot be pinned on the language itself but instead on each individual re-implementation of a method, because the language did not allow for generality. There are a lot of problems with OffsetArrays.jl, but most other languages do not even have something like OffsetArrays.jl, or the expectation that most code written would automatically work with custom indices.
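To be concrete about the aliasing point, here is a toy example (not the article’s; `shift_into!` is made up) of how aliased arguments can silently change the answer unless a function explicitly supports them:

```julia
# Shifts `a` by one position into `out`; reads from `a`, writes to `out`.
function shift_into!(out::AbstractVector, a::AbstractVector)
    for i in firstindex(a):lastindex(a)-1
        out[i+1] = a[i]
    end
    return out
end

x = collect(1:5)
shift_into!(copy(x), x)   # [1, 1, 2, 3, 4]  -- distinct arrays: fine
shift_into!(x, x)         # [1, 1, 1, 1, 1]  -- aliased: each write clobbers the next read
```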
The conclusion of the article is a little muddy. The article is a personal recollection of facts related to a change of posture by the author, so it does not go out of its way to offer a solution, and it even admits that what is identified as a systemic problem may be unsolvable (maybe it is inherent to high generality?). The statement “For the majority of use cases the Julia team wants to service, the risks are simply not worth the rewards.” is probably the strongest claim in the article, and it is hard to rebut, not because it is right but because it is too informal (what are “the majority of use cases the Julia team wants to service”, and how is this risk/reward analysis being done for each of them?). Everyone can only argue for their own use case; in mine, I think the risk/reward is worth it. But the author gets to make a blanket statement like this without really presenting the analysis in the article (again, it does not even compare metrics with other languages/frameworks, so the only really solid claim is that Julia has problems, not that it is worse than the others).
I partially agree about the generality problem; by this I mean we could have what we have today (in terms of generality) but with fewer bugs. I do not think the fault lies in the language design: a trade-off was made, and I like the trade-off (it will not be the best for every use case, of course, but no language will be). I think the problem is within the community, but not in the same sense the author of the article implies. I believe the interfaces should remain (in technical terms) the same as they are right now, but better described by their authors, and it should be the responsibility of each module proposing an interface (including the ones in Base) to provide a test suite that checks the invariants for an object of a type implementing that interface. If the object/type passes the test suite but does not work with a function that assumes the object implements the interface, then the problem is within the function (it understands the interface incorrectly).
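A sketch of the workflow I have in mind (module and function names below are hypothetical): the package that defines an interface also ships the invariant checks, and implementors simply call them from their own test suites.

```julia
module MyIterableInterface

using Test

# Run the invariants this (hypothetical) interface expects of any iterable `x`.
function test_interface(x)
    @testset "iteration invariants for $(typeof(x))" begin
        @test hasmethod(iterate, Tuple{typeof(x)})
        if Base.IteratorSize(x) isa Union{Base.HasLength, Base.HasShape}
            # iterating must visit exactly length(x) elements
            @test count(_ -> true, x) == length(x)
        end
    end
end

end # module

# An implementor of a new type then adds one line to their own tests:
# MyIterableInterface.test_interface(MyNewType(...))
```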
Julia being an open ecosystem means that package authors never realize how much usage out there is not covered by their limited test cases. This is both good and bad: formalizing an interface concept, if achieved, might even be harmful in the sense that people become more restrictive about types and eventually turn it into a closed ecosystem.
And the flexibility of multiple dispatch makes it easy to land in a no-man’s land of cryptic error messages that only experienced developers understand. It is often the case that you spend days tracing the right dispatch route only to add or fix one small method. This is also something I’ve seen too many Julia users (including myself) complain about. This is, again, both good and bad.
The solution? I don’t know. More carefully written test cases, and more documentation to explain the design and educate users, I guess. But many package authors don’t take tests and docs seriously; people are often fooled by the code coverage number and think it’s near 100%. Absolutely not: your coverage might be far less than 10% if you count all the valid input compositions.
Speaking of lines of code, Julia’s composability and flexibility mean that one often needs a 1:1 or even 1:3 src-to-test code ratio just to ensure that “most” things work, but most people only test one or two use cases, which is far from enough. If you check the OffsetArrays.jl codebase with cloc, you’ll find approximately 900 lines in src and 2200 lines in test.
When I consider depending on a package I don’t maintain, I often check how carefully the authors write tests. If the tests are not well written, I refrain from using it, no matter how good it claims to be. On this, I really, really appreciate how @oxinabox writes the tests in all the packages she maintains (ChainRules even has an accompanying test helper package, ChainRulesTestUtils), and I always feel lucky that my first few Julia contributions were under her review.
Even if we do our best (I tried very hard in writing tests when developing JuliaImages), we still always get surprising bug reports when users don’t follow the design. For instance, I pass Array{<:Colorant} into Distances, which is a big surprise for stats people. And unless we start to be more restrictive with function type annotations, there is little we can do; but we want to build a shared, open ecosystem.
Just to make my phrasing clearer: I meant that many packages do not expect a custom index start and therefore do not work well with OffsetArrays.jl, not that the package itself is implemented incorrectly.
But to answer your question, this depends on whether OffsetArrays.jl has “opted into fast linear indexing”, as the eachindex docstring says:
eachindex(A…)
Create an iterable object for visiting each index of an AbstractArray A in an efficient manner. For array types that have opted into fast linear indexing (like Array), this is simply the range 1:length(A). For other array types, return a specialized Cartesian range to efficiently index into the array with indices specified for every dimension. For other iterables, including strings and dictionaries, return an iterator object supporting arbitrary index types (e.g. unevenly spaced or non-integer indices).
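In practice (a small sketch, with OffsetArrays.jl installed):

```julia
using OffsetArrays

v  = [10, 20, 30]
ov = OffsetArray(v, -1:1)

eachindex(v)     # Base.OneTo(3): Array has opted into fast linear indexing
eachindex(ov)    # a range covering -1:1, respecting the custom axes
1:length(ov)     # 1:3, which are NOT valid indices for ov
```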
On the other hand, I really dislike Base.eachindex definition, and I would prefer to have it changed.