Did Julia community do something to improve its correctness?

For one, I don’t think Julia is popular, and AFAIK most agree. For another, that’s really something you have to see and judge for yourself. CPython is pretty popular; you can start reading issues here - that’s only one label among the open issues, but one that’ll skew closer to bugs and weird behaviors.

New features are cool and all, and they do make good marketing material for all the ways Julia can do even more things better now. I’m really not sure that’s what’s needed at this point though - Yuri’s blogpost has had such an impact outside of our community that it’s an inevitable conversation topic when someone asks about the language. Not multithreaded GC performance, or whether untyped globals cost you additional performance.

The fact that this discussion got started precisely because this is still the way Julia is viewed outside our bubble and the fact that barely anything has happened to mitigate that is just testament to where the priorities lie - and that’s squarely on chasing shiny features.

I mean I get it, it’s not glamorous to have a bugfix release, but come on, the more features we pile on without fixing these kinds of longstanding issues, the worse the existing perception will get.


That isn’t chasing shiny new features: the performance of uninferrable globals and the latency due to compilation, especially the lost unsaved work, are among the most frequent complaints about Julia. Many newcomers see these two particular issues as why Julia doesn’t really live up to its claim of solving the “two-language problem”. The last few versions alone made a massive dent in that, though I bet many still won’t be sold until executables are easy to make and as small as possible.
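For context, here’s a minimal sketch of the untyped-globals problem and the typed-globals mitigation that landed in Julia 1.8 (variable and function names here are illustrative, not from any real codebase):

```julia
# Untyped global: inside a function the compiler cannot assume its type
# stays fixed, so every access goes through dynamic dispatch.
untyped_scale = 2.0

# Typed global (Julia 1.8+): declaring the type lets the compiler infer
# accesses inside functions, recovering most of the lost performance.
typed_scale::Float64 = 2.0

scale_sum_untyped(xs) = sum(x * untyped_scale for x in xs)
scale_sum_typed(xs)   = sum(x * typed_scale for x in xs)
```

Running `@code_warntype scale_sum_untyped([1.0])` would show the `Any`-typed global access, while the typed version infers cleanly; marking the global `const` achieves the same when the value never changes.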

Yuri’s blog raised many valid issues, his frustration is understandable, and it’s his right to prefer not dealing with them anymore. But it’s just not remarkable for any language and its third-party libraries to have bugs that users have to report; many of the examples in the blog were fixed long before the blog was written (which I think is a positive, not a negative), and many more were fixed after his blog drew attention to them. The biggest longstanding issue is that AbstractArray code carries 1-based-indexing assumptions that caused bugs for the OffsetArrays package (it’s bigger than that; there are missing implementations too). Some fixes happened, but it’s a pretty big project and will need time to complete. Comparatively, though, not many people use OffsetArrays, while everyone needs to import packages and compile methods. It’s not strange that the latter got more work done sooner.
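To illustrate the kind of 1-based assumption at issue, here’s a generic sketch (not code from any particular package):

```julia
# A latent interface bug: iterating 1:length(v) silently assumes 1-based
# indexing, which breaks (or errors) for offset-indexed AbstractArrays.
function sum_assuming_1_based(v::AbstractVector)
    s = zero(eltype(v))
    for i in 1:length(v)
        s += v[i]
    end
    return s
end

# The generic version: eachindex respects whatever axes the array has.
function sum_generic(v::AbstractVector)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i]
    end
    return s
end
```

Both versions agree on ordinary `Vector`s, which is exactly why such bugs can survive for years until someone passes in an `OffsetArray`.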


I described a pretty low bar, I can’t think of anything failing it in Python. Obviously there are bugs, but imho incorrect results in basic functions are a different species of problem.

Here’s another example, from the ecosystem, where string formatting is flagrantly incorrect, in a package with 677 dependents. I find it hard to imagine this happening in Python.

My impression is those performance features come from companies that need them paying to get them in rather than being motivated by advertising/coolness. That is just to say I don’t know what the solution to the correctness problem is if funders aren’t paying for it.


I think I’m not really grasping your point. I already linked a page listing many CPython issues about incorrect results in basic standard library functions, and I can’t do more to help you vet your opinion.

I’m not surprised about that from a third party package whose latest v0 release came in 2020. It’s not hard to find a package that deserves more developer attention in any language’s ecosystem, and it seems this example is getting it in the Format.jl fork.

I’m not disputing that Julia has issues; it’s just that nothing said so far seems particular to Julia. So far I’ve seen a recurrent anxiety that This Issue will drive newcomers away from Julia, where This Issue changes depending on who you ask. I don’t think it’s unusual for more widespread issues to be given more priority. I do find it unusual to argue that less widespread issues, as much fun as I’ve had learning about them, would be more important - critical, in fact - to Julia’s reputation.


I believe what you say about CPython. In fact, a simple glance at NumPy shows over a thousand open issues. But that doesn’t make Julia’s position any better, nor will it convince anyone on the fence about the topic at hand. I think one advantage of huge monolithic packages is not having to worry as much about how the package interacts with the rest of the ecosystem.

I guess one of the examples I point to is something first discussed years ago: incorrect-gradient bugs in certain Julia ML packages. There were multiple stories of people spending months debugging, until they finally realized their code was silently giving wrong results. See here and here for examples. If an AI startup ran into something like this, well, their competitors are potentially months ahead in development now because they used Python. I can’t recommend Julia for ML in part because of this. Do incorrect-gradient bugs like this occur in Python, or C++? It seems particular to Julia to me.

The overarching theme remains: it is the attitude with which these bugs are approached that is the issue (the “culture”). A post above mentioned that most bugs tagged with correctness aren’t release-blocking, which raises the question: why? I don’t buy the reasoning that they’re corner cases few run into… after 10 years, the bugs that show up had better be corner cases, right? Maybe it’s a communication issue about what the priorities are or how they’re determined - I have no knowledge of the inner workings of the repo, and neither do most other end users - but it seems odd that priority isn’t given to issues that silently produce incorrect results. To me, correctness is far and away the most important issue, not improved performance, and it doesn’t seem like everyone agrees.

I watched the state of Julia talk given this year and I saw 0 mentions of correctness issues. Maybe I missed something, but it seemed like the perfect opportunity to address an elephant in the room.


Just searching the string “incorrect gradient” in the issues of pytorch, jax, and tensorflow, apparently yes, some silent, some not. However, I’ve never done ML so I can’t say how important or prevalent these issues are, let alone make comparisons to Zygote.jl’s. I can only take the word of people who have used all these tools thoroughly in their work.

It’d be very strange to block a release that fixes some issues and adds demanded features just because other issues and features aren’t resolved yet. You mentioned Numpy’s outstanding issues, obviously releases kept coming. Developers need to agree to block a release to fix an issue or include a feature, it’s not a default decision.

Like I said before, correctness isn’t One Thing, some nebulous force threatening the future of the language. It’s just a fact of life that software has bugs that get patched incrementally and routinely. It’s so mundane that every big feature had to resolve their fair share of correctness issues at some point, there’s even a state of Julia slide about the progress on threading that casually lists “features/correctness” topics (each topic can have multiple issues).


Legitimate question: are Julia’s bugs seen as being of a different character than those of peer projects? The same kinds of issues seem to be present elsewhere:

Some cursory searching finds incorrect results when using modulo arithmetic in Rust, or odd edge cases in NumPy related to mutability, complex-number types, and unsigned-integer behavior.

To my uninformed eye, these problems seem similar to those that have been identified in Julia?

Also, at the time of writing, Julia has fewer open issues matching “bug incorrect” than NumPy does, though I admit I don’t understand the nuances here well enough to say whether that metric is meaningful.


I think the nature of coding in Julia leads one to encounter “edge” cases more frequently than other languages, so it is more painful when these are wrong.

As an evocative but probably not-quite-right example: in Python, if you have object types A and B and you want them to interact, one might first transform them into a very safe/robust/well-tested form - like a vector of floats, or a dictionary - and then perform the needed operations.

In Julia, one might call foo(::A, ::B) directly and just hope that each type satisfies its interface well enough for foo to remain correct, without any transformations to a “safe” type needed at all.
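A rough sketch of that contrast (the types and function names here are hypothetical, invented purely for illustration):

```julia
# Hypothetical container types; imagine they come from two unrelated packages.
struct A; data::Vector{Float64}; end
struct B; data::Vector{Float64}; end

# Make both satisfy the implicit iteration interface by forwarding.
Base.iterate(x::A, st...) = iterate(x.data, st...)
Base.iterate(x::B, st...) = iterate(x.data, st...)

# "Convert first" style: funnel everything through a well-tested concrete
# type before doing any work.
foo_defensive(a, b) = sum(collect(Float64, a)) + sum(collect(Float64, b))

# Generic Julia style: operate through the interface directly - no
# conversion step, but correctness now rests entirely on each type
# implementing iteration faithfully.
foo(a, b) = sum(a) + sum(b)
```

Both give the same answer when the interfaces are honored; the generic version is what makes Julia code composable, and also what makes an unfaithful interface implementation a silent-wrong-answer hazard.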


Every single open-source project I know is basically starved for personpower. We’re volunteers too, and while we do occasionally beg for contributions, you shouldn’t need us to do that in order to realize you can help.

That doesn’t mean your PR will be reviewed instantly because of the issues pointed out so nicely in Did Julia community do something to improve its correctness? - #103 by gdalle, but hopefully someone will do it justice eventually.


It didn’t cross my mind to connect the begging for contributions with being able to assess that I can help.

I am not sure to what extent others feel the same, but when it comes to evaluating what one can do, I am much more confident doing that evaluation for the projects I am working on than for contributing to the Julia repo.

I know there is always the “make a PR and let us tell you if it is garbage” solution - but that feels more like a fighting-my-way-in approach. The same “…we’re volunteers too” applies here - and both of the following have been true for a while now:

  • I really love Julia and I want it to thrive
  • I have (can make) time to contribute

In fact, I wanted to contribute in some way and after this topic cooled down I started to be more active here, on discourse.

I know that general truths about what contributing to open-source projects means can be invoked and applied to Julia (and it is not like I am ignorant of those rules); however, I think we can at least entertain the idea that, in a less technical sense than the now-mitigated TTFX, Julia might have a time-to-first-PR issue (and it is beside the point that other open-source projects might have this issue as well).

There was a clear TTFPR issue when I reported the following bug after I actually delved into Julia’s source code, identified the exact problem, and proposed the fix.

So why didn’t I PR the thing and only open an issue? Good question: I cannot pinpoint a single reason.

Maybe it was a mix of various factors. But my honest answer is that I wanted to contribute, I had the time (and I actually had a local fix, because the bug was really annoying), and still no PR.

If my case is singular and this “I want + I can + I don’t” mix is not a real issue beyond my own experience, then I think we can just let this go: I learned my lesson.

However, if there are things that can be done in the Julia community to reduce yet another TTF- something, then let’s use this opportunity and maybe lower the perceived bar for Julia-repo contributions.


I know that many people don’t contribute who perhaps should, but also that many who do submit PRs can have a frustrating experience waiting for reviews. I don’t know how to fix that. The fundamental tradeoff is that time spent reviewing PRs (and it takes a lot of time) means time not spent on other activities like fixing really hard problems. So we persist in an awkward balance where good stuff gets done but perhaps more slowly than ideal.

Anyway, this is a bit off-topic from this main thread, I’ll stop now. If you have ideas about how to fix it perhaps another topic?


I agree we should stop here for now. I’ll think about this.

Also - thank you for the hard work you put into the Julia ecosystem. Sometimes it is hard to point out some issues without sounding ungrateful/demanding: so again, thank you!


So here is a many-page discussion about the reverse-zip bug I was about to fix, but when I saw from the lack of enthusiasm that the fix would mean having a PR lingering around for years again, I left it be: Iterators.reverse gives unexpected results when zipping iterators of different lengths · Issue #25583 · JuliaLang/julia · GitHub (I had already had a PR that took 3 years with #24978, and one that took 2 years with #29927)
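For readers who haven’t seen issue #25583, here’s a minimal sketch of the semantics in question, using only eager `reverse`/`collect` so as not to assert what `Iterators.reverse` does on any particular Julia version:

```julia
# zip stops at the shorter iterator, so zipping 1:3 with 1:5 yields the
# pairs (1,1), (2,2), (3,3). Reversing the *zipped* sequence should give:
expected = reverse(collect(zip(1:3, 1:5)))            # [(3,3), (2,2), (1,1)]

# The surprising behavior reported in #25583 amounted to reversing each
# component independently, which misaligns the pairs when lengths differ:
misaligned = collect(zip(reverse(1:3), reverse(1:5))) # [(3,5), (2,4), (1,3)]
```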


If you’re interested in discussing this specifically, I’d recommend starting a new topic or reading over some of the previous discussions. But yes, this specific example is probably the poster child of the hype-and-shiny-features-over-correctness culture being decried.

However (and note this is coming from someone who has complained at length about the aforementioned issues), it does feel like one of the more extreme examples and not representative of the ecosystem as a whole. Case in point, ForwardDiff.jl has been quite solid for years now in the AD space. The reason this is a big deal despite only involving 2-3 libraries comes down to a) interest in ML as shown by anecdata and successive Julia Community Surveys, b) some of the packages being promoted as flagship libraries in the past such that everyone and their dog knows about them, and c) excessive hype about the capabilities of this specific corner of the ecosystem during early development which didn’t pan out for a variety of reasons.

I think you’re right to be skeptical about number of issues being a good metric here. Suffice it to say Zygote has/had a greater number of significant issues and issues in commonly-used code paths than the Python ADs being compared against, even if the absolute or relative number of issues is similar. It does seem like some lessons were learned from this, but unfortunately not before the Julia ecosystem experienced a reverse-mode AD “winter” which has only recently begun to thaw (again, a discussion for another thread).


It seems this thread has come to an awkward end. Can I take credit for that?


See What’s the aliasing story in Julia


(Edited) It was fixed after beta1. Still tagged for backporting. It’s been merged into Backports for julia v1.10.0-beta2 by KristofferC · Pull Request #50708 · JuliaLang/julia · GitHub so it should be in beta2.


Recently I got a project that’s big enough for a deployment and also fairly technical. I was torn between Rust (Rust isn’t built for technical computing, but the ecosystem is catching up fast, and the other advantages Rust provides are almost too good) and Julia (I’ve always wanted to try Julia for some real work in scientific computing, whereas I’ve mainly been using R, and MATLAB several years back); but my attempt to do some work in Julia came to a halt mainly because of Yuri’s blogpost. I’ve followed the Reddit discussion on Yuri’s post, and an earlier Julia community discussion on the same post.

By now, a good number of opinions have been exchanged between both sides in all three discussions with regard to the correctness issues raised by Yuri. But I would really appreciate a formal response from the Julia dev team on the correctness issue in Julia, regardless of whether the team agrees or disagrees (or, if there is one already, please kindly point me to it).

Some argued that we should let Julia mature, that these types of bugs are so common that many languages have suffered them before, and that time will cure all. Opponents disagreed, citing R and MATLAB. Core R and MATLAB really have been robust for a very long time (there is abundant evidence in all three discussions and elsewhere). CRAN (R’s official package management) is almost notoriously pedantic, to the point that the majority of graduate students would not even consider publishing packages.

Some said that it is the third-party packages, written by researchers and graduate students being a bit careless when it comes to programming (rather than solving math problems with a computer language). Yuri, however, drew attention to Base Julia near the very top of his post, with issues raised around Julia 1.7 back in 2020-2021 (e.g., the prod function from Base, raised in January 2021; Yuri’s post was written before May 2022). It is the number of correctness issues Yuri found, and the likelihood that similar issues persist in core Julia, that worried him in the first place. I believe the OP was similarly worried, hence asking whether new measures for correctness are being employed. This is the very thing I’d hope the team would address as well, as a potential Julia user.

Others mentioned testing suites available to users, such as Aqua.jl or JET.jl; I am not sure that I, as a user, ought to write unit tests for Base Julia functions as basic as prod, or to set up tests only to find out, all too often, that it was Base Julia I’d have to fix.

While most critical posts about Julia found via Google are merely complaints about user experience and how hard it is to depart from habits built in other languages, Yuri’s post, on the other hand, made valid points about one of the main objectives of a technical computing language, IMHO: to be correct in its computed results.

I have liked Julia since it came out (I’ve been playing around with it in bite-sized chunks since 0.8, but I’ve not really used Julia for anything); I appreciate and support Julia’s ambitions, be it solving the “two-language problem” or offering syntax similar to MATLAB’s with performance almost on par with C/C++; and yeah, multiple dispatch as well. Hand on heart, I’d really hope for Julia to thrive and be the future of technical computing.