Did Julia community do something to improve its correctness?

No matter how established a language is, there are always more issues than there are people who can work on and review them. It’s not a great feeling when issues relevant to you are not prioritized as highly, but that doesn’t imply neglect. We got typed globals and const fields in mutable types in 1.8, we got native code caching and an interactive thread pool in 1.9, and we’re scheduled to get multithreaded GC in 1.10. It made sense that these big things had priority.

We can mitigate this with more contributors and trusted reviewers. I think contributing could be taught, or at least given some guidelines beyond “open an issue.” For example, despite how often I chime in on issues, I am not confident enough in my testing practices to do more than spot problems, and I wouldn’t even know what a proper documentation edit would look like. It’s legitimately more intimidating because incorrect changes to the source material can cause more widespread problems than incorrect opinions on a Discourse thread.

That said, there are never going to be enough people to cover all the issues. Look up a language of any note on GitHub: there are thousands of open issues, some over a decade old, and a good chunk of them are bugs and incorrect results. It’s always a struggle to get enough people to fix one small piece of the world, and software isn’t any different.

7 Likes

I might be naive, but I don’t think most popular languages would let something like the reverse-zip bug go past multiple releases. That is,

  • incorrect results
  • with no warning or error
  • for documented behavior
  • in generic built-in functions
  • in multiple releases after being reported

Are other languages full of these? What do you have in mind?
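For concreteness, here is a minimal sketch of the kind of discrepancy meant by “reverse-zip”, written in Python so it doesn’t depend on any particular Julia release. It illustrates the class of bug, not Julia’s actual implementation, and `correct_reverse_zip` and `naive_reverse_zip` are hypothetical names. Zipping truncates to the shorter input, so reversing each input first and zipping afterwards pairs up different elements whenever the lengths differ:

```python
def correct_reverse_zip(a, b):
    """Zip first (truncating to the shorter input), then reverse the pairs."""
    return list(reversed(list(zip(a, b))))


def naive_reverse_zip(a, b):
    """Reverse each input, then zip: only equivalent when the lengths match."""
    return list(zip(reversed(a), reversed(b)))


a = [1, 2, 3]
b = ["x", "y"]

expected = correct_reverse_zip(a, b)  # [(2, 'y'), (1, 'x')]
actual = naive_reverse_zip(a, b)      # [(3, 'y'), (2, 'x')]

# No exception is raised in either case; the results simply differ.
print(expected, actual)
```

With equal-length inputs the two functions agree, which is why a shortcut like this can go unnoticed for a long time.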

5 Likes

For one, I don’t think Julia is popular and AFAIK most agree. For another, that’s really something you have to see and judge for yourself. CPython is pretty popular, you can start reading issues here, which is only one label in the open issues, but one that’ll skew closer to bugs and weird behaviors.

New features are cool and all, they do make good marketing material for all the ways Julia can do even more things better now. I’m really not sure that’s what’s needed at this point though - Yuri’s blog post has had such an impact outside of our community that it’s an inevitable conversation topic when someone asks about the language. Not multithreaded GC performance or whether you have an additional bad performance impact from not typing your globals.

The fact that this discussion got started precisely because this is still the way Julia is viewed outside our bubble, and the fact that barely anything has happened to mitigate that, are a testament to where the priorities lie - and that’s squarely on chasing shiny features.

I mean I get it, it’s not glamorous to have a bugfix release, but come on, the more features we pile on without fixing these kinds of longstanding issues, the worse the existing perception will get.

21 Likes

That isn’t chasing shiny new features. The performance of uninferrable globals and the latency due to compilation, especially the unsaved work, are among the most frequent complaints about Julia. Many newcomers see these two particular issues as why Julia doesn’t really live up to its claim of solving the “two-language problem”. Just the last few versions made a massive change to that, though I bet many still won’t be sold until executables are easy to make and as small as possible.

Yuri’s blog raised many valid issues, his frustration is valid, and it’s his right to prefer not dealing with them anymore. But it’s just not remarkable for any language and its third-party libraries to have bugs that users have to raise; many of the examples in the blog were fixed long before the blog was written (which I think is a positive, not a negative), and many examples were fixed after his blog drew attention to them. The biggest longstanding issue is how AbstractArray code has 1-based assumptions that resulted in bugs for the OffsetArrays package (it’s bigger than that, there are missing implementations too). Some fixes happened, but that’s a pretty big project and will need time to complete. Comparatively, though, not many use OffsetArrays, whereas everyone needs to import packages and compile methods. It’s not strange the latter got more work done sooner.
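To make the “1-based assumptions” point concrete, here is a rough Python sketch of the same pattern. It is an analogy under stated assumptions, not code from Julia or OffsetArrays: `OffsetList`, its `indices` method, and both helper functions are hypothetical. The idea is that generic code should ask the container for its valid indices (in Julia, roughly what `firstindex` or `eachindex` are for) instead of hard-coding where indexing starts:

```python
class OffsetList:
    """Hypothetical list-like container whose first valid index is `first`."""

    def __init__(self, items, first):
        self._items = list(items)
        self.first = first

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        j = i - self.first
        if not 0 <= j < len(self._items):
            raise IndexError(i)
        return self._items[j]

    def indices(self):
        """The container's own valid index range."""
        return range(self.first, self.first + len(self._items))


def first_item_assuming_base(xs):
    # Hard-codes the starting index (the analogue of writing x[1] in Julia).
    return xs[0]


def first_item_generic(xs):
    # Asks the container where its indices start.
    return xs[xs.indices()[0]]


xs = OffsetList([10, 20, 30], first=-1)  # valid indices are -1, 0, 1
print(first_item_generic(xs))        # 10
print(first_item_assuming_base(xs))  # 20: a valid index, so nothing is raised
```

In this sketch the hard-coded index happens to be valid for the offset container, so the wrong element comes back without any error, which is the kind of silent failure being discussed.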

7 Likes

I described a pretty low bar, I can’t think of anything failing it in Python. Obviously there are bugs, but imho incorrect results in basic functions are a different species of problem.

Here’s another example, from the ecosystem, where string formatting is flagrantly incorrect, in a package with 677 dependents. I find it hard to imagine this happening in Python.

My impression is those performance features come from companies that need them paying to get them in rather than being motivated by advertising/coolness. That is just to say I don’t know what the solution to the correctness problem is if funders aren’t paying for it.

2 Likes

I think I’m not really grasping your point. I already linked a page listing many CPython issues about incorrect results in basic standard library functions, and I can’t do more to help you vet your opinion.

I’m not surprised about that from a third party package whose latest v0 release came in 2020. It’s not hard to find a package that deserves more developer attention in any language’s ecosystem, and it seems this example is getting it in the Format.jl fork.

I’m not disputing that Julia has issues, it’s just that nothing said so far seems particular to Julia. So far I’ve just seen this recurrent anxiety that This Issue will drive any newcomers away from Julia, and This Issue changes depending on who you ask. I don’t think it’s unusual for more widespread issues to have been given more priority. I do find unusual the argument that less widespread issues, as much fun as I’ve had learning about them, would be more important, critical in fact, to Julia’s reputation.

2 Likes

I believe what you say about CPython. In fact a simple glance at NumPy shows over a thousand outstanding issues. But that doesn’t make Julia’s position any better, nor will it convince anyone on edge about the topic at hand. I think one advantage of having huge monolith packages is not needing to worry as much about how the package interacts with the rest of the ecosystem.

I guess one of the examples I point to was something first discussed years ago: incorrect gradient bugs in certain Julia ML packages. There were multiple stories of people spending months trying to debug, until finally they realized that their code was silently giving wrong results. See here and here for examples. If an AI startup ran into something like this, well, their competitors are potentially months ahead in development now because they used Python. I can’t recommend Julia for ML in part because of this. Do incorrect gradient bugs occur like this in Python, or C++? Seems particular to Julia to me.

The overarching theme remains: it is the attitude with which these bugs are approached that is the issue (the “culture”). In a post above, it was mentioned that most bugs tagged with correctness aren’t release blocking, which raises the question: why? I don’t buy the reason that they’re corner cases few run into… after 10 years, bugs that show up had better be corner cases, right? Maybe it’s a communication issue on what the priorities are or how they’re determined - I have no knowledge of the inner workings of the repo and neither do most other end users - but it seems odd that priority isn’t given to issues that silently give incorrect results. To me correctness is far and away the most important issue, not improved performance, and it doesn’t seem like everyone agrees.

I watched the State of Julia talk given this year and I saw 0 mentions of correctness issues. Maybe I missed something, but it seemed like the perfect opportunity to address an elephant in the room.

13 Likes

Just searching the string “incorrect gradient” in the issues of PyTorch, JAX, and TensorFlow, apparently yes, some silent, some not. However, I’ve never done ML so I can’t say how important or prevalent these issues are, let alone make comparisons to Zygote.jl’s. I can only take the word of people who have used all these tools thoroughly in their work.

It’d be very strange to block a release that fixes some issues and adds demanded features just because other issues and features aren’t resolved yet. You mentioned NumPy’s outstanding issues; obviously releases kept coming. Developers need to agree to block a release to fix an issue or include a feature, it’s not a default decision.

Like I said before, correctness isn’t One Thing, some nebulous force threatening the future of the language. It’s just a fact of life that software has bugs that get patched incrementally and routinely. It’s so mundane that every big feature had to resolve its fair share of correctness issues at some point; there’s even a State of Julia slide about the progress on threading that casually lists “features/correctness” topics (each topic can have multiple issues).

5 Likes

Legitimate question: are Julia’s bugs seen as being of a different character than those of peer projects? The same kinds of issues seem to be present in other projects.

For example, some cursory searching finds incorrect results when using modulo arithmetic in Rust, or odd edge cases in NumPy related to mutability, complex-number types, and unsigned integer behavior.

To my uninformed eye, these problems seem similar to those that have been identified in Julia?

Also, at the time of writing Julia has fewer open issues matching “bug incorrect” than NumPy does, though I admit I don’t understand the nuances well enough to say whether that metric is meaningful.

6 Likes

I think the nature of coding in Julia leads one to encounter “edge” cases more frequently than in other languages, so it is more painful when these are wrong.

As an evocative but probably not-quite-right example, in Python if you have object types A and B and you want them to interact, one might first transform them into a very safe/robust/well-tested form, like a vector of floats or a dictionary etc., then perform the needed operations.

In Julia, one might call foo(::A, ::B) directly and just hope that each type is satisfying its interface well enough for foo to remain correct, without any transformations to a “safe” type needed at all.

9 Likes

Every single open-source project I know is basically starved for personpower. We’re volunteers too, and while we do occasionally beg for contributions, you shouldn’t need us to do that in order to realize you can help.

That doesn’t mean your PR will be reviewed instantly because of the issues pointed out so nicely in Did Julia community do something to improve its correctness? - #103 by gdalle, but hopefully someone will do it justice eventually.

11 Likes

It didn’t cross my mind to link the begging for contributions to being able to assess that I can help.

I am not sure to what extent others feel the same, but when it comes to evaluating what one can do, I am pretty confident doing that evaluation for the projects I am working on, and much less so for contributing to the Julia repo.

I know that there is always the “make a PR and let us tell you if it is garbage” solution, but that feels more like a “fighting my way in” approach. The same “…we’re volunteers too” applies here, and both of the following have been true for a while now:

  • I really love Julia and I want it to thrive
  • I have (can make) time to contribute

In fact, I wanted to contribute in some way, and after this topic cooled down I started to be more active here on Discourse.

I know that generally true statements about what contributing to open-source projects means can be invoked and applied to Julia (and it is not like I am ignorant of those rules). However, I think we can at least entertain the idea that, in a less technical sense than the now-mitigated TTFX, Julia might have a time-to-first-PR issue (and it is less relevant that other open-source projects might have this issue as well).

It was a clear TTFPR issue when I reported the following bug after I actually delved into Julia’s source code, detected the exact issue, and proposed the fix.

So why did I only open an issue instead of submitting a PR? Good question: I cannot pinpoint a single reason.

Maybe it was about a mix of various factors. But my honest answer is that I wanted to contribute, I had the time (and actually I had the local fix because the bug was really annoying) and still no PR.

If my case is singular and this “I want + I can + I don’t” mix is not a real issue that goes beyond my own experience, then I think we can just let this go: I learned my lesson.

However, if there are things that can be done in the Julia community to reduce yet another TTF- something, then let’s use this opportunity and maybe lower the perceived bar for Julia-repo contributions.

4 Likes

I know that many people don’t contribute who perhaps should, but also that many who do submit PRs can have a frustrating experience waiting for reviews. I don’t know how to fix that. The fundamental tradeoff is that time spent reviewing PRs (and it takes a lot of time) means time not spent on other activities like fixing really hard problems. So we persist in an awkward balance where good stuff gets done but perhaps more slowly than ideal.

Anyway, this is a bit off-topic for this main thread, so I’ll stop now. If you have ideas about how to fix it, perhaps start another topic?

7 Likes

I agree we should stop here for now. I’ll think about this.

Also - thank you for the hard work you put into the Julia ecosystem. Sometimes it is hard to point out some issues without sounding ungrateful/demanding: so again, thank you!

5 Likes

So here is a many-page discussion about the reverse-zip bug: Iterators.reverse gives unexpected results when zipping iterators of different lengths · Issue #25583 · JuliaLang/julia · GitHub. I was about to fix it, but when I saw from the lack of enthusiasm that the fix would mean having a PR lingering around for years again, I left it be. (I had already had one PR, #24978, that took 3 years, and another, #29927, that took 2 years.)

1 Like

If you’re interested in discussing this specifically, I’d recommend starting a new topic or reading over some of the previous discussions, but yes, this specific example is probably the poster child of the “hype and shiny features over correctness” culture being decried.

However (and note this is coming from someone who has complained at length about the aforementioned issues), it does feel like one of the more extreme examples and not representative of the ecosystem as a whole. Case in point, ForwardDiff.jl has been quite solid for years now in the AD space. The reason this is a big deal despite only involving 2-3 libraries comes down to a) interest in ML as shown by anecdata and successive Julia Community Surveys, b) some of the packages being promoted as flagship libraries in the past such that everyone and their dog knows about them, and c) excessive hype about the capabilities of this specific corner of the ecosystem during early development which didn’t pan out for a variety of reasons.

I think you’re right to be skeptical about the number of issues being a good metric here. Suffice it to say Zygote has/had a greater number of significant issues and issues in commonly-used code paths than the Python ADs being compared against, even if the absolute or relative number of issues is similar. It does seem like some lessons were learned from this, but unfortunately not before the Julia ecosystem experienced a reverse-mode AD “winter” which has only recently begun to thaw (again, a discussion for another thread).

5 Likes

It seems this thread has come to an awkward end. Can I take credit for that?

3 Likes

See What’s the aliasing story in Julia

4 Likes