Did Julia community do something to improve its correctness?

Yeah, my only point was that I think the relevance of such a list of issues to everyday programming is overrated.

And, for the record:

In [1]: import numpy as np

In [2]: a = np.random.random(10_000_000)

In [3]: np.sum(a.reshape((-1, 1)), axis=1, out=a)
Out[3]: 
array([0.83372408, 0.09255398, 0.34395135, ..., 0.51044679, 0.64471536,
       0.46825184])

In [4]: %timeit np.sum(a.reshape((-1, 1)), axis=1, out=a)
37.6 ms ± 522 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

julia> @btime sum!(a,a;init=false) setup=(a=rand(10_000_000)) evals=1;
  7.344 ms (0 allocations: 0 bytes)
1 Like

It is not clear to me that init was ever meant to be part of the public API. That said, I think documenting it might help people understand what it does.

My guess is that init was added to head off another correctness issue early on.

julia> A = Vector{Int}(undef, 3)
3-element Vector{Int64}:
 140320323899536
 140320345387120
               0

julia> sum!(A, [1 2; 3 4; 5 6]; init=false)
3-element Vector{Int64}:
 140320323899539
 140320345387127
              11

julia> A = Vector{Int}(undef, 3)
3-element Vector{Int64}:
 5929362882510025827
 3544382716706496582
      45226851069486

julia> sum!(A, [1 2; 3 4; 5 6])
3-element Vector{Int64}:
  3
  7
 11
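As far as I can tell, with the default init=true the destination is zeroed before accumulating, while init=false reuses whatever is already there as the starting values (hence the garbage offsets above). A quick illustrative session on a zero-initialized destination, same matrix as above:

julia> A = zeros(Int, 3);

julia> sum!(A, [1 2; 3 4; 5 6]; init=false)  # accumulates into the existing (zero) contents
3-element Vector{Int64}:
  3
  7
 11

julia> sum!(A, [1 2; 3 4; 5 6]; init=false)  # A was not reset, so the row sums are added again
3-element Vector{Int64}:
  6
 14
 22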
2 Likes

The thread split messed things up, but I was responding to a comment which suggested NumPy wasn’t working as expected while not using an equivalent bit of code for the comparison (i.e. not using the in-place sum function).

But yes, I agree comparisons are only useful insofar as they can correct mistaken assumptions such as “there’s no way to do this safely”. Thankfully sum! at least provides a warning in its docstring, so I don’t think it’s the best representative of the broader topic at hand.

To the broader topic, I think what @mschauer touches on about bugs which can show up using only functionality from stdlibs (no “distributed SVD of a BlockedArray filled with Unitful Quaternions”) and the difficulties in getting them fixed is worth exploring. Using Did Julia community do something to improve its correctness? - #121 by mschauer as an example, one can see a number of barriers:

  • Designing proposed fixes often requires back-and-forth with subject matter experts and/or those with the commit bit, but fixes can often be left hanging if someone gets busy or moves on, with no good way to revive them.
  • Even when a PR is made, the same can still happen.
  • There is nominally a process, but it’s not consistently applied. Call-to-action labels like “merge me” or “forget me not” can stick around for months or years. A PR like `reverse(zip(its...))` now checks lengths of constituent iterators for equality by adienes · Pull Request #50435 · JuliaLang/julia · GitHub can have approval, green CI, and (based on Slack chat) have gone through triage, but still have labels like “triage” attached and be unmerged.

There has been plenty of ink spilled on how to solve this, but just to list a couple ideas which resonated with me:

  • Defining a clear and consistent process for how bug, robustness, and maybe docs fixes are triaged and reviewed. E.g. all bugfixes are put into a priority queue based on severity and age, and a certain amount of time in each triage meeting is set aside to go over them.
  • Defining and adhering to a guideline for when PRs are complete so they can be landed. Whether that’s some combination of approval, green CI, the “merge me” label, etc., there should be some way to start this process and some sense of urgency to merge once a PR meets the criteria. I’m sure certain parts of JuliaLang/julia will want to be excluded from this (the compiler comes to mind), but enabling it for stdlib bugfixes shouldn’t be that disruptive.

But ultimately, how something like this gets done is less important than declaring it will be done and seeing actions being taken towards that end. My impression from the previous discussion threads on robustness is that a number of ideas are thrown around but only the most incremental (if any) changes are implemented. Given that, it’s kind of hard to blame people for bringing the topic up again and again, since what can I or anyone else point at to say “look, it’s being worked on”?

6 Likes

I’m imagining a repo that does a lot of correctness tests independent of the implementations. Things like stats fitting, sampling from distributions, summing things, matrix arithmetic, etc. Perhaps implementing some tests from NIST suites and such. Then it generates a report, clarifying which common mathematical operations work correctly and which ones don’t.

This test suite could be iterated on much more quickly than the fixes, and could be used in the bug-fixing process to help approve code.

One methodology might be, for example, to calculate various quantities in both Julia and R and compare the results.
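A rough sketch of what one such check might look like (the tiny vector and reference numbers below are made-up placeholders computed by hand, not actual NIST data, and the whole layout is hypothetical):

using Test, Statistics

# Correctness checks that live outside the implementation: compare stdlib
# results against independently obtained reference values (NIST StRD
# certified values, numbers computed in R, etc.).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = (mean = 3.0, var = 2.5, std = sqrt(2.5))  # hand-computed for this vector

@testset "basic summary statistics" begin
    @test mean(x) ≈ reference.mean atol=1e-12
    @test var(x)  ≈ reference.var  atol=1e-12
    @test std(x)  ≈ reference.std  atol=1e-12
end

# The same pattern could cross-check against R via RCall.jl, e.g.
#   rcopy(Float64, R"mean($x)") ≈ mean(x)

Run against each new release, a suite like this could generate the kind of report you describe without touching the implementations themselves.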

1 Like

You are misunderstanding the purpose of the tag. It does not indicate planning or priority. It indicates what would break the release.

The only way to really advance the issue is to work on the solution. I think this is likely the closest thing to a resolution: https://github.com/JuliaLang/julia/pull/50824

1 Like

I appreciate your point that not all packages have the same level of correctness, but I don’t think it’s fair to imply that bugs only occur in obscure or unused packages. For example, the JuliaStats ecosystem, which is widely used and respected, has many correctness issues that affect me and other users. I often encounter problems with @inbounds, type conversions, and out-of-bounds accesses in packages like Distributions.jl and StatsBase.jl. These issues often keep me from using Julia for serious statistical computing, even though I love Turing.jl and many other parts of the Julia ecosystem.

These problems aren’t always related to Julia offering a greater degree of composability either. I often find it easier to compose packages in Python and R than I do in Julia, because they have more standardized and documented interfaces for different tasks. For instance, if I want to use labelled arrays, autodiff, differential equations, or tensor algebra in Python, I know there is a widely adopted package for each one of these purposes (XArray, JAX, Equinox, and PyTensor), all of which have been tested for compatibility with PyMC. In Julia, there are many different implementations of these functionalities, which makes it hard to ensure compatibility and correctness.

This is all even before we start talking about the autodiff ecosystem, which is often a major barrier to making any serious progress in Julia. I appreciate the progress that Enzyme has made here over the past year, but I still have some concerns about whether its current problems are just growing pains, or whether they reflect the difficulty of trying to write autodiff in assembly. After being burned by several autodiff systems with major correctness bugs, there’s a real feeling in the community that we can’t fully trust anything until we’ve tested it in our own code.

5 Likes

Well, don’t take it the wrong way, but what exactly are you expecting to gain here when all your needs are met by Python and R?

8 Likes

Syntactic macros, performance, cleaner mathematical syntax, better abstractions (Distributions.jl has an amazing high-level design, despite all of its correctness issues), functional/procedural programming instead of OOP, not having to learn/work in C++…

There’s lots of great things about Julia! But unfortunately, a well-tested and thoroughly-debugged ecosystem isn’t one of them.

6 Likes

This is an unhelpful overgeneralization, IMO. In my six years with Julia I think I reported a couple of minor bugs in the standard library, and about the same number elsewhere…

13 Likes

Do those have open issues we can track?

I personally am talking about the ecosystem (packages outside the stdlib), not the standard library (which I’d agree with you on).

1 Like

Yep, and have since Yuri opened them a few years ago.

@ParadaCarleton, you know the drill here: complaints need links.

(you must have a doc with these listed somewhere by now :joy:)

7 Likes

Do you mean those tagged as bugs, the youngest of which is 4y old?

e.g. the very recent

2 Likes

That’s certainly a good example of an issue around @inbounds, but I was asking about the specific ones in Distributions.jl or StatsBase.jl.

4 Likes

Please at least try a fair read of my comments. I obviously am not implying bugs only occur in obscure or unused packages. I remain one of the biggest critics of JuliaStats for a multitude of reasons. My point is precisely that I know JuliaStats has these issues, but that doesn’t mean other ecosystems have these issues. People who are pointing to this correctness problem keep pointing to examples in JuliaStats. Yes, I agree JuliaStats has many issues, but “Julia” is not just “JuliaStats”. Equating the two is a huge huge huge generalization of the Julia community. So it’s funny that the example you point to is exactly this same one.

9 Likes

The larger point of “has the community overall done something to address this (in a systemic manner)?” stands though - it’s cool if SciML or other subcommunities fix this “on their turf”, but there still isn’t any larger drive to prevent this systemically from occurring in the first place.

It’s always super frustrating to see these calls for “has the ecosystem at large addressed this?” be met with “well MY part of the ecosystem doesn’t have these problems!” - that’s not what was asked :person_shrugging:

7 Likes

There are plans for interfaces that are being tested. A lot of ecosystems are locking down the interfaces more and more.

1 Like

I mean, I did.

It’s not just JuliaStats. Zygote is infamous for this. If what you really want is an example from SciML, QMC.jl was riddled with correctness issues before I essentially rewrote it from scratch last year.