Did Julia community do something to improve its correctness?

I hope I can use it for control system simulation and deep learning. I want to introduce Julia to my students, but I have heard many things about its correctness problems. I have learned Julia but never used it for any real work. I hope the community has begun to solve them.
How is the correctness improvement going? Does it include having important libraries rewritten by professional programmers instead of scholars? And if there are no measures to improve it, does that mean the idea of using other scholars’ code will eventually fail?
Thanks.

8 Likes

I haven’t seen much change on correctness. There are a few developments that might be setting the groundwork for improvements, however.

  • Julia 1.10 will use a Julia-based parser which will make syntax tooling easier.
  • Julia 1.11 will have a way to declare public API which will help alert developers to what is expected, and help develop tooling to prevent out-of-API use.
  • JET.jl is developing rapidly.
  • @Sukera and @Keno have been developing interface specification tools like PropCheck.jl (Seelengrab/PropCheck.jl, a package for simple property-based testing in Julia), which should help if they are adopted, but those tools are very new and not yet widely used.

Again though, those only create the opportunity for investing in correctness. Whether the devs and community will actually take that opportunity remains an open question.

10 Likes

There is hardly any problem with the correctness of numerical solutions computed with Julia. One problem that was discussed a lot, even though it has little relevance in practice, was the use of zero-based arrays with libraries that had not tested this use case.

The linter now mitigates this issue by warning about code that iterates over 1:length(vector), and many package authors have updated their libraries.

In the end you will always have bugs in any program, and you must always verify important results by other means, for example textbook examples or code written in other languages.
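As an illustration of checking a result against a textbook case, here is a minimal sketch. It assumes a hand-written RK4 integrator (the names `rk4_step`, `integrate` and `decay` are made up for this example and do not come from any package) and compares it with the analytic solution of x' = -x, which is exp(-t).

```julia
# One classical Runge-Kutta (RK4) step for x' = f(x, t).
# Hand-written for illustration only; not taken from any package.
function rk4_step(f, x, t, h)
    k1 = f(x, t)
    k2 = f(x + h / 2 * k1, t + h / 2)
    k3 = f(x + h / 2 * k2, t + h / 2)
    k4 = f(x + h * k3, t + h)
    return x + h / 6 * (k1 + 2k2 + 2k3 + k4)
end

# Integrate from t0 to t1 with a fixed number of steps.
function integrate(f, x0, t0, t1; steps = 100)
    h = (t1 - t0) / steps
    x = x0
    for k in 0:steps-1
        x = rk4_step(f, x, t0 + k * h, h)
    end
    return x
end

# Textbook test case: first-order decay x' = -x with x(0) = 1.
decay(x, t) = -x

x_num = integrate(decay, 1.0, 0.0, 1.0)
x_ref = exp(-1.0)                      # analytic solution at t = 1

# The check fails loudly if the numerical result drifts from the reference.
@assert isapprox(x_num, x_ref; atol = 1e-8)
```

The same pattern applies to a real solver: pick a problem with a known closed-form answer and compare before trusting results on the model you actually care about.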

So there is no Julia specific correctness problem other than that you have to be careful which packages you use and combine. You can get cutting edge algorithms as Julia packages. Before using them for serious work just ask in this forum which package is the best choice for a given problem.

I am using Julia and ModelingToolkit.jl for control system development and the only bugs that the compiler did not find were my own bugs…

33 Likes

That is pretty unspecific, and thus is difficult to answer specifically.

As for reliability of Julia for numeric calculations: It is used for some quite demanding and probably mission critical tasks by a number of big players.

8 Likes

Let me answer the correctness question first. Correctness requires testing and analysis. Besides built in testing facilities in the standard library there are also a number of important tools:

  1. JET.jl (aviatesk/JET.jl: an experimental code analyzer for Julia, no need for additional type annotations) for static analysis
  2. Aqua.jl (JuliaTesting/Aqua.jl: Auto QUality Assurance for Julia packages) for package quality evaluation
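For the built-in testing facilities, a minimal sketch using the `Test` standard library might look like the following. The function `clamp01` is a hypothetical example written for this illustration.

```julia
using Test

# Hypothetical function under test: clamp a number into [0, 1].
clamp01(x) = min(max(x, zero(x)), one(x))

@testset "clamp01" begin
    # Specific known cases
    @test clamp01(0.5) == 0.5
    @test clamp01(-2.0) == 0.0
    @test clamp01(3) == 1

    # A simple property check over random inputs
    @test all(x -> 0 <= clamp01(x) <= 1, randn(100))

    # Unsupported input should fail loudly rather than return nonsense
    @test_throws MethodError clamp01("a")
end
```

In a package, tests like these normally live in `test/runtests.jl` and are run with `Pkg.test()`.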

Some misconstrue correctness to mean a language will prevent you or otherwise strongly warn you from doing something you should not be doing or that may result in undefined behavior. For example, some static languages will refuse to compile if they can detect such conditions. In Julia, the static analysis and execution steps are decoupled since Julia is a dynamic language. If you want to do this analysis, it must be done explicitly. Ultimately, if you want to ensure correctness the only real path is through extensive testing. Julia provides the tools to do so, but you have to use them.

The correctness issue is also relative. My sense is that well-written Julia code has a greater chance of being correct than, say, the equivalent well-written Python code. Julia has a built-in notion of types, including abstract and parametric types. If used well, this can greatly enhance the quality and correctness of Julia code. Type annotations in Python are a relatively new concept and require non-standard tools to utilize. Versus statically compiled languages, we are at a potential disadvantage since we do not have a mandatory step to force static analysis, but it can still be done as an optional step as described above. That said, we also have the advantage of being relatively modern versus languages like C and C++, in that some known unsafe behaviors are clearly indicated.
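To make the point about abstract and parametric types concrete, here is a small sketch. All type and function names (`Shape`, `Circle`, `Rect`, `area`, `total_area`) are invented for the example.

```julia
# An abstract type groups related concrete types.
abstract type Shape end

# Parametric types constrained to real-valued fields.
struct Circle{T<:Real} <: Shape
    r::T
end

struct Rect{T<:Real} <: Shape
    w::T
    h::T
end

# Methods are only defined for the types they make sense for.
area(c::Circle) = pi * c.r^2
area(r::Rect) = r.w * r.h

# Accepts any vector whose element type is a subtype of Shape.
total_area(shapes::AbstractVector{<:Shape}) = sum(area, shapes)

shapes = [Circle(1.0), Rect(2.0, 3.0)]
@assert total_area(shapes) ≈ pi + 6.0

# Invalid inputs raise a MethodError at the call site instead of
# propagating a wrong value:
#   Circle("one")   # no constructor for a String radius
#   area(2.0)       # no area method for a plain Float64
```

These errors still appear at run time rather than at a compile step, which is why the optional static analysis tools mentioned above remain useful.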

Where some Julia correctness issues have arisen is when abstract interfaces were either not well defined or not well understood. The classic example is AbstractArray. This is a highly abstract interface which provides a lot of flexibility. You may have heard Julia is a one-based language in that indexing of the Array type starts at 1. This is not necessarily true for an AbstractArray. Some methods, sometimes from old code, have made incorrect assumptions such as one-based indexing, resulting in errors. To compensate, Julia has some nice tools to support arbitrary indexing, such as the begin keyword or the first and eachindex methods. My recommendation is for Julia programmers to start conservatively with concrete types such as the one-based Array rather than trying to overgeneralize too soon. Using abstract types requires significantly more testing than using concrete types.
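The indexing assumption can be sketched without any packages. The example below defines a deliberately tiny zero-based vector type (`ZeroBased` is a made-up type for illustration; real code would more likely use OffsetArrays.jl) and contrasts a loop over `1:length(v)` with one over `eachindex(v)`.

```julia
# A minimal zero-based vector, for illustration only.
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end

Base.size(v::ZeroBased) = size(v.data)
# Custom axes: valid indices are 0:length-1 instead of 1:length.
Base.axes(v::ZeroBased) = (Base.IdentityUnitRange(0:length(v.data)-1),)
Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
Base.getindex(v::ZeroBased, i::Int) = v.data[i + 1]

# Assumes one-based indexing: wrong for ZeroBased.
function naive_sum(v::AbstractVector)
    s = zero(eltype(v))
    for i in 1:length(v)
        s += v[i]
    end
    return s
end

# Makes no assumption about where indices start.
function generic_sum(v::AbstractVector)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i]
    end
    return s
end

z = ZeroBased([10, 20, 30])

@assert generic_sum(z) == 60
@assert z[begin] == 10 && z[end] == 30

# naive_sum(z) skips index 0 and then reads past the end. Here that
# throws a BoundsError; with @inbounds it could silently read garbage.
```

On a plain `Vector` both functions agree, which is exactly why this class of bug went unnoticed until someone combined such code with a non-standard array type.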

In summary for correctness, Julia has the facilities to check for correctness, but you have to use them.

A significant part of Julia is being developed by professional developers. The companies JuliaHub, RelationalAI, and PumasAI have full time professional developers working on core Julia and packages. A number of companies are also applying Julia internally.

At the end of the day, what we are really talking about are software bugs. Julia has a wide variety of facilities to employ engineering controls to identify and correct them, but they must be used. Some facilities such as the enforcement of abstract interfaces are still being developed, but they do exist and are being thought about.

A more abstract concept is that much of Julia code is written in Julia itself, meaning that Julia developers have the ability to catch and evaluate issues and bugs directly.

25 Likes

How is it done? Something like C++'s “public” and “private” keywords?

Looks like this open PR:

https://github.com/JuliaLang/julia/pull/50105

3 Likes

This has not been my experience. Sometimes quite fundamental things (in Base, not referring to packages) are broken in ways which makes me really scratch my head.

That being said, I share @jar1's optimism. It seems like, especially very recently, there has been more attention towards defining and enforcing interfaces. And many of the issues/PRs tagged for 1.10 have been various bugs and regressions, some of them quite long-standing, which gives me the feeling that it will be a particularly “clean” release.

3 Likes

There’s also a gulf between correctness of Julia, and that of packages. The former receives a lot of attention, whereas the latter is what users often interact with. Fixing the latter often requires active development, which may not always be the case.

8 Likes

Which packages? Some high schooler’s homework project or DataFrames? There’s >7000 packages. Making a statement about “packages” makes no sense. The 7000+ packages are not similar. Please say what packages you’re talking about and what parts of them if you want to make any real statement. Otherwise my response is, I don’t think there’s any correctness issue in MuladdMacro.jl and I have never seen a single person open up an issue about its correctness.

31 Likes

Well, of course. I’m not singling your package out. Yuri’s post at that time had talked about Distributions, StatsBase and OrderedCollections from what I recall. Those specific issues from the post have been addressed now. I’ve recently fixed a few issues with FillArrays. I’m guessing SciML packages would be of a higher standard because of a larger community involvement

5 Likes

I think it’s just a general thing that anytime someone says “packages” they should always specify which ones, because “packages” at this point is not a very helpful term. You can always find a package that’s bad, because there’s some packages that should basically be deleted. That’s true on pypi as well. So my point is that we shouldn’t say “Julia packages have these problems …”, any statement like that needs to be qualified better, i.e. “I found that these 3 statistics packages could use an improvement” or “these 2 SciML packages need to improve their type stability in …”. Not only is it more actionable, it’s also more correct.

But back to the main topic, “did Julia community do something to improve its correctness?”: yes. The biggest thing is that in v1.8 a lot of code is faster without @inbounds, and with @inbounds removed from a lot of the package sphere, bounds checks stay active and we get better error checks on what was probably the biggest issue. I wouldn’t say it’s all removed, and it’s easy to check:

https://github.com/search?q=org%3ASciML+inbounds&type=code&p=1

But if you look through the list, most of what’s left is either user-facing benchmark code (i.e. model code), @inbounds for eachindex(u), which is safe by design, generated code (where correctness follows from the size of the symbolics), or code in functions that dispatch explicitly on Array to specialize for performance.
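As a sketch of why `@inbounds for i in eachindex(u)` is considered safe by design: the indices come from the arrays themselves, so they cannot go out of range. The function `scale!` below is a hypothetical example, not code from SciML.

```julia
# Hypothetical in-place kernel: out[i] = a * u[i].
function scale!(out::AbstractVector, u::AbstractVector, a)
    # eachindex(out, u) yields indices valid for BOTH arrays and throws a
    # DimensionMismatch if their axes differ, so the @inbounds below can
    # never read or write out of range.
    @inbounds for i in eachindex(out, u)
        out[i] = a * u[i]
    end
    return out
end

u = [1.0, 2.0, 3.0]
@assert scale!(similar(u), u, 2.0) == [2.0, 4.0, 6.0]

# By contrast, `@inbounds for i in 1:n` with an n supplied by the caller
# has no such guarantee and is the pattern that was removed in many places.
```

Dropping `@inbounds` entirely keeps the same behavior with bounds checks left on, which is the safer default unless profiling shows the checks matter.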

I presume many Julia packages have made similar updates because of the potential performance benefits.

14 Likes

My take is that no, Julia has not substantially addressed the growing concerns about correctness. At least if you define “correctness” as the relative ability to write bug-free code. As Yuri presciently stated in his blog post:

systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem.

The problem is quite hard to fix, because Julia was not designed with correctness as a priority. This means that it’s hard to add language-level fixes in a non-breaking manner, and so the only way this issue can be improved is through a combination of minor changes on the language level, a cultural shift, and improved tooling. The cultural shift may be happening, but I don’t really see any impact of it yet.

15 Likes

It’s so hard to define what this question really means. How do we measure “correctness” over a whole ecosystem and compare it to another ecosystem?

Do you mean compared to R? or to python? or to Rust or Ada? What are we aiming for here?

One key problem is that other languages avoid the majority of Julia’s correctness problems by not being composable. You just can’t use packages together, so you can’t expose inconsistencies between them, like the array indexing problems Julia has had. If we compare all of the combinatorial possibilities for bugs in any language, Julia will lose, mostly because the number of possible combinations of code is (much) larger.

We absolutely do have to work out how to do package composability with more correctness. But the base language is increasingly used, e.g. by ASML in machines that make most of the world’s microchips, so it’s probably relatively reliable.

21 Likes

It is not just packages and it is not just composability. I am sometimes quite surprised by correctness issues directly in Base.

as just a very recent example, take “1.9.2 Base.map!(|,a,a,b) yields wrong answer on BitVector” (Issue #50780, JuliaLang/julia); I do not mean to pick on anyone in particular here


julia> a = BitVector([0, 1]);

julia> b = BitVector([1, 0]);

julia> map!(|, a, a, b)
2-element BitVector:
 1
 0

it’s great that this got fixed quickly, but also it is kind of wild to me that it happened in the first place. can you imagine if std::transform in C++ on primitive bitwise operations was just… wrong?
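For reference, the expected result of the call above is the elementwise OR, i.e. both bits set. A small sketch that computes it without writing into one of the inputs (it makes no claim about which Julia versions were affected by the aliasing bug):

```julia
a = BitVector([0, 1])
b = BitVector([1, 0])

# Elementwise OR of [0, 1] and [1, 0] should set both bits.
expected = BitVector([1, 1])

# Broadcasting allocates a fresh result, so no input is overwritten.
@assert (a .| b) == expected

# An explicit comprehension gives the same answer and makes a handy
# cross-check when a library routine looks suspicious.
@assert [x | y for (x, y) in zip(a, b)] == expected
```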

there are plenty more examples like this that have remained open for quite some time

2 Likes

So your comparison is with C++ then, not R or Python :wink:

there are plenty more examples like this that have remained open for quite some time

That bug was fixed within hours of the report… it’s bad this happened, but let’s keep things concrete

3 Likes

ok then, just with the same function family map!, sum!, etc.

one of Yuri’s issues, which has been open for 2 years:
sum!, prod!, any!, and all! may silently return incorrect results · Issue #39385 · JuliaLang/julia · GitHub;

Undefined behavior exposed to the user open for 3 years, which I had to basically beg to get on the 1.10 milestone and was later removed
`map!` resulting in undefined values · Issue #36235 · JuliaLang/julia · GitHub;

similarly this issue was open for a year before I bumped it to get it on the milestone, where if you indexed with an unsigned int you could just get an undef value
Bug with range() and unsigned indices · Issue #44895 · JuliaLang/julia · GitHub;

and one I personally ran into: Iterators.reverse is very often wrong for Filter and Zip types, yet neither of these issues are labeled bug let alone taken seriously
`reverse(Iterators.filter)` can be incorrect if predicate is impure · Issue #50440 · JuliaLang/julia · GitHub;
Iterators.reverse gives unexpected results when zipping iterators of different lengths · Issue #25583 · JuliaLang/julia · GitHub;

I think significantly more things should be parsing errors, instead there is just strange and error-prone behavior, like
`continue` and `break` should not be allowed on RHS of expression · Issue #50415 · JuliaLang/julia · GitHub;
Infix operator definition syntax needs documentation · Issue #15483 · JuliaLang/julia · GitHub;

and god help you if you try to put too much code inside optional or keyword arguments… it’s very unclear when those get evaluated, in what scope, in what order, etc.
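For the simple cases, at least, the documented rules are that default expressions are evaluated at each call where the argument is omitted, from left to right, and may refer to earlier arguments. A small sketch (the names `calls`, `next_id`, `tag` and `window` are made up for this illustration; it does not cover the more tangled cases complained about above):

```julia
const calls = Ref(0)
next_id() = (calls[] += 1)

# Optional positional argument: the default expression runs on every
# call that omits `id`, not once at definition time.
tag(x, id = next_id()) = (x, id)

@assert tag(:a) == (:a, 1)
@assert tag(:b) == (:b, 2)        # evaluated again, so the counter advanced
@assert tag(:c, 99) == (:c, 99)   # default not evaluated at all
@assert calls[] == 2

# Keyword defaults are evaluated left to right and can refer to
# positional arguments and to earlier keywords.
window(n; lo = 1, hi = lo + n - 1) = lo:hi

@assert window(3) == 1:3
@assert window(3; lo = 5) == 5:7
```

Keeping default expressions free of side effects avoids most of the surprises, since the order of evaluation then no longer matters.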

I’m not trying to be too negative, but I must admit I feel frustration every time this discussion comes up because the response is always “pics or it didn’t happen, please give examples,” but there are so many examples! just go to the issue tracker

42 Likes

I’m not aware of growing concerns, but what do you have in mind with “not substantially addressed”? I just went through Yuri’s famous post, every link, and the vast majority has been fixed (almost all the issues he raised have been closed, and if so I assume they are no longer relevant).

Here’s the complete list of exceptions (unless I missed any):

[This one seems alarming, but I’ve never used e.g. sum!, and I think most would use sum and not be affected?]

[That’s another aliasing issue, and if you avoid that, then both ok?]
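A sketch of what avoiding aliasing means for `sum!`: the output buffer is a separate array from the input, never the input itself or a view into it. The variable names are illustrative only.

```julia
A = [1 2; 3 4]

# Separate output buffer: sum! reduces over the dimension that `out`
# lacks, giving the row sums here.
out = zeros(Int, 2)
sum!(out, A)
@assert out == [3, 7]

# The non-mutating form allocates its own result and so cannot alias.
@assert vec(sum(A; dims = 2)) == [3, 7]

# What to avoid: passing something that shares memory with A as the
# output, e.g. sum!(view(A, :, 1), A). That is the situation in which
# the linked issue reports silently incorrect results.
```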

This one is rather obscure, and claimed:

Is this really a bug? File IO is buffered by default, which I believe is a very reasonable choice. We could add a line-buffering option, but I’m not sure that should be the default.

This package is very much used, so I doubt this is a huge issue for most users, is this only about if used with OffsetArrays or similar? I.e. avoidable, and avoided by most users?

This one is still open, but is it already fixed, I see at least:

Yuri still seems to care about Julia since he reported a new bug in April (an obscure one only relevant if you use Float16), and one in 2022, full list of the 4 open bug issues here:

I think it’s too soon to state that (here’s what’s already merged, and I believe it documents the status quo and should apply to all older versions of Julia):

Here’s what’s proposed, but it’s under discussion, and not yet on 1.11 milestone, and I believe not all agree, and there are counter-proposal(s):

The latter was marked as a bug, then disagreed with, and thanks for making the PR for this, though I’m not sure it will be accepted.

Jeff likes supporting different lengths and recently merged this one (@elrod should this be reverted before it gets into stable Julia, and the rest of Julia go in the other direction? It’s in 1.10-beta1, and when it will get stable, it will be a breaking change to undo):

since different lengths are not considered bad, thus this closed:

conforming e.g. with Python (and Lisp), but Python discovered in 2020, and I pointed it out there in the end:

PEP 618 – Add Optional Length-Checking To zip | peps.python.org

These bugs are not only difficult to diagnose, but difficult to even detect at all.
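To illustrate the truncation behavior and the kind of opt-in check PEP 618 added to Python, here is a minimal sketch in Julia. `strictzip` is a hypothetical helper, not a Base function, and it assumes both arguments support `length`.

```julia
# Plain zip stops at the shorter input without any warning.
@assert collect(zip(1:3, 1:2)) == [(1, 1), (2, 2)]

# Hypothetical strict variant: refuse inputs of different lengths.
# Assumes both iterators implement `length`.
function strictzip(xs, ys)
    length(xs) == length(ys) ||
        throw(DimensionMismatch("lengths differ: $(length(xs)) vs $(length(ys))"))
    return zip(xs, ys)
end

@assert collect(strictzip(1:2, 3:4)) == [(1, 3), (2, 4)]

# strictzip(1:3, 1:2) throws a DimensionMismatch instead of dropping
# the last element.
```

Whether such a check should be the default is exactly the design question under discussion; the helper only shows that the stricter behavior is easy to opt into today.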

Would you like truncation for map, zip (and iterators)? And 2.0 because of it? See:

3 Likes

Correctness can have many different meanings. I am running simulations of dynamic systems, thus numerically solving differential equations. I did that with Simulink, Python, C++ and Julia.

When I use Simulink with my current model, none of the variable step size solvers delivers correct results. Most of them seem to work, but the results have an error of 100% or more.
The fixed step solvers work, but even with very small time steps which makes them 1000 times slower than Julia the error is 100 times larger than the error of the Julia solvers.

When linearizing non-trivial dynamic systems with Simulink I get results that are completely wrong, but no warning or error message. ModelingToolkit just works.

Shall I conclude that Matlab/ Simulink should not be used for serious work?

I would not do that. It has advantages and disadvantages, and you must always double check your results and choose a tool that meets your needs.

The main difference between Matlab and Julia with respect to correctness is that Matlab does not have a public issue tracker…

And I agree, I would put sum!, prod!, any!, and all! may silently return incorrect results · Issue #39385 · JuliaLang/julia · GitHub as blocker on the 1.10 milestone…

UPDATE: Was just added to the 1.10 milestone… :slight_smile:

18 Likes

I cannot speak to the correctness of Simulink/MATLAB as I have little experience with either.

I can conclude, though, that ModelingToolkit must not be depending on the correctness of reverse iterators or aliasing map!-like functions!

To be fair to Julia, it is a more comprehensive language than Python (the language, not the ecosystem), and there is a larger surface for error. Also, I think the Python or C++ code I write is indeed more likely to have a bug in it than the Julia code I write.

BUT, of the bugs in my Python or C++ code, the chance that it is an issue with the language itself is usually negligible, and it is almost always user-error. This is something that can be caught via peer review (or just self-review). When the language itself has a bug, reviews are not as effective a safeguard, as both sets of eyes might expect the same (correct) behavior, and it can take a very long time to realize what has actually gone wrong.

6 Likes