Did Julia community do something to improve its correctness?

jonathanBieler · October 26, 2023, 8:14pm

Well there’s General, we could add test/coverage/review requirements. That would be a systematic way to improve the ecosystem correctness.

Palli · October 26, 2023, 8:50pm

No, that’s unworkable, ill-advised, and has been considered. I think you must mean the General registry, and does ANY open source ecosystem demand this? I’ll first quote, then explain, argue for and against:

Registered packages MUST have an Open Source Initiative approved license, clearly marked via the license file (see below for definition) in the package repository. [more similar]

SHOULD be mentioned in the README file.
[…]

Registry maintenance

The General registry is a shared resource that belongs to the entire Julia community. Therefore, we welcome comments and suggestions from everyone in the Julia community. […]

Disclaimer

The General registry is open for everyone to register packages in. The General registry is not a curated list of Julia packages. In particular this means that:

packages included in the General registry are not reviewed/scrutinized;

packages included in the General registry are not “official” packages and not endorsed/approved by the JuliaLang organization;

the General registry and its maintainers are not responsible for the package code you install through the General registry – you are responsible for reviewing your code dependencies.

The rules seem very good as is. Packages are not “reviewed/scrutinized”, since who is going to do it? You can do it; everyone can drop in with suggestions on packages after registration, and help with adding tests and documentation. Anyone is free to “review” and give their advice, but no one can demand that it be followed, or block others from registration until it is, for good reasons:

It feels like slavery. You are expecting unpaid work. Open source software is a gift to the community, that you can take or leave, and that includes Julia itself. You can however offer to help.
If you make demands, even one, conditions to registration, then we get fewer registrations! Do we really want that?
We could however have a star system (hmm, GitHub already does) or some kind of encouragement. Financial is not ruled out. I believe PyPI has systematic labels: alpha, beta, … mature. It might be a good idea; is it self-reported there?
Nothing rules out having a curated list of packages, and such exists already. Anyone can make their own, and anyone can make a local registry. Plausibly anyone could make a less-general registry with any requirement they want, i.e. curation. It’s been tried before when there was JuliaPro. Yes, it was proprietary (understandably, who should do curation for free), then it was I believe free or at least no-cost, then dropped.

But are there pros to requirements?

You would end up with better ecosystem. Maybe, or just smaller.
We already have duplicated packages, i.e. similar or same, for the same concepts. Imagine if people were blocked from registering: we would have more such. OK, that feels like a con… So more unknown, non-public packages that may or may not end up being registered. But we would be keeping out those bad packages in favor of the few curated, reviewed ones. We already have two packages for object systems, since the author of the latter didn’t know of the former, which was registered. I’m not sure when work on both began, or whether this could have been prevented (they also have different types of OOP, so maybe OK). While I’m not proving much regarding registered packages, imagine if more packages were developed in secret because they were not allowed to register.
Rich Hickey is against SemVer (which seemed like a good idea at the time: never break without signaling it with an updated version number). He wants packages to never break, and explains how and why, making SemVer redundant. He believes that when you develop in private you can break, and should. It’s up to you when you think you’re ready and decide to publish your software. But for open source, after that, if you want to respect your users, you can’t break ever. Is there a middle ground? You could register but label somehow, as WIP, alpha, etc., or 0.x, which I believe is for exactly that. I’m not sure if he believes in only releasing 1.x and never updating that 1, or in 0.x and staying at 0.something. I think he just doesn’t care too much about those version numbers.
He’s actually also against tests, or at least doesn’t consider them too helpful… Rather, do as he says: avoid place-oriented programming (i.e. use immutable data), and get rid of bad software and breaking changes that way.

We could add JET.jl and

or whatever tools to the General README, as hints on what you can do to help with development/correctness, and advise that they be used prior to registration and after. We demand that packages are not empty, or near-empty, i.e. that they are helpful to someone, even yourself. A very low burden. I think we can help by offering tools, such as CI, and suggesting they be used, but I’m conflicted on requiring even one test. Some do not believe in tests. Those that do can ignore packages with no or few tests. Nobody is forcing you to use a package. If people are ignorant of test-driven development, or simply against it, then documenting it is OK, to help people learn and to encourage its use.

People may be beginners when making their first package, and people hopefully learn. Over time packages can get better, also with the help of others. Should there be requirements going forward, or to upgrade packages? I think the same applies. We currently want people to signal breakage with SemVer, or at least to be aware when they are breaking packages; we hope they do not, or that they signal it if they do. But is it a requirement? I hope you’re not hiding breaking changes from people intentionally. Even forcing people to upgrade (or not) the major number seems too much to ask, only something we can do politely.

ParadaCarleton · October 27, 2023, 2:35am

I know Conda does, and there may be other examples.

gdalle · October 27, 2023, 6:50am

As of today, both are offered as options by PkgTemplates.jl on the master branch, and soon in the next release. I’ll draft a PR to suggest them in General.

EDIT: here’s the PR

mihalybaci · October 27, 2023, 12:15pm

I was curious about this so I looked into it. It’s important to be specific here since the whole Python package system is very confusing (at least to me).

Conda is an open-source package manager, not a package repo. Some Conda repos “are built, reviewed and maintained by Anaconda®.” So, the repos may be open-source, but the heavy lifting is done by a for-profit company, by people who (presumably) get paid for it.

Conda-forge is an open-source, community-led repo where “[the] review team will assist you by pointing out improvements and answering questions.” On the conda-forge mainpage there is a list of supporters, both financial and for infrastructure. So, the “community-led” repo is still backed with money and people.

The point has been made here before, but it’s worth reiterating. People can only volunteer so much of their free time to assist open source projects. Reviewing and assisting every new package and author takes time and money.

Also, I could not find anywhere the actual guidelines for publishing to the repos. So while they are “reviewed”, I have no idea what criteria are used, and without that I don’t think it can be claimed that a random package from a conda repo is better than a random package from the General registry.

Edit: Imagine if the General Registry reviewed and blocked packages according to some unpublished rules. People would be pretty steamed.

stucash · October 27, 2023, 1:15pm

I believe the original blog post by Yuri has made a good attempt at explaining what “correctness issue” means in the context of our discussion (and given there were 3 different discussions already on the internet, either in or out of the Julia community, one could gain a good idea of what the “correctness” in question is by reading through the discussions on Reddit and Here).

But yes, I definitely agree with a bit more documentation.

stucash · October 27, 2023, 1:40pm

Yuri has made substantial contributions to the Julia language, and his point on “mathematically correct” was, to say the least, mostly valid; the post is here, feel free to give it a go. Another community member later in this discussion has confirmed (just a while back) that there were 25 issues with “correctness”; Yuri alone has raised quite a few.

The possible middle ground here, is that it’s almost 2 years on since Yuri’s post and Julia certainly has progressed; secondly Yuri has raised some edge cases for mathematical correctness which for the majority of us, are just ok to live with.

adienes · October 27, 2023, 1:45pm

there have unfortunately also been a couple bugs—even recently—in lowering, and these kinds of bugs (to me) are much more painful than the map! variety of bugs. when something like an if-else block or the scope of variables is not behaving as expected, it makes writing code feel like tapdancing on a tightrope

stucash · October 27, 2023, 1:47pm

Great point; we can start with whether array aliasing is something we’d allow in sum! and likely the conclusion of this could warrant us some warning in the documentation either way.
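To make the aliasing question concrete, here is a minimal sketch of what a defensive wrapper could look like. `safe_sum!` is a hypothetical name, not something in Base, and it leans on `Base.mightalias` and `Base.unalias`, which are internal helpers rather than documented public API, so treat this as an illustration of the idea and not as a proposal for how `sum!` should be implemented.

```julia
# Hypothetical helper (not part of Base): if the output may share memory with
# the input, reduce over a copy of the input instead of the original.
# NOTE: Base.unalias and Base.mightalias are internal and may change.
function safe_sum!(r::AbstractArray, A::AbstractArray)
    return sum!(r, Base.unalias(r, A))
end

a = [1.0 2.0; 3.0 4.0]
r = view(a, :, 1:1)              # the output aliases the first column of the input

aliased = Base.mightalias(r, a)  # conservative check for shared memory
safe_sum!(r, a)                  # row sums of the original `a` are written into column 1
```

Whether `sum!` should do something like this itself, throw on aliased arguments, or only document the hazard is exactly the decision being discussed; the sketch just shows that the check is cheap to express.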

mbaz · October 27, 2023, 1:58pm

Could you provide links to the bug reports on GitHub?

Palli · October 27, 2023, 2:00pm

[I was also curious to look that up, but since already done, I just took a quick look.]

I can’t confirm what Conda does, except select, i.e. curate packages. Something I explained is highly possible externally, or some flag could be added to General that states approved package.

I.e. all go in, of varying quality, and some are better and marked (but not denied registration, just the mark, until approved). What could that mark be? One condition (or the only condition) could be being at SemVer 1.x or higher (we could have community-maintained labels of packages; if Chris, either one, has used and approves of the package, I would too). Anyone can actually just choose to NOT install 0.x packages already; maybe Pkg should warn, or refuse, for such packages? I suppose it can’t now, though I don’t think it would be a breaking change. It could warn, and we could consider denying later as the default, and then people could configure it either way.

It’s a question, should a package then at 1.x, but depending on 0.x be denied too, or warned about? It could be argued that the maker of that package has preapproved its dependencies (or not…). It could be a setting to be even more strict and deny recursively.
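As a rough sketch of the "strict, recursive" setting, assuming a made-up dependency table (nothing below is an existing Pkg feature, and `is_pre_1_0`, `pre_1_0_deps` and `deps_table` are hypothetical names used only for illustration):

```julia
# Hypothetical policy sketch; none of these names exist in Pkg.
is_pre_1_0(v::VersionNumber) = v.major == 0

# Made-up dependency table: package name => (version, direct dependencies)
deps_table = Dict(
    "A" => (v"1.2.0", ["B", "C"]),
    "B" => (v"0.4.1", String[]),
    "C" => (v"2.0.0", ["D"]),
    "D" => (v"0.1.0", String[]),
)

# Collect every 0.x package reachable from `name`, direct or indirect.
function pre_1_0_deps(name, table=deps_table, seen=Set{String}())
    found = String[]
    for d in table[name][2]
        d in seen && continue          # visit each dependency once
        push!(seen, d)
        is_pre_1_0(table[d][1]) && push!(found, d)
        append!(found, pre_1_0_deps(d, table, seen))
    end
    return found
end

flagged = pre_1_0_deps("A")
isempty(flagged) || @warn "A depends on pre-1.0 packages" flagged
```

Here "A" is itself at 1.x but would still be flagged, because "B" is a direct 0.x dependency and "D" an indirect one. Whether that should be a warning, a refusal, or nothing at all is the configurable part.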

I see:

Over 250 packages are automatically installed with Anaconda.

Over 7,500 additional open-source packages (including R) can be individually installed from the Anaconda repository with the conda install command.

So the curated list of packages is pretty small, or should I say 250 is tiny? 7750 is also not too large, smaller than the 10,000+ of Julia. Those are not comparable numbers for Python, if you should exclude R packages.

I think it also has a much larger list, but even that complete list in conda is much smaller than in PyPI. Now the question is, are all the packages in conda good, since preselected? I know people object to conda since they are missing some (presumably) good packages you can install with pip.

[I’m not sure, but I think pip installs from PyPI only (which contains all registered Python packages, and only those, i.e. no R), while conda includes mostly a subset of Python packages, also some R, and not much else (though a few Julia). I and others thought Conda.jl installs from conda, but FYI, I’m told it can also pip install, so PythonCall.jl has access to all Python packages, I believe.]

adienes · October 27, 2023, 2:21pm

github.com/JuliaLang/julia

Make local scope for `else` blocks in `try`/`catch`/`else`

JuliaLang:master ← Pangoraw:else_local_scope

opened 07:54PM - 19 Oct 23 UTC

Pangoraw

+17 -1

[Docs](https://docs.julialang.org/en/v1/manual/control-flow/#else-Clauses) state:

> The try, catch, else, and finally clauses each introduce their own scope blocks.

But it is currently not the case for `else` blocks

```julia
julia> try
       catch
       else
           z = 1
       end
1

julia> z
1
```

This change actually makes `else` blocks have their own scope block:

```julia
julia> try
       catch
       else
           z = 1
       end
1

julia> z
ERROR: UndefVarError: `z` not defined
```

github.com/JuliaLang/julia

bug in closure capture boxing logic

opened 05:16AM - 04 Mar 23 UTC

closed 08:43PM - 09 Aug 23 UTC

uniment

bug lowering

Notice that `foo()` returns `3`.

```julia
julia> VERSION
v"1.9.0-beta4"

julia> foo=let j=0, f, i
           while j < 3
               i = j + 1
               println("i == ", i)
               if j==0;  f = ()->i  end
               j += 1
           end
           i
           f
       end;
i == 1
i == 2
i == 3

julia> foo()
3
```

Comment out the line that says `i`, and `foo()` instead returns `1` (and its capture is not in a `Core.Box` anymore).

(ref: [Discourse thread](https://discourse.julialang.org/t/rfc-some-ideas-to-tackle-15276-performance-of-captured-variables-in-closures/95260/42?u=uniment))

github.com/JuliaLang/julia

Destructuring syntax does not respect const annotation

opened 08:53PM - 14 Oct 22 UTC

closed 02:04PM - 24 Oct 22 UTC

zengmao

bug lowering

The const keyword seems to have no effect when used with the destructuring syntax in Julia 1.7+.

```
julia> t = (;a = 1, b = 2);

julia> const (;a, b) = t
(a = 1, b = 2)

julia> a = 3  # re-definition produces no error or warning, so a is not const
3
```

My Julia version is 1.8.2, 64-bit Linux (glibc), installed from the binary tarball.

(First posted on Discourse: [https://discourse.julialang.org/t/destructuring-syntax-does-not-respect-const-annotation/88751](https://discourse.julialang.org/t/destructuring-syntax-does-not-respect-const-annotation/88751))

github.com/JuliaLang/julia

Using keyword argument prevents specialization

opened 05:15AM - 03 May 22 UTC

closed 02:55PM - 09 May 22 UTC

moble

bug performance lowering

I raised this issue [on discourse](https://discourse.julialang.org/t/using-a-keyword-argument-leads-to-enormous-allocations/80354), where the consensus seems to be that this is at least very confusing, and possibly a bug.

If I *use* a kwarg for a function that was passed as an argument to another function, Julia does not specialize the latter function. If I don't use the kwarg for that very same function, it does specialize.

Below, I'll include a less trivial example that's closer to my actual use case. But schematically, the idea is this:

```julia
f1(a; b=10) = a+b
f2(c, f) = c + f(20)
f3(d, f) = d + f(30; b=40)
```

Calling, for example, `f2(5, f1)` will result in a specialized `f2`; calling `f3(5, f1)` will not result in specialization. I imagine Julia is clever enough to optimize the problem away in this schematic. But in my code, I was seeing slowdowns of ~100x and allocations of multiple GiBs on each call to my core computation function.

As pointed out in discourse, it's possible to manually trigger specialization by adding a type parameter. But the failure to *automatically* specialize is a problem for a few reasons:

1. It's surprising. The [performance tip on specialization](https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing) says "Julia will always specialize when the argument is used within the method, but not if the argument is just passed through to another function." In this case, I *did* use the argument (the function with the kwarg). From the discussion on discourse, it looks like the problem is that Julia immediately lowers that to just pass the function through to `Core.kwfunc`. So technically the argument "is just passed through to another function" — but not by the programmer. (Gotta love passive voice!)
2. It's very hard to diagnose. None of the usual tools — profiling, allocation tracking, `@code_warntype`, JET, Traceur — pointed out any problem with the use of kwargs. In fact, profiling and allocation actively focused my attention on other parts of the code that were not at all the source of the problem. Even the `(@which f(...)).specializations` trick from that section of the performance tips seemed to say the function *was* being specialized for my arguments. (See below.)
3. It seems to contradict the docs. If the goal when designing this heuristic is to detect when a function is "just passed through" so that it will "usually [have] no performance impact at runtime", surely the decision of how to arrange parameters in a function definition should not affect the result.

So, at the very least, I would think this is a documentation bug, because the kwarg wrinkle should be noted in that performance tip — rather than requiring the user to mentally combine disparate arcana from the most cryptic parts of the docs. It would also be nice if some standard tools could point toward the source of the problem. But maybe this is truly a bug in Julia, which should actually specialize even when a kwarg is used?

---

For reference, here's a working example that's complicated enough that Julia doesn't just optimize the problem away, while still being a greatly simplified version of my actual use case:

```julia
using Profile

function index(n, mp, m; n_max=n)
    n + mp + m + n_max
end

function inplace!(a, n_max, index_func)
    i1 = index_func(1, 2, 3; n_max=n_max)  # Using this version leads to allocations below
    # i1 = index_func(1, 2, 3)  # Using this version leads to 0 allocations
    i2 = size(a, 1) - 2i1
    for i in 1:i2  # Allocates 3182688 B if using kwarg above
        a[i + i1] = a[i + i1 - 1]  # Allocates 9573120 B if using kwarg above
    end
    for i in 3:i2-4  # Allocates 3182576 B if using kwarg above
        a[i + i1] -= a[i + i1 - 2]  # Allocates 12771408 B if using kwarg above
    end
end

function compute_a(n_max::Int64)
    a = randn(Float64, 100_000)
    inplace!(a, n_max, index)
    Profile.clear_malloc_data()
    inplace!(a, n_max, index)
end

compute_a(10)
```

And yes, there are plenty of ways to improve the performance of this simplified code with function barriers and such. But my actual code is too complicated for that, with the kwarg func being used multiple times inside some loops.

If I look at the specializations of `inplace!(a, n_max, index)`, I get

```julia
svec(MethodInstance for inplace!(::Vector{Float64}, ::Int64, ::Function), MethodInstance for inplace!(::Vector{Float64}, ::Int64, ::typeof(index)), nothing, nothing, nothing, nothing, nothing, nothing)
```

That second element really looks to me like it specialized for my particular `index` function.

<details>
<summary>Here's all my versioninfo</summary>

```
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-4770HQ CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, haswell)
```

</details>

nsajko · October 27, 2023, 2:28pm

Not sure if you’re just trolling anyway, but “mathematically correct” doesn’t appear on that page.

adienes · October 27, 2023, 2:29pm

no, but “incorrect mathematical” does

Mason · October 27, 2023, 2:43pm

Just an FYI, this does happen. Usually the package author ends up agreeing that it shouldn’t be published to the general registry after people independently go in and take a look at a package, but not always.

Ultimately, the general registry has very few rules and is very permissive, but it’s also not a garbage dump, and people do put some level of effort into gatekeeping packages they think are useless even if they technically meet the official guidelines.

nsajko · October 27, 2023, 3:04pm

True, but there’s no discussion of mathematical correctness anywhere near; the paragraph is about @inbounds misuse.

I definitely agree that a somewhat careless attitude is/was present in some parts of the ecosystem regarding using dangerous features like @inbounds, @pure, etc.; and the devs can definitely be blamed for this somewhat, seeing as they could have, e.g., included unsafe as a prefix to these names.

Also, I’m not the kind of Julia fan that thinks the Julia devs or community can do no wrong; for example, I was very recently disturbed by an instance of rude behavior directed by some Julia devs towards another on public GitHub, and I have criticized Julia publicly myself.

However this whole specific thing of attacking Julia (either the language or community or both, usually it’s unspecified which one) is IMO weird, way overblown and mostly based on vague and misleading claims and loaded language. Someone seems to be pushing and mysticizing the “correctness bug” phrase to the degree that it now seems to be largely associated with the Julia language (searching “correctness bugs” on Google yields Julia-related results on the front-page, even when I try with “Incognito mode”, FWIW). I find the singling out of Julia here to be absurd and suspect. I’ve seen many HN comments associating “correctness bugs” with Julia, and I notice the comments are usually dumb, e.g. they don’t even specify what they mean by “correctness bug”, even though the term is obviously neither well-defined nor common. Also, they always point out the same, but comparatively benign bugs, indicating a possible lack of familiarity with Julia. I suppose there was some sort of troll/sockpuppet/keyboard army campaign of discrediting Julia, for whatever reason.

adienes · October 27, 2023, 3:13pm

how about the 5 I linked above then. wouldn’t call them all benign

mihalybaci · October 27, 2023, 3:19pm

I think “edge cases” is important because they’re only going to show when someone finally touches that edge. I gave the article another look (I read it when it first came out) and found that almost all of the bugs reported in his initial two lists were closed and merged (yay!). Though I didn’t check them all by any stretch.

Just for comparison, NumPy has nearly 2000 open issues, 634 of which are labeled “bug”. Granted, numpy is a huge project and one would expect more issues than a small Julia package. But it still has 10 full pages of bugs older than Julia v1.0 (~Aug 2018). SciPy has 440 “defect” bugs. Do NumPy and SciPy need to do something to “improve their correctness”? No, and I have never personally encountered a bug in NumPy or SciPy.

I don’t mean to pile on, but much like @nsajko I get the sense that folks are losing the forest for the trees.

mbaz · October 27, 2023, 3:36pm

Thanks for the links.

47410 was fixed almost a year ago, and requires quite specific (and I’d say uncommon) syntax to trigger.

51785 has been merged; this should definitely have been caught by testing.

48889 has been fixed; should have been caught by testing.

47168 was fixed over a year ago; should have been caught by testing.

45162 was fixed over a year ago; it does not involve any incorrect results.

I’d prefer not to see these bugs occur, but they hardly make me feel like I’m dancing on a tightrope.

If anything, I’d say that what is missing is enough tests to cover all uses of new syntax as it is introduced. This is made especially difficult by the flexibility of the language.

adienes · October 27, 2023, 3:38pm

yes all of those have been fixed now because they are the ones that were caught. I cannot provide examples of bugs that I don’t know about yet. but the point is that I think every one of these should have been caught by testing

does not exactly inspire confidence as to the robustness of lowering logic. in fact I have encountered issues myself I’m pretty sure are bugs which involve putting too much logic inside default function arguments. next time I can recreate it will file another issue
