Julia stability vs Rust for Scientific Computing

this is an interesting list of examples, thanks. the quantile! and foldr bugs should definitely be fixed! and as a side note, I think Statistics.jl could use more maintainers so fixing quantile! might be an easy first contribution; it looks like a kwarg forwarding typo.

I’ll just note that the invmod bug was already fixed in 1.13+, and some of these don’t really look like bugs in the first place (e.g. the pinv one; and the Diagonal(d) one is not a bug, that is user error).

both the findmin examples are symptoms of known warts in the reducedim machinery, and there is a fix in progress: The great pairwise reduction refactor (Pull Request #58418, JuliaLang/julia). unfortunate, yes, but the solution is quite hard. hopefully there will be more progress on this in upcoming releases.

I think the question upthread, “What does it report if you ask the same for, say, numpy?”, is really the key question here. although of course numpy may not yield as fun a list, for two reasons:

  • it is a much more battle-hardened library, with an order of magnitude more users than Julia over a much longer period of time
  • it covers a smaller API surface than “Julia + all of its stdlibs”, so the equivalent comparison would be to ask for bugs across all of Python’s stdlib.

so results like this, I think, are more evidence that “bugs exist in Julia” (of course they do) than evidence that “bugs in Base are a practical concern,” the question posed by the OP. I have myself found bugs in Julia, and I have devoted many hours of volunteer time to fixing bugs in Julia. but I have also found bugs in the Python ecosystem, including “correctness” bugs in Polars, SciPy, cvxpy, and networkx, some of which cost me full days of debugging. just browsing the issue trackers, we can see that xarray has its own quantile woes, and that a simple call to scipy.stats.linregress([0, 1], [0, 1]) was returning incorrect fit results.

I do not mean to pick on those packages, I appreciate them and use them frequently. it is just to re-re-re-emphasize the point that has been made several times upthread that really nothing is completely safe from correctness bugs and all users of all software should carefully validate their results.

To the OP’s question, “should I use Julia?”, I say: if you think it will be a good fit, please do! The language gets better with each person using it, exploring different corners, writing new packages, and yes, finding issues to report about bugs or performance bottlenecks.

P.S. speaking only for myself, I think critical views should be welcomed. the fact that @yurivish has stuck around for so long is a testament that they care about the development of the language & community at least a little bit :wink: . I see no reason to believe this list was provided in bad faith (even though it comes with more implied pessimism than I personally share), and it was actually quite useful! I had no idea about the foldr bug and now I’m going to go have my AI fix it.

Since OP was asking about Rust, I think it would be more relevant to consider its linear algebra libraries. The most popular seem to be nalgebra and faer, which are roughly comparable to LinearAlgebra. It will surprise absolutely nobody that both have plenty of bugs: see the GitHub issue trackers for dimforge/nalgebra and sarah-quinones/faer-rs.

Hi @gunner, to steer things back to your core questions about stability, Operations Research, and the Rust vs. Julia comparison:

While I’ve only been using Julia for about half a year, my experience with it so far has been really nice. I have not encountered any correctness issues myself, though they certainly exist (as they do in Python, Rust, C++, or literally any language). Note that while Rust is a nice language, it does not magically solve most issues, only a certain class of issues concerning memory (which Julia also addresses, with a garbage collector) and data races.

What I want to say is that the most important thing is not how good the language, its Base, and its libraries are, but using modern best practices and having a good CI/CD pipeline that catches errors and validates against external data. I won’t pretend I know anything about operations research, but if it is possible to validate against data from external sources that you know is correct, that is absolute gold: it will catch not only problems in established libraries and the language itself (rare) but also problems you created yourself (extremely common). None of this is Julia or Rust specific. Whatever you choose, a good CI/CD pipeline is probably the best defense you have against correctness bugs.
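When no trusted external dataset is at hand, a closely related trick (a swapped-in technique, not what the post above describes) is differential testing: run your fast implementation and a slow but obviously correct reference on many random small inputs, and fail loudly on any disagreement. Below is a minimal sketch, assuming a 0/1 knapsack as a toy operations-research problem; `knapsack_dp`, `knapsack_brute_force`, and `differential_test` are hypothetical names made up for illustration.

```python
import random
from itertools import combinations


def knapsack_dp(values, weights, capacity):
    """0/1 knapsack by dynamic programming: the 'fast' code under test (hypothetical)."""
    best = [0] * (capacity + 1)  # best[c] = best value achievable with total weight <= c
    for v, w in zip(values, weights):
        # Walk capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


def knapsack_brute_force(values, weights, capacity):
    """Independent reference: try every subset. Slow, but easy to trust for small n."""
    best = 0
    items = range(len(values))
    for r in range(len(values) + 1):
        for subset in combinations(items, r):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best


def differential_test(trials=500, seed=0):
    """Compare both implementations on random small instances; fail loudly on mismatch."""
    rng = random.Random(seed)  # fixed seed so a CI failure is reproducible
    for _ in range(trials):
        n = rng.randint(0, 8)
        values = [rng.randint(0, 20) for _ in range(n)]
        weights = [rng.randint(1, 10) for _ in range(n)]
        capacity = rng.randint(0, 30)
        fast = knapsack_dp(values, weights, capacity)
        ref = knapsack_brute_force(values, weights, capacity)
        assert fast == ref, f"mismatch on {values=}, {weights=}, {capacity=}: {fast} != {ref}"
    return trials


if __name__ == "__main__":
    print(differential_test(), "random instances agreed")
```

Wired into CI, a check like this catches exactly the “problems you created yourself” category; the brute-force oracle only has to be correct, not fast.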

Julia seems highly stable for standard use, as long as you are not actively playing around with compiler internals, and I don’t think you will need to in operations research. One caveat: after each new Julia release, automatic differentiation tooling will probably take a while to support the newest version.

The “instability” or “correctness issues” that often get highlighted, like the recent tangent in this thread based on LLM-generated reports, often seem to me to be synthetic edge cases or outright hallucinations rather than structural flaws you would encounter in production. Relying on LLMs to parse bug trackers also conflates user error or tooling artifacts with actual mathematical correctness issues, making it a poor benchmark for language evaluation. (There is also a certain irony in using an LLM to find correctness issues.)

Julia also has the big advantage of interactive work, which is harder in Rust, where you might spend a lot of time fighting the borrow checker and waiting for compilation. On the other hand, type stability is crucial in Julia, and man, there are sometimes weird reasons why the compiler cannot correctly specialize in certain cases. I eventually got all of them to work, but I want to highlight that Julia has things to worry about that just don’t exist in Python or Rust.

I just think the most important factors for correctness and stability come not from the language but from decisions outside the language choice. Good luck with your projects, whichever language you choose!

Just wanted to say that I spoke up since my article was mentioned, and I think some of its concerns still apply. I re-audit things every few years in the hope of coming back to Julia someday, but I don’t feel comfortable yet. With Julia I always seem to run into fairly severe issues quickly, and disproportionately often relative to other technologies, which is a perspective I don’t often see represented here.

I have a feeling of grudging respect when it comes to Rust, and the ways its type system fosters composable and reliable software have impressed me. But I don’t enjoy Rust anywhere near as much as I enjoy Julia (when it works), Mathematica, or Clojure, partly because the compile-and-run cycle lacks the tactile feeling of live feedback. Still, when concerns like reliability, code size, or performance take precedence, it’s one of the tools I use.

Also, can confirm that this is accurate (and thank you @adienes):

It’s good to be skeptical of LLM-generated reports that haven’t been vetted by a human with sufficient knowledge, and I would consider it poor practice to spam maintainers with unvetted LLM outputs. On the other hand, it’s also not rigorous to dismiss LLM reports like this without vetting, and a one-off in an informal forum doesn’t amount to spam. Just to do one little bit of vetting, which should probably be branched into another topic if it catches on:

julia> using LinearAlgebra: givens, I

julia> R = givens(1.,1.,1,2)[1] * givens(1.,1.,2,3)[1] # lazy rotation
LinearAlgebra.Rotation{Float64}(LinearAlgebra.Givens{Float64}[LinearAlgebra.Givens{Float64}(2, 3, 0.7071067811865475, 0.7071067811865475), LinearAlgebra.Givens{Float64}(1, 2, 0.7071067811865475, 0.7071067811865475)])

julia> R' # lazy adjoint rotation
LinearAlgebra.AdjointRotation{Float64, LinearAlgebra.Rotation{Float64}}(LinearAlgebra.Rotation{Float64}(LinearAlgebra.Givens{Float64}[LinearAlgebra.Givens{Float64}(2, 3, 0.7071067811865475, 0.7071067811865475), LinearAlgebra.Givens{Float64}(1, 2, 0.7071067811865475, 0.7071067811865475)]))

julia> R*Matrix(I,3,3) # visualize the 3x3 rotation
3×3 Matrix{Float64}:
  0.707107   0.5       0.5
 -0.707107   0.5       0.5
  0.0       -0.707107  0.707107

julia> R'*Matrix(I,3,3) # ...and its supposed adjoint rotation
3×3 Matrix{Float64}:
 0.707107  -0.5        0.5
 0.707107   0.5       -0.5
 0.0        0.707107   0.707107

julia> (R*Matrix(I,3,3))' # this looks more right, adjoint=inverse here
3×3 adjoint(::Matrix{Float64}) with eltype Float64:
 0.707107  -0.707107   0.0
 0.5        0.5       -0.707107
 0.5        0.5        0.707107

julia> R*(R'*[1.,0,0]) # expect [1.,0,0] by * associativity, adjoint=inverse
3-element Vector{Float64}:
  0.8535533905932735
 -0.1464466094067262
 -0.4999999999999999

julia> R*((R*Matrix(I,3,3))'*[1.,0,0])
3-element Vector{Float64}:
  0.9999999999999998
 -5.551115123125783e-17
  0.0

I’d consider that one substantiated. As for a Rust comparison, I stared at nalgebra for a bit, but I couldn’t figure out how to compose 2 Givens rotations. Not surprising; I’ve maybe spent 2 hours total writing Rust in my life.
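For anyone who wants to double-check the linear algebra without a Julia install, here is a NumPy sketch that rebuilds the same two rotations as dense matrices. `givens_matrix` is a hypothetical helper; the `[c s; -s c]` convention is assumed from the printed `R*Matrix(I,3,3)` output above, which it reproduces. My reading of the numbers (not something taken from the LinearAlgebra source) is that `R'` applies the adjoints of the factors in the original order, whereas `(AB)' = B'A'` requires reversing them.

```python
import numpy as np

# Same rotations as the REPL session: givens(1.,1.,i,j) gives c = s = 1/sqrt(2),
# acting as [c s; -s c] on rows i and j (1-based in Julia, 0-based here).
c = s = 1 / np.sqrt(2)


def givens_matrix(i, j, c, s, n=3):
    """Dense n-by-n Givens rotation acting on rows i and j (0-based). Hypothetical helper."""
    G = np.eye(n)
    G[i, i], G[i, j] = c, s
    G[j, i], G[j, j] = -s, c
    return G


G12 = givens_matrix(0, 1, c, s)
G23 = givens_matrix(1, 2, c, s)
R = G12 @ G23  # matches the R*Matrix(I,3,3) output above

correct_adjoint = G23.T @ G12.T  # (AB)' = B'A': factor order reversed
wrong_order = G12.T @ G23.T      # adjoints applied in the original order

e1 = np.array([1.0, 0.0, 0.0])
print(R @ (correct_adjoint @ e1))  # recovers e1 up to rounding
print(R @ (wrong_order @ e1))      # compare with the R*(R'*[1.,0,0]) output above
```

`wrong_order` reproduces the `R'*Matrix(I,3,3)` matrix printed in the session, and `R @ (wrong_order @ e1)` reproduces the `[0.8535..., -0.1464..., -0.4999...]` vector, which is what makes the order-of-factors explanation plausible.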

Fair point, and thanks for vetting the example. You are right, it is a bug in LinearAlgebra. To clarify, I am not saying LLMs are incapable of surfacing real issues; my concern is more about the venue and context of this whole post.

Someone asked a high-level structural question about language suitability and received multiple constructive responses. Then someone dropped a block of unvetted LLM output that heavily derailed the conversation. Simply dropping LLM output forces the community to do exactly what you just did, vetting it line by line, right in the middle of a thread that is supposed to be about the suitability of Rust vs Julia.

That said, I think you may have illustrated my point: in Julia you found a real bug, and in Rust you hit a massive usability wall trying to figure out how to use nalgebra. Whatever someone chooses, they are going to hit ecosystem friction, gaps in documentation (we all collectively should write sooooo much more documentation, and I am no better in this regard than anyone else, unfortunately), and plain bugs. That is why I think rigorous CI pipelines and external validation are so immensely important, regardless of language choice. In an ideal world, we would have a gigantic CI infrastructure testing all of the most important Julia packages in a bunch of weird configurations, but unfortunately, none of us is rich enough for that. :frowning:

I think we are kind of on the same page on the need for rigor, but maybe this whole thing would be better served in an issue tracker on GitHub or a separate discussion thread so this thread can return to the OP’s questions.

Going off topic: Wolfram Research seems to invest very little in supporting large-scale software development in Mathematica, in terms of both language features and surrounding tools, making Mathematica almost a write-only language. The interactive programming experience with the native notebook app, however, is top-notch, and puts Jupyter notebooks to shame. Julia is at least a middle ground that incorporates modern software engineering practices while retaining an interactive workflow and a math’y syntax. I migrate my Mathematica code to Julia whenever I can, but it’s not always possible.

I woke up this morning and saw @adienes hadn’t gotten to this quite interesting (though concerning!) one:

I think I found a simple fix, “Fix interaction between `foldr` and `Iterators.flatten`” (Pull Request #61806, JuliaLang/julia), but we’ll see whether review or the full test suite surfaces any problems with it.

____________________

Edit: I also opened “fix forwarding of `beta` kwarg in `quantile!(::AbstractArray...)`” (Pull Request #205, JuliaStats/Statistics.jl) to address the `quantile!` bug.
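I haven’t read the Statistics.jl diff, so the following is only a hypothetical Python illustration of the general class of bug the PR title describes: a keyword argument that a wrapper never forwards. The quantile formula follows the alpha/beta parametrization described in the Statistics.jl `quantile` docs (m = alpha + p(1 - alpha - beta), j = floor(np + m), gamma = np + m - j), with simplified clamping at the ends; all function names are made up.

```python
import math


def quantile_sorted(x, p, alpha=1.0, beta=1.0):
    """Sample quantile of sorted data x using the alpha/beta parametrization.

    m = alpha + p*(1 - alpha - beta), h = n*p + m, j = floor(h), gamma = h - j,
    Q = (1 - gamma)*x[j] + gamma*x[j+1] with 1-based j; ends are simply clamped.
    """
    n = len(x)
    m = alpha + p * (1 - alpha - beta)
    h = n * p + m
    j = math.floor(h)
    if j < 1:
        return x[0]
    if j >= n:
        return x[-1]
    gamma = h - j
    return (1 - gamma) * x[j - 1] + gamma * x[j]


def quantile_inplace_buggy(v, p, alpha=1.0, beta=1.0):
    """Hypothetical in-place wrapper with a forwarding typo: beta is never passed through."""
    v.sort()
    return quantile_sorted(v, p, alpha=alpha, beta=alpha)  # BUG: should be beta=beta


def quantile_inplace_fixed(v, p, alpha=1.0, beta=1.0):
    """Same wrapper with every keyword forwarded."""
    v.sort()
    return quantile_sorted(v, p, alpha=alpha, beta=beta)
```

With data `[5, 1, 4, 2, 3]`, `p=0.25`, `alpha=0.0`, `beta=1.0`, the fixed wrapper gives 1.25 while the buggy one gives 1.5. With the defaults (alpha = beta = 1) both agree, which is one way a typo like this can survive a test suite that only exercises default arguments.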

What prompt did you use here? Given that this found some good bugs, it might be helpful to use AI semi-regularly to search for language bugs. That seems like something that could be automated, similar to how @ChrisRackauckas automates a lot of SciML boilerplate maintenance via AI-generated PRs these days.

What annoys me a bit about @yurivish’s blog post is not that it’s wrong or anything, but the framing that these problems are somehow unique to Julia, and that people keep bringing it up as “well, it’s proven that Julia is a mess”. Of course, if the bugs were too annoying for @yurivish to do their everyday work, that’s a very fair point, and if they found something that works better, great!

But there are millions of happy Julia users, and in general it’s pretty normal to find bugs in software if you look for them.
Secondly, if you compare a 20-year-old industry standard with millions in funding to a 3-year-old library written by some PhD student, the latter is bound to fall short on many metrics.
To be fair, that might be the point of the blog post, but it ignores how things will look in the future, and the fact that these are not necessarily architectural problems, nor problems across the whole ecosystem.

I think there’s also a mindset split: some people like things stricter and avoid bugs by having their compiler prove everything, and others like more freedom and are fine with occasional mishaps. I’m pretty deep in the latter group while wanting performance, so Julia is pretty perfect for me (btw, I haven’t hit a single Julia bug in the last 6 years or more, and most of the dependencies I use are really stable).
If you’re in the former group, I don’t think Julia is as good a fit (compared to e.g. Rust), although I still think it’s architecturally much better in that regard than Python.
Regarding Rust, I recently read a pretty interesting blog post (sadly I can’t find it right now) on how the Rust compiler only helps you avoid ~5-20% of classic correctness and security bugs, and the rest still needs lots of testing and hardening. Part of Rust’s current image of security has mainly come from security by obscurity, and now that it’s used more, people are starting to find pretty big bugs.

Just for the fun of it, I pointed Claude at Python, and it also found some eye-watering correctness issues (to be fair, I haven’t taken the time to verify and judge them, but that seems to be a similar situation to the Julia version):

| Library | Reproducer | Got | Expected | Why it matters |
|---|---|---|---|---|
| stdlib | `random.choices(['a','b','c'], weights=[-1,5,1], k=10000)` | `Counter({'b': ~8000, 'c': ~2000})` | error or proportional sampling | Negative weight on 'a' silently shifts mass to the next bucket. Validation only catches the case where the total ≤ 0. |
| stdlib | `random.choices(['a','b','c'], cum_weights=[5,2,7], k=10000)` | `Counter({'a': ~2800, 'c': ~7200})` | error on non-monotone | Non-monotone cum_weights makes 'b' unselectable. No validation. |
| stdlib | `statistics.fmean([1,2,3], weights=[-1,1,1])` | 4.0 | error or value in [1,3] | “Mean” of three values in [1,3] returns 4, outside the convex hull. |
| stdlib | `json.dumps({1: 'a', '1': 'b'})` | `'{"1": "a", "1": "b"}'` | error or merge | Produces invalid JSON with duplicate keys; round-trip silently drops one entry. Same for `{True:'a','true':'b'}`. |
| stdlib | `json.dumps({1: 'a', '1': 'b'}, sort_keys=True)` | `TypeError: '<' not supported between str and int` | succeed | Internal sort runs before the documented int→str key coercion. |
| stdlib | `urlparse('http://example.com/?').geturl()` | `'http://example.com/'` | unchanged | Trailing ? (empty query) and # (empty fragment) silently stripped; breaks signing/canonicalization round-trips. |
| stdlib | `a=dt(2026,11,1,1,30,tz=NY,fold=0); b=a.replace(fold=1); len({a,b})` | 1 | 2 | Two different absolute instants compare equal and dedupe in set/dict even though `a.timestamp() != b.timestamp()`. |
| stdlib | `dt(2026,3,8,1,30,tz=NY) + timedelta(hours=1)` | 2026-03-08 02:30 EST (a wall time the timezone says doesn’t exist) | 03:30 EDT | timedelta arithmetic on aware datetimes is wall-clock, not absolute. |
| numpy | `x = np.array([1,2,3], dtype=np.int8); np.where(x>1, x, 1000)` | `array([-24, 2, 3], dtype=int8)` | error or upcast | 1000 silently truncates against the array dtype. `x[0]=1000`, `x.fill(1000)`, `np.full(3,1000,np.int8)` all raise; np.where is the silent outlier. |
| numpy | `np.array([np.nan, np.inf, 1e20]).astype(np.int64)` | `array([INT64_MIN, INT64_MIN, INT64_MIN])` (RuntimeWarning only) | ValueError | `np.array([nan], dtype=np.int64)` raises; astype produces garbage. Same call, two contracts. |
| numpy | `np.histogram([1, 2, 3, np.nan, 5])` | `ValueError: autodetected range of [nan, nan] is not finite` | histogram of the four finite values | One NaN among four finite values poisons auto-range. |
| numpy | `np.unique([1, np.nan, np.nan, 2])` vs `np.unique([[1,np.nan],[1,np.nan]], axis=0)` | first collapses NaNs → [1,2,nan]; second does NOT collapse | same semantics flat vs axis | Same operation, two answers. |
| numpy | `np.intersect1d([1, np.nan], [2, np.nan])` vs `np.union1d([1, np.nan], [2, np.nan])` | [] vs [1, 2, nan] | consistent NaN semantics | Union collapses NaNs, intersection treats them as unequal. |
| numpy | `np.isin([1, np.nan, 2], [np.nan])` | [False, False, False] | matches unique’s view | A third NaN semantic in the same family of set ops. |
| numpy | `x = np.random.randn(63); np.fft.irfft(np.fft.rfft(x))` | length 62 array, max err ~3.1 against x | round-trip identity | rfft discards parity; irfft defaults to even length. No warning when input was odd; silent data corruption. |
| numpy | `np.average([1, 2, 3], weights=[-1, 1, 1])` | 4.0 | error or value in [1, 3] | Same negative-weight pattern as Python’s random.choices / statistics.fmean. |
| numpy | `np.argmin([1.0, np.nan, 0.5, 2.0]); np.argmax([1.0, np.nan, 0.5, 2.0])` | 1 and 1 | 2 and 3 | The NaN slot wins both argmin AND argmax. |
| numpy | `np.median([1, np.nan, 2])` | nan | 1.5 (or hard error) | Silent NaN propagation; need np.nanmedian. statistics.median raises instead. |
| numpy | `np.maximum(1, np.nan)` vs `np.fmax(1, np.nan)` | nan vs 1.0 | one consistent semantic | Two functions, same name shape, opposite NaN behavior. |
| numpy | `np.searchsorted([1, 2, 3, 4, np.nan], np.nan)` | 4 (i.e. before the existing NaN) | well-defined | Insertion point for NaN lands inside the NaN run. |
| numpy | `np.array([1, 'two', 3.0]).dtype` | `dtype('<U32')` | error or object | Silent stringification; `arr[0] == 1` is now False. |
| numpy | `np.nansum([np.nan, np.nan])` | 0.0 | nan or empty-input error | “Sum of these values” is 0 when every value is NaN. |
| numpy | `np.outer(np.eye(2), np.eye(2)).shape` | (4, 4) | (2, 2, 2, 2) | Kronecker-like: np.outer silently flattens 2-D inputs to 1-D. |
| numpy | `A=np.zeros((2,3,4)); B=np.zeros((2,4,5)); np.dot(A,B).shape` | (2, 3, 2, 5) | (2, 3, 5) (batched matmul) | np.dot does tensor contraction, not batched matmul. `A @ B` does the latter; they diverge at 3-D. |
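In the spirit of vetting before trusting, here is a small script that re-runs a subset of the NumPy and `json` rows in the Python list above and reports what the local install actually does. The selection is mine: I left out rows whose behavior may depend on the NumPy version’s casting and promotion rules (such as the `np.where` int8 row). `vet_rows` is a hypothetical helper name, and the seeded `default_rng` input stands in for the row’s unseeded `np.random.randn(63)`.

```python
import json

import numpy as np


def vet_rows():
    """Re-run a few reproducers from the list above and collect the observed results."""
    results = {}
    results["np.average negative weight"] = float(np.average([1, 2, 3], weights=[-1, 1, 1]))
    xs = [1.0, np.nan, 0.5, 2.0]
    results["argmin/argmax with NaN"] = (int(np.argmin(xs)), int(np.argmax(xs)))
    results["maximum vs fmax"] = (float(np.maximum(1, np.nan)), float(np.fmax(1, np.nan)))
    results["nansum all-NaN"] = float(np.nansum([np.nan, np.nan]))
    x = np.random.default_rng(0).standard_normal(63)
    results["irfft default length"] = len(np.fft.irfft(np.fft.rfft(x)))
    # Passing n explicitly restores the round trip for odd-length input.
    results["irfft with n=63 round-trips"] = bool(np.allclose(np.fft.irfft(np.fft.rfft(x), n=63), x))
    eye = np.eye(2)
    results["outer vs multiply.outer"] = (np.outer(eye, eye).shape, np.multiply.outer(eye, eye).shape)
    A, B = np.zeros((2, 3, 4)), np.zeros((2, 4, 5))
    results["dot vs matmul"] = (np.dot(A, B).shape, (A @ B).shape)
    s = json.dumps({1: "a", "1": "b"})
    results["json duplicate keys"] = (s, json.loads(s))
    return results


if __name__ == "__main__":
    for name, value in vet_rows().items():
        print(f"{name}: {value}")
```

The irfft line also shows the usual fix for that row: pass `n=len(x)` so the inverse transform knows the original length was odd.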

While it’s definitely true that Rust’s compiler only catches a minority of bug types, I do think there’s more to it than just security by obscurity.

I think it has more to do with the fact that Rust appeals primarily to people who are very interested in and motivated by correctness concerns, and they have developed an overall culture of people who spend a lot of time thinking about nasty corner cases of API design, and ways to systematically avoid them.

So the compiler features basically served as a ‘seed’ that caused a correctness-focused culture to crystallize around it. This is great for them, but it should be noted that it’s not without tradeoffs, just like Rust’s technical features aren’t without tradeoffs.

Rust developers, being very concerned with corner cases, know that a lot of these things are typically caused by weird interactions between features, so when in doubt their preferred solution is to just limit the number of ways these different features can interact.

These design choices do have real impacts that are felt when using these languages. It’s not just that Rust’s compiler is fussy. The language *design* is extremely fussy, and the third party libraries are (mostly) fussy because they have a fussy culture.

I think we do pretty well, especially in comparison to every Julia dev’s favourite punching bag (Python), but it is just true that we expose a larger surface area for weird things to happen than a language like Rust does, and most of our developers don’t spend nearly as much time worrying about weird ways our code can go wrong as Rust developers do.

I do agree :wink: I do feel much better about an HTTP library written in Rust than one written in Julia, for various reasons like this! My point was more that you can’t avoid all kinds of bugs just with language design :wink:

I am mostly a Julia hobbyist, so not exactly the target audience for your question (though I am trying to use more Julia at work). But I would like to add one thing.

The main reason I don’t worry about the long-term future of the language is how many talented folks there are in the community who care enough to devote their time to Julia’s development. As we just saw in this thread, @yurivish posted a list of valid bugs and possible missing features, and within ~2 hours @stevengj had a proposed fix for one of them and within ~12 hours @mason had fixes for another two! That’s incredible to me!

If you do start programming in Julia and hit roadblocks, performance issues, or bugs, the people here are more than willing to help. Just search for posts like “My code in X language is slower than Julia”, and look at all the replies.

In the end though, the best test might just be to code the same (small) project in Julia and Rust and see which one works best for your needs.

Yeah, and conversely, I’m not sure I’d really trust the Rust community to do a better job making an ODE solver library or a Libm implementation than the Julia folks, especially because the sorts of ‘correctness’ issues one encounters in these numerical projects are just not the sorts of issues the core Rust community is very familiar with.

As a short response to the original question from @gunner: you have to consider whether your use case fits the language’s vision.

  • Julia’s main focus is scientific computing, where its ecosystem is more mature than Rust’s.
  • Julia’s SciML ecosystem is developed by experts in the field and tested over numerous applications; you will not find a package ecosystem more fit for this purpose anywhere else.
  • You are doing research, where time to prove your idea is critical; if you’re doing a graduate degree, you don’t want it to take longer than it needs to. You need a language that is easy to prototype in. Rust is not it.

Rust is better at robust systems programming and application development and there are far more programmers in that field than in scientific computing. Because of this, many people criticize Julia for not solving their problems, but Julia was never purpose built for most people’s problems. Julia is primarily built to solve problems that arise in the numerical computing, simulation, and SciML space. Julia is niche, but niche is good if your unique problems are in that niche. Numerical computing is a completely different beast than app development, and it deserves a unique language to deal with the issues you see in that space.

If you want a good example of a Python package that does numerical heavy lifting under the hood, PySR is an excellent one that I learned a lot from. It uses SymbolicRegression.jl. In fact, the author of these packages, @MilesCranmer, has also implemented his symbolic regression tool in Rust. If there is anyone who can authoritatively answer all your questions on this subject, it’s him.

We already more or less have this. The Nanosoldier.jl CI system tests all the Julia packages (its human-readable reports live in JuliaCI/NanosoldierReports on GitHub), and Seelengrab/Supposition.jl, “a Julia implementation of choice sequence based PBT, inspired by Hypothesis”, is an almost magic tool for generating weird configurations (no LLM needed).

I used Supposition.jl heavily when making ZipArchives.jl, and you can compare the bug reports in the JuliaIO/ZipArchives.jl issue tracker with those for the latest version of the Rust library it was initially based on (zip-rs/zip2 on GitHub).
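To give a feel for why property-based testing finds “weird configurations” without an LLM, here is a deliberately tiny toy in plain Python. It is neither Supposition.jl nor Hypothesis (no choice sequences, no smart shrinking), just random generation plus greedy shrinking, pointed at the `json.dumps` duplicate-key row from the Python list upthread. All function names are hypothetical.

```python
import json
import random


def roundtrip_keeps_all_entries(d):
    """Property under test: JSON-encoding a dict and parsing it back loses no entries."""
    return len(json.loads(json.dumps(d))) == len(d)


def random_dict(rng):
    """Generator: small dicts whose keys are ints 0-5 or their string spellings."""
    d = {}
    for _ in range(rng.randint(0, 6)):
        k = rng.randint(0, 5)
        d[rng.choice([k, str(k)])] = rng.randint(0, 9)
    return d


def shrink(d, prop):
    """Greedy shrinking: keep deleting single entries while the property still fails."""
    changed = True
    while changed:
        changed = False
        for k in list(d):
            smaller = {kk: v for kk, v in d.items() if kk != k}
            if not prop(smaller):
                d, changed = smaller, True
                break
    return d


def find_counterexample(prop, gen, trials=1000, seed=0):
    """Run the property on random inputs; return a shrunk failing input, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen(rng)
        if not prop(case):
            return shrink(case, prop)
    return None


if __name__ == "__main__":
    print(find_counterexample(roundtrip_keeps_all_entries, random_dict))
```

The shrunk counterexample ends up as a two-entry dict pairing an int key with its own string spelling, which is exactly the minimal reproducer you would want in a bug report. Real PBT libraries do this shrinking far more cleverly, which is a big part of their value.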

Since I saw autodiff mentioned on the Julia front (with Enzyme), I’ll chime in with a fun fact: Rust is also developing a std::autodiff module, which also calls into Enzyme!

That said, the Julia ecosystem for autodiff is a lot more feature-complete than the current Rust one (but they’re working on it too!).

I’ll just chime in quickly to say we’ve been using Julia as the main research/data language at our hedge fund since 2021. In that time, we’ve only really run into one major issue: some of the built-in Julia functions use SIMD, and there’s no easy way to turn that off, which leads to slightly different results on different hardware. But that’s very common in numerical ecosystems (e.g. Python does the same thing) and a fairly niche issue.
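The root cause of hardware-dependent results is that floating-point addition is not associative: a SIMD or blocked reduction groups terms differently depending on vector width, and different groupings round differently. A minimal Python sketch of that effect on three numbers follows; the two helper functions are hypothetical names, and `math.fsum` is the standard library’s accurate summation, which does not depend on the grouping. On the Julia side, `foldl` is documented as guaranteeing left associativity, which is one (slower) way to pin the order when bitwise reproducibility matters more than speed.

```python
import math


def sequential_sum(xs):
    """Strict left-to-right accumulation: ((x0 + x1) + x2) + ..."""
    total = 0.0
    for x in xs:
        total += x
    return total


def pairwise_sum(xs):
    """Recursive halving: a different association order, like blocked or SIMD reductions use."""
    if len(xs) <= 2:
        return sequential_sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3]
    print(sequential_sum(xs))  # 0.6000000000000001: (0.1 + 0.2) + 0.3
    print(pairwise_sum(xs))    # 0.6: 0.1 + (0.2 + 0.3)
    print(math.fsum(xs))       # 0.6: accurately rounded, independent of grouping
```

Neither grouping is “wrong”; they are just different roundings of the same sum, which is why the discrepancy across machines is usually in the last few bits.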

wow, that seems like a surprising choice, to put it so first-class in the std namespace. maybe Rust users are more autodiff-happy than I assumed?