[quote=“Benny, post:139, topic:101711”]
Looks like it doesn’t scale well despite working more. I’m surprised at how much more, 2.41s * 52.7 → 127s for only ~36x the garbage,[/quote]
I’m not sure. I thought it might be because of the generational assumption being violated in multithreaded contexts, but enabling `GC.enable_logging(true)` only ever reports incremental collections (which is good).
I’m on a different computer now than earlier (10980XE instead of 7980XE), but they’re basically the exact same CPU.
Note that these are both actually 18-core CPUs.
So we have twice as much work per physical core in the multithreaded case; anything close to 2x the single-threaded time therefore means the mallocs are getting really good multithreaded scaling.
Baseline is similar on the 10980XE, except (surprisingly) it is a bit slower:
```
julia> @time foo(GarbageCollector(), X, f, g, h, 30_000_000)
 21.603470 seconds (30.00 M allocations: 71.526 GiB, 11.78% gc time)
1.3620400542987349e10

julia> @time foo(LibcMalloc(), X, f, g, h, 30_000_000)
  3.164538 seconds (1 allocation: 16 bytes)
1.3620400542987349e10

julia> @time foo(MiMalloc(), X, f, g, h, 30_000_000)
  2.128713 seconds (1 allocation: 16 bytes)
1.3620400542987349e10

julia> @time foo(JeMalloc(), X, f, g, h, 30_000_000)
  1.976689 seconds (1 allocation: 16 bytes)
1.3620400542987349e10

julia> @show Threads.nthreads();
Threads.nthreads() = 36

julia> @time foo_threaded(GarbageCollector(), X, f, g, h, 30_000_000)
222.812451 seconds (1.08 G allocations: 2.515 TiB, 59.32% gc time)
4.903344195475447e11

julia> @time foo_threaded(LibcMalloc(), X, f, g, h, 30_000_000)
  8.182727 seconds (222 allocations: 20.703 KiB)
4.903344195475447e11

julia> @time foo_threaded(MiMalloc(), X, f, g, h, 30_000_000)
  4.208087 seconds (222 allocations: 20.703 KiB)
4.903344195475447e11

julia> @time foo_threaded(JeMalloc(), X, f, g, h, 30_000_000)
  4.512129 seconds (223 allocations: 20.734 KiB)
4.903344195475447e11
```
```
julia> versioninfo()
Julia Version 1.11.0-DEV.142
Commit d1be33d4bc (2023-07-22 20:20 UTC)
Platform Info:
  OS: Linux (x86_64-generic-linux)
  CPU: 36 × Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, cascadelake)
  Threads: 53 on 36 virtual cores
```
Now, enabling GC logging…
```
julia> GC.enable_logging(true);

julia> @time foo(GarbageCollector(), X, f, g, h, 30_000_000)
# huge wall of GC pauses that look just like the below:
GC: pause 1.55ms. collected 45.875200MB. incr
GC: pause 1.44ms. collected 45.875200MB. incr
GC: pause 1.53ms. collected 45.875200MB. incr
GC: pause 1.53ms. collected 45.875200MB. incr
 22.042343 seconds (30.00 M allocations: 71.526 GiB, 12.45% gc time)
1.3620400542987349e10

julia> @time foo_threaded(GarbageCollector(), X, f, g, h, 30_000_000)
# the end contained single threaded GCs
# when we were down to 1 task, but the
# bulk contained collections like:
GC: pause 67.41ms. collected 1397.212160MB. incr
GC: pause 70.31ms. collected 1454.510080MB. incr
GC: pause 73.16ms. collected 1324.771840MB. incr
GC: pause 69.26ms. collected 1434.995200MB. incr
GC: pause 70.86ms. collected 1469.299200MB. incr
226.461490 seconds (1.08 G allocations: 2.515 TiB, 59.97% gc time)
4.903344195475447e11
```
They were all `incr`; none of the collections during these runs were full.
For the multithreaded case, my computer was at only 40% average utilization (according to btop).
Yes, I think that would let us replicate the performance of manual frees. We may even be able to do better in some specialized circumstances like this benchmark, by having fewer checks on a reuse fast-path (one implementation of that could `get!` the buffer from task-local storage under the hood, and use a weakref to allow the memory to be reclaimed).
Depends. Worst-case scenario, it gets copied. In those cases, you can/should generally move the memory out, so that the destination takes ownership.
More commonly, (Named) Return Value Optimization [i.e. (N)RVO] should apply. When this optimization applies, instead of the callee both allocating and filling, the caller actually does the allocation and passes in a reference to the callee, which then fills it.