Blog post: Rust vs Julia in scientific computing

I am not aware of any HFT firms using Julia (or Rust, for that matter). I am just saying that Julia allows a high level of “consistent performance” by using the pattern mentioned earlier in the thread: allocate chunks of memory → do lots of long-running computation with zero allocation → free the memory later.
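For concreteness, that allocate → compute-with-zero-allocations → free pattern can be sketched roughly as follows (in Rust here; the buffer size and the squaring workload are invented placeholders, not anything from the thread):

```rust
// Sketch of the pattern: allocate up front, do long-running work with
// zero allocations, free afterwards. The workload is a toy placeholder.

fn hot_computation(buf: &mut [f64]) -> f64 {
    // This loop performs no allocations: it only reads and writes
    // through the pre-allocated slice.
    let mut acc = 0.0;
    for x in buf.iter_mut() {
        *x = *x * *x;
        acc += *x;
    }
    acc
}

fn main() {
    // 1. Allocate a chunk of memory up front.
    let mut buf: Vec<f64> = (0..1_000).map(|i| i as f64).collect();

    // 2. Do the long-running, allocation-free work.
    let total = hot_computation(&mut buf);
    println!("total = {total}");

    // 3. The memory is freed later (here: when `buf` goes out of scope).
}
```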

However, only a fraction of HFT-related code requires consistent performance, which makes Julia an even better fit when it comes to transitioning from idea/strategy/backtest to real-time execution (even if you might still end up wrapping some C++, given its strong legacy in HFT).

2 Likes

Not really. Automatic non-GC memory management (RAII) in C++ is easy and the norm, but not possible in Julia.
Much more than that, in C++ you can allocate any user-defined class with almost any allocator you’d like (although containers can get messy when using custom/non-std allocators). You cannot do this with Julia; you can’t define an allocator and create mutable struct instances with it. You can’t write “real code” in Julia in this way, but are basically limited to arrays containing isbits types.

5 Likes

I think as the trading model grows larger and more complex, it is not so straightforward to just allocate everything in advance.

Anyway, I agree that “maximum attainable performance” is not the primary obstacle for the use of Julia in HFT. We’re likely all familiar with how Julia can be wrangled to match or best most languages on most benchmarks; my bigger concern would be the predictability of performance, w.r.t. accidental regressions, missing a branch to precompile, etc.

2 Likes

Enjoying the discussion put forth by @Mo8it, as one can always learn something from this kind of comparison.

The “happy path” in Julia seems to be to just pre-allocate outside of hot loops. Reusing the same memory addresses over and over will hopefully keep the memory in the cache (which on modern CPUs can be in the range of 32–96 MB for L3).
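A minimal sketch of that reuse (in Rust here, with invented sizes): clearing a `Vec` keeps its capacity, so refilling it on each iteration touches the same addresses instead of allocating fresh memory.

```rust
// Refill a pre-allocated buffer in place. `clear` drops the contents
// but keeps the capacity, so as long as we stay within it, no
// reallocation happens and the same memory is reused.
fn refill(buf: &mut Vec<f64>, step: usize) {
    let n = buf.capacity();
    buf.clear();
    buf.extend((0..n).map(|i| (i + step) as f64));
}

fn main() {
    // Pre-allocate once, outside the hot loop.
    let mut buf: Vec<f64> = Vec::with_capacity(1024);
    let base = buf.as_ptr();

    for step in 0..100 {
        // The same (likely still cache-resident) memory is reused.
        refill(&mut buf, step);
        assert_eq!(buf.as_ptr(), base); // no reallocation occurred
    }
    println!("reused one {}-element buffer for all steps", buf.capacity());
}
```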

In my experience, the kind of “low-level Julia” code that you write (which is admirable) is simply too hard to write, as fighting the GC is not easy at all.

But the “happy path” I just mentioned is something I found I also had to follow in C++ when using so-called zero-cost abstractions like std::vector. std::vector is way slower than C-style stack allocations unless you pre-allocate, as I found out in my test here.

After running this test, I stopped worrying too much about C and stack allocations, and came to the conclusion that the happy path of pre-allocation in Julia would give me nearly 100% of the performance of C++. I haven’t tested the same kind of thing with multithreaded code, but I suspect the conclusion would be similar.

That being said, I believe the OP’s point on performance is related to the fact that if anyone is spending any meaningful time on GC, they are simply not pre-allocating (i.e., not on the happy path).

Rust’s compiler, on the other hand, is known to beat you into submission in order to make you stay in Rust’s happy path. I believe that ownership semantics allow it to allocate on the stack safely, so there is a point there to consider. Maybe Rust’s default stack allocation is a nice feature indeed (in comparison to Julia, we would avoid problems with too many allocations), but the cost is having to submit to the borrow checker, so I’m not sure it’s a good trade-off.
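A toy sketch of what those defaults look like in practice (the array and the doubling function are made up for illustration): fixed-size values live on the stack by default, and the borrow checker enforces exclusive access to mutation at compile time, with no runtime cost.

```rust
// Doubles a stack-allocated array in place; `&mut` gives the function
// exclusive access, which the borrow checker verifies at compile time.
fn double_all(values: &mut [i64; 4]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    // `[i64; 4]` is a fixed-size array: no heap allocation, no GC.
    let mut values = [1, 2, 3, 4];
    double_all(&mut values);

    // A second simultaneous `&mut` borrow would be rejected at
    // compile time, e.g.:
    // let a = &mut values;
    // let b = &mut values; // error[E0499]: second mutable borrow
    println!("{values:?}"); // [2, 4, 6, 8]
}
```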

I think I would personally rather prototype code that allocates a lot and makes my coding fast, then optimize it once stable, than spend a large amount of time trying to please a very strict compiler.

Another point to throw into the discussion that so far hasn’t been mentioned is compilation times. From the funny video Interview with senior Rust developer I get that compilation times in Rust are very slow. It seems they could be even worse than C++’s (especially now that C++ has introduced modules to reduce compilation time). Considering that the most annoying problem with Julia was actually not the GC but “time to first plot” (well on its way to being resolved), this is what really puts me off about Rust. Slow compilation times plus a hard-to-use compiler: too much to cope with for me, for little to no perceivable gain.

Other points of relevance that haven’t been mentioned: how mature is GPU computing in Rust? From what I see here, not so much. And arguably the GPU has, and will continue to have, a growing importance in scientific computing, so Rust seems to be really lagging in a critical area. It makes me wonder if the language is even well designed to run on a GPU. For a borrow-checked language with first-class support for GPU computing, there will be Mojo, btw.

A final point. The worst bugs I have had to deal with so far in scientific computing were definitely not “code” bugs, but rather nasty “off by one” or similar bugs that a compiler simply won’t catch, as it cannot understand what I’m actually trying to compute. The only systematic way I have found of dealing with such bugs is to introduce abstractions that force me into certain specific restrictions. In Julia, which is generic by default, we can introduce restrictions by making functions take some abstract type instead of being fully generic, or by enforcing that some method is implemented on a given type with a simple @assert. I’d rather go with this philosophy: generic by default, with the ability to introduce restrictions whenever I see fit for my own “off-by-one” safety and similar issues.
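One concrete instance of that idea, sketched here in Rust since the technique is language-agnostic (the `Grid` type and the index wrappers are hypothetical, not from the thread): wrapping raw indices in distinct types turns an index mix-up into a compile error instead of a silent off-by-one-style bug.

```rust
// Distinct index types: passing a RowIdx where a ColIdx is expected
// becomes a type error the compiler catches for us.
#[derive(Clone, Copy, Debug, PartialEq)]
struct RowIdx(usize);
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColIdx(usize);

struct Grid {
    ncols: usize,
    data: Vec<f64>,
}

impl Grid {
    // Row-major lookup; the signature encodes which index is which.
    fn get(&self, r: RowIdx, c: ColIdx) -> f64 {
        self.data[r.0 * self.ncols + c.0]
    }
}

fn main() {
    let grid = Grid { ncols: 2, data: vec![1.0, 2.0, 3.0, 4.0] };
    let x = grid.get(RowIdx(1), ColIdx(0));
    println!("{x}"); // 3
    // grid.get(ColIdx(0), RowIdx(1)); // would not compile: wrong types
}
```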

17 Likes

Congrats @Mo8it! The Rust newsletter features your blog post.
https://this-week-in-rust.org/blog/2023/07/19/this-week-in-rust-504/

5 Likes

If you require all Julia code to pass JET’s type checking […] you’re essentially using Julia as a statically typed language

Not quite. What I envision is a language where you do the prototyping and experimentation in a “dynamic dialect” of Julia where you don’t care about type stability, then when you’re done experimenting, you run a JET-like tool on the codebase and tweak it until it’s essentially a static language. Bonus points if you then have the option of compiling your program to a static binary after that.
Kind of like the two-language problem, except that you don’t need a rewrite but only minor tweaks, and you don’t get the awkward language barrier or the installation/compilation problems of two languages.

I suspect that such an “almost statically typed” dialect of Julia will be adopted by only a minority of the community

Yeah that’s probable. I hope you’re wrong though. I hope there will be a cultural consensus that widely used libraries should be completely inferrible.

24 Likes

Can you show some example(s) of those type instabilities in cases where they are actually relevant (not by design)?

I think type instabilities that are by design are still problematic. For one, they prevent static analysis of your program, both by blocking inference and by spamming the JET output so you miss the real issues. Eventually, they might also prevent compilation of your program to static binaries (but let’s see).

Some examples of this are:

  • Spurious instabilities in Base. These tend to be quickly fixed. Examples include #49801 and #48420
  • Instabilities intentionally added to curb compile times. These are typically in print- and show-related methods
  • Instabilities in packages, due to design issues. This is rampant in Pkg, in Artifacts and in some packages like FilePathsBase.
6 Likes

One point is that, while it’s possible, you don’t really need your prototyping code to be type-unstable and slow. When you are used to thinking about type stability and performance, it’s not that hard to mostly write in the ‘static dialect’ when exploring and prototyping.

I wouldn’t over-emphasize the separation between the dialects. Personally, I always think of performance from the very first line of code, in all languages I use, and I don’t think that it hampers exploration much.

4 Likes

If I understand it correctly, it’s part of Go’s GC being concurrent instead of stop-the-world; it interleaves its cycle with the program instead of one big pause, which is an advantage for realtime applications. 2 minutes is way too long for a frame so I’m guessing it’s triggering a cycle to make sure garbage isn’t building up, which seems important because concurrent GC has lower throughput.

IIRC, it’s the quicker, suboptimal dev build vs. the slow, optimized release build; the former is very useful for debugging.

1 Like

Ok, but if we do not accept instabilities where they are convenient or irrelevant for performance, then we would be completely fighting against the language. Working in Julia pursuing type stability and inference everywhere, whether performance-critical or not, is giving up on it being a practical and easy language for the non-critical parts.

Surely the easiest way to obtain static binaries is to have completely type-stable code, but that gives up on dynamism. Static compilation is certainly useful, but it will be an add-on to the language (the most important one now, IMO); it is not its raison d’être.

6 Likes

I try to ignore performance in the beginning unless I have a very clear idea of what the bottlenecks will be (and frequently, when I imagine I have a very clear idea, I turn out to be wrong).

Instead, I try to keep my code clean and modular so that it is easy to refactor later based on bottlenecks identified with profiling and benchmarking. Writing idiomatic Julia code gives me reasonable performance 99% of the time with very little effort, so I focus on other things instead.

This was a very interesting discussion, but for me it mostly highlights how broad the term “scientific computing” is: it involves all kinds of programs with different constraints and trade-offs, from super-optimized HF trading to one-off scripts for exploratory data analysis, so it is not surprising that people use different languages.

I think it is best to be humble about this, and not assume that the kind of scientific programming one does covers all kinds; extrapolating from personal experience is often unwarranted. I am usually suspicious when people claim that one solution works for a broad range of problems, even if they recommend a tool that I happen to like.

28 Likes

Sure, there is a tradeoff between the desire to write inferrible code and hammering code out quickly. This is sort of the entire dynamic/static language discussion. Clearly, there are advantages to both.

There is an analogy here to code performance. Writing fast code is also harder than writing code where you don’t care about speed.
For both performance and staticness, Julia happens to be designed in such a way that they both can be achieved with minimal effort compared to other languages - though not, as you say, zero effort. There really is a price to pay for writing fast and/or inferrible code.
Staticness and performance are also similar in another regard: not everyone cares about performance or staticness, and most people only care about them in some situations. However, there is an asymmetry: one cannot write fast code if one depends on a slow dependency, nor can one write a static program if one uses a type-unstable dependency. On the other hand, there is no problem hammering out quick and dirty code that depends on a fast and static library.

When writing applications, one can do as one pleases. But what should one do when writing generic libraries that are to be used by other people for use cases that you cannot completely predict?
My take is that one should strive to write fast and inferrible libraries. This way, you give your users opportunity to use your library for both fast and slow code - and for dynamic and static code.

To me, the fact that you can do this at all is the most beautiful part of Julia. Most people agree that smashing the two-language problem is a major selling point of Julia. To me, there is an analogous “static/dynamic two-language problem” which is the major theme of the OP. Julia isn’t there yet, in that it doesn’t provide good static analysis even for inferrible programs, but it’s technically possible for Julia to have a static dialect in a way that it is NOT technically possible for Rust to ever allow scripting, and I really hope Julia eventually allows for this.

17 Likes

I agree with you, with the only remark that I don’t see non-inferrable code as necessarily being “dirty code” in all cases. It is if the code is meant to provide something to be potentially used within a performance-critical situation. But that’s not always the case.

5 Likes

Couldn’t agree more!

Small tangent on different variants of the “Two Language Problem” and their relation to static checks & typing

As I see it, there are a couple of ways to interpret the Two Language Problem (TLP) that lead to some misunderstandings in the Julia community (at least on this Discourse).

In its naive form, the TLP is about the dichotomy of “slow & easy code” (for prototyping) vs “performant & hard code”. But “performance” tightly overlaps with, while not being equal to, “production code”. Thus, there’s a kind of “forking” of the TLP toward “production code”. Hence, sometimes, the misunderstanding (in my view).

With that in mind, I don’t see how static typing could significantly improve Julia’s performance, while it would for sure move the code much closer to production-readiness (… and with a small binary).

That’s why I like the idea you’re mentioning so much. It’s a gentle and nice way to gradually move the code toward production, from its inception to its first official release.

But because Julia’s community is statistically biased toward academics, “performance code” matters more than “production code”, which skews the debate to that side (not a problem per se, but worth acknowledging), in the sense that coding a program for HPC applications does not follow the same constraints (for example).

On the contrary, there are so many potential new users on the “production” side that it would be a shame (for Julia’s sake) to dismiss developments in that direction.

To link this with the current topic, static typing (and checks) is exactly a core contributor to the strength of Rust (allowing for small binaries), and having such a possibility in Julia would help it compete on Rust’s own terrain.

2 Likes

What would be the difference between what you describe and gradual typing?

I updated the blog post to address some issues.

Undefined behavior

Most importantly, I finally answer why I used a “straw man example”, as described here:

Benny had to highlight this in a very kind way:

I answer it in the following new section in the post about “Preallocation and undefined behavior”:


Correction about having to use a for loop

I said that you have to write a for loop for maximum performance in Julia, but I was referring to vectorization. It was unfortunate that I mentioned this without enough explanation and directly after talking about Rust’s iterators, which led to the confusion that I meant Julia’s iterators.

This was a problem I observed 2 years ago. It is also mentioned here and here. But now I benchmarked the vectorized and for-loop versions, and it seems to be fixed? I have removed that claim now. Sorry.

Sorry for the incorrect benchmark that I published today for some minutes, until @DNF pointed out that I was benchmarking two different things. I took the whole website down until the corrected version was uploaded. I have no intention of spreading wrong information. I made a mistake and apologize to the whole Julia community :heart:


You have to preallocate in Rust as in any other language, and it offers many ways to do so. For example, Vec has the methods with_capacity, reserve and reserve_exact. You could also use extend, which will extend from an iterator while preallocating based on the iterator’s size_hint.
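For reference, a small sketch of those `Vec` methods in action (the sizes here are arbitrary):

```rust
fn main() {
    // Allocate capacity for 1000 elements up front; len is still 0.
    let mut v: Vec<u64> = Vec::with_capacity(1000);
    assert!(v.capacity() >= 1000);
    assert_eq!(v.len(), 0);

    // `extend` uses the iterator's size_hint to reserve in one go.
    v.extend(0..1000);
    assert_eq!(v.len(), 1000);

    // Reserve room for at least 500 more elements beyond len...
    v.reserve(500);
    assert!(v.capacity() >= 1500);

    // ...or for exactly 500 more (no growth-strategy over-allocation).
    v.reserve_exact(500);

    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```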

Yes, you are right. This was indeed a big point in the conference. We need something similar to rayon but for the GPU.

Yes! I think we have to emphasize this more in this discussion :smiley:


@mbauman @Mason @martin.d.maas @xiaoxi @jakobnissen Thank you very much for your kind words. It means a lot to me :heart:

8 Likes

Your broadcasting function isn’t comparable to the for-loop, since the for loop works in-place.

fdot(x) = @. 3x^2 + 4x + 7x^3 

Instead, try

fdot(x) = @. x = 3x^2 + 4x + 7x^3

(untested)

I’m afraid I must agree with those who say you are playing a bit “fast and loose”…

(Actually, I would recommend this instead

f(x) = 3x^2 + 4x + 7x^3
x .= f.(x)

)

8 Likes

@DNF You are right, I corrected it. See Blog post: Rust vs Julia in scientific computing - #96 by Mo8it

1 Like

Hi @Mo8it. I have to admit that I know almost nothing about Rust; however, I read your blog and the conversation here with great interest. The language you are advocating for is notoriously mentioned among the most interesting ones, and I know it is even used in the Linux kernel. How difficult is it to learn for a non-professional programmer? Is there support for quantum computing? How would you assess its ecosystem for machine learning?

It is very difficult, very frustrating, and time-consuming.