When Julia gets within 1-3x of C/C++ speed, why is C/C++ usually faster?

Benny · October 30, 2020, 10:51pm

A notion that I’m getting from blogs and published papers comparing Julia with other languages in practical use cases is that Julia gets within 1-3x of the speed of the fastest implementation in C/C++ (though it’s worth mentioning that Julia is invariably more readable and, on rare occasions, does beat C/C++). To be clear, I’m talking specifically about writers who made the effort to read the performance tips and do things the Julia way: type stability, concrete fields, limited allocations. If they didn’t, they’d get a slowdown of several orders of magnitude, as many other topics here can demonstrate.
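For anyone unfamiliar with those performance tips, here is a minimal sketch of what "type stability" and "concrete fields" refer to. The names `LooseField`, `TightField`, `unstable`, and `stable` are made up for illustration, and `Base.return_types` is a reflection helper used here only to inspect what the compiler infers.

```julia
# A field annotated with an abstract type: the compiler cannot know its
# memory layout, so the value is stored behind a pointer.
struct LooseField
    x::Real
end

# A parametric field: every TightField{T} has a concrete, known layout.
struct TightField{T<:Real}
    x::T
end

# Type-unstable: the return type depends on the *value* of x.
unstable(x) = x > 0 ? 1 : 1.0

# Type-stable: the return type depends only on the *type* of x.
stable(x) = x > 0 ? one(x) : -one(x)

# Inspect the field types.
loose_is_concrete = isconcretetype(fieldtype(LooseField, :x))
tight_is_concrete = isconcretetype(fieldtype(TightField{Float64}, :x))

# Ask the compiler what it infers for a Float64 argument.
unstable_ret = only(Base.return_types(unstable, (Float64,)))
stable_ret = only(Base.return_types(stable, (Float64,)))
```

In interactive use, `@code_warntype unstable(1.0)` is the more common way to spot the same problem.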

It seems that we can already give the compiler the sort of information that people broadly say is what makes compiled languages like C/C++ more efficient. The only big difference I can think of is Julia’s garbage collector, but I can’t say how that factors into anything because I don’t know any low-level languages. I’m hoping someone who does can give general reasons for the remaining bit of difference in performance.

Mason · October 30, 2020, 11:08pm

It’s very hard to speak about this in generalities because every individual little problem is actually a never-ending rabbit hole of potential microoptimizations, special cases, and weird details you would never have guessed.

However, in my experience (as someone who reads a fair amount about this, but doesn’t use or know C), the two main reasons are:

1. It’s awkward. You can basically write C-flavoured Julia code if you really want to; it’s just ugly and awkward. It feels like cheating if the goal is to compare Julia to C, but you then write horrific, unidiomatic, unsafe Julia code that is basically just C.
2. Missing optimizations. There are some optimizations that are possible in Julia but not yet implemented, because they’re hard or nobody has gotten around to them yet. Many of these are missing optimizations in LLVM, or things that are awkward for us to communicate properly to LLVM. A great example of this would be vectorization and SIMD. It turns out you can squeeze some pretty insane performance out of Julia loops if you use https://github.com/chriselrod/LoopVectorization.jl and you can often completely smoke all but the most clever, handwritten custom assembly iteration schemes. This basically happens by bypassing LLVM’s looping stuff and getting Chris Elrod to do your code generation instead.
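To make the "communicating to LLVM" point concrete, here is a rough sketch using only the `@inbounds` and `@simd` macros from Base. It deliberately does not use LoopVectorization, which is an external package. The function names `dot_plain` and `dot_simd` are made up for illustration. Whether the annotated loop actually runs faster depends on the machine and the Julia version, so no timings are claimed here.

```julia
# Plain loop: floating-point addition is not associative, so without extra
# permission the compiler has to keep the additions in source order.
function dot_plain(a::Vector{Float64}, b::Vector{Float64})
    length(a) == length(b) || throw(DimensionMismatch("lengths differ"))
    s = 0.0
    for i in eachindex(a)
        s += a[i] * b[i]
    end
    return s
end

# @inbounds removes the bounds checks (safe here because of the length check),
# and @simd gives permission to reorder the reduction so it can be vectorized.
function dot_simd(a::Vector{Float64}, b::Vector{Float64})
    length(a) == length(b) || throw(DimensionMismatch("lengths differ"))
    s = 0.0
    @inbounds @simd for i in eachindex(a)
        s += a[i] * b[i]
    end
    return s
end

a = collect(1.0:1000.0)
b = fill(0.5, 1000)
plain_result = dot_plain(a, b)
simd_result = dot_simd(a, b)
```

Because `@simd` may change the order of the additions, the two results can differ in the last bits for general inputs. The inputs above were chosen so that every partial sum is exactly representable.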

yuyichao · October 31, 2020, 12:04am

If it’s within a factor of 3, then GC is not an issue.

daniel · October 31, 2020, 1:50am

the wording of that last sentence made me chuckle

Mason · October 31, 2020, 1:56am

Only the finest handmade artisanal SIMD code from @Elrod.

JeffreySarnoff · October 31, 2020, 2:01am

With substantial software, when Julia gets within 1-3x of C/C++ speed, why has Julia obtained working, reliable, collaboratively written code 2-5x faster?

Oscar_Smith · October 31, 2020, 2:18am

I think the biggest reason for this is that to get the power of multiple dispatch, you would need to write all your code using C++ templates. Doing so to the extent Julia does would absolutely kill your compile times due to the lack of a JIT. Julia’s macros are also a huge part of the story here. Tools like LoopVectorization mean that idiots like me can write the equivalent of hand-optimized assembly for any loop that is even a vague hotspot.

Yifan_Liu · October 31, 2020, 3:07am

Are there any plans to integrate LoopVectorization into Julia?

Oscar_Smith · October 31, 2020, 3:11am

Not in the short term. LoopVectorization is improving very quickly, and incorporating it into Base would almost certainly slow progress massively. Furthermore, there is not a ton of benefit to adding it to Base.

KZiemian · November 2, 2020, 1:27am

I don’t know if this will help you, but Jeff Bezanson talks about Julia vs C/C++ speed around 16:45 of "State of Julia" by Jeff Bezanson & Stefan Karpinski. The whole presentation is worth listening to for so many reasons.

Benny · November 2, 2020, 4:53am

I read the v1.5 release highlights, but I didn’t think the allocation optimization would apply to how tuples would be stored. Just to clarify, am I correct in thinking that particular part of the video is showing that while v1.4 allocated a million tuples and an array that points to them, v1.5 allocated just the array that stored the tuples directly?

Oscar_Smith · November 2, 2020, 4:58am

Exactly.
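As a rough illustration of the difference between "an array that points to the tuples" and "an array that stores the tuples directly", here is a sketch under some assumptions. It uses a tuple of plain bits types, which was already stored inline before v1.5, and a `Vector{Any}` to stand in for the pointer-based layout. It does not reproduce the exact example from the talk. As I understand the v1.5 release notes, the change there extended inline storage to immutables that hold references to heap objects. The names `PairT`, `inline_vec`, and `boxed_vec` are made up for illustration.

```julia
# A tuple made only of plain bits types is itself a bits type.
const PairT = Tuple{Int64, Float64}

n = 1_000

# Element type is inferred as PairT, so the tuples live inside the array buffer.
inline_vec = [(Int64(i), Float64(i)) for i in 1:n]

# Forcing the element type to Any makes the buffer hold one pointer per
# element, each pointing to a separately allocated tuple.
boxed_vec = Any[(Int64(i), Float64(i)) for i in 1:n]

# sizeof on an array reports the size of its buffer only.
inline_bytes = sizeof(inline_vec)   # n elements of sizeof(PairT) each
pointer_bytes = sizeof(boxed_vec)   # n pointers; the tuples themselves are extra
```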
