Julia's applicable context is getting narrower over time?

You must have a very peculiar understanding of how programming languages work. FWIW, there is no magic (of any color) in C++ or Julia. It’s just a compiler.

Idiomatic Julia code can generate programs that are as fast as optimized C or C++ (modulo some LLVM quirk). The great thing about Julia is that I can do this with relatively little effort, and still have maintainable code.

No, there is no need for this if you are coding in Julia.


If you want to provide performance-optimized code/algorithms as a library for the benefit of other programming environments, it is still more natural to use C, C++, or Fortran.

You wouldn’t develop in Julia when you want to make a high-performance library available to other languages.

In this sense, the applicable context is narrower than for C/C++. This has been so from the beginning, therefore it’s not getting narrower but stays constant. This isn’t valid in general, where the context broadens IMO.

For me this would be the most important improvement for Julia: compiling+linking native executables and shared/static libraries from Julia code.


I’m not sure if that’s true. If it takes 10x longer to write the high-performance library in C than it would take you in Julia, I imagine that the extra cost of the development might sometimes be hard to motivate. There are also a number of examples out there in the wild where people have used Julia to speed up stuff in R or Python, simply because it’s nicer to write Julia than C++, or because they wanted to use some other library available in Julia.


There have been many questions in the forums about people trying to embed Julia in C/C++. I’ve just assumed that it was for an application, but they could be building libraries. If the bulk of the code can be easily written in Julia then it could make sense to just provide a small C/C++ API layer.


I would bet on the contrary. High-level languages will progressively take the space of low-level ones as compilers and interpreters get smarter. That is what is happening with the advent of Julia and with tools like Numba, Pythran, etc., for Python. I think language syntax and expressiveness are, and will increasingly be, the main reasons for choosing any language to develop any project.


A question about C/C++: I haven’t coded in these for 20 years or so, and I wonder whether the magical speed refers to single-threaded CPU code. Are they good for multithreaded as well?

Only if the threads are made from unstable molecules. In general, magic is incompatible with muggle technology.


C (and to a lesser extent C++) is, by design, one step up from assembler, i.e. WITHOUT any sort of memory or data safeguards. That means you can do tricks that you just can’t do in a “higher” level language. Or you may be able to, but you have to tell the compiler “shut up, I know what I’m doing.”

In C, do you have a pointer to a structure but want to treat it as a pointer to an array of floats? No problem. Have a pointer to something but know that 57 bytes ahead is a structure of type Foo? Not an issue. C trusts that you know what that pointer points at; just tell it and it will let you do it.

Type checking, range checking: bah, C doesn’t need to provide any of that; if you want it, YOU do it. There are probably other things you can do that violate safety for speed, but those are the tricks I (used to) love to do. Safety checks, avoiding memory corruption: these are things higher-level languages give you, but there IS a cost. In C/C++ you can skip that safety and not spend the speed; just make sure you don’t have any bugs and you are good to go.

Threading is just another area where C/C++ doesn’t get in the way with safety checks that may slow you down. The programmer is responsible for knowing how to do multithreading safely; C/C++ is going to let you do it.


Everything you mentioned here (e.g. raw pointer arithmetic) can be done in Julia. Even inserting raw LLVM bytecode. It’s just turned out to be very very rare that you actually need to do such things for performance—judicious use of Julia’s higher-level constructs usually allows the compiler to do what you need for you. (And if you are going to do something unsafe, you can localize it in the tiny portion of your program that actually needs it, while using higher-level operations everywhere else.)

Doing threading efficiently is about a lot more than safety checks. It’s actually quite difficult using “raw” threading primitives (pthreads etc) in C++ to match the performance you get from Cilk (or even OpenMP) in complicated programs, due to the difficulty of load-balancing and synchronizing parallelism effectively—you basically need to re-implement a portion of the threading runtime backend (which is quite a bit of machinery on top of just pthreads). Low-latency work stealing is highly nontrivial to implement from scratch, for example, which is why things like Intel TBB are useful.

Julia’s threading primitives aren’t at the level of Cilk’s performance yet either, but that’s more about the low-level cost of things like queuing new tasks in our relatively young threading runtime scheduler than anything to do with high-level language constructs AFAIK. (My guess is that at some point Julia will join forces with some other threading runtime, e.g. if something akin to Tapir makes it into mainstream LLVM.)


That means you can do tricks that you just can’t do in a “higher” level language. Or you may be able to, but you have to tell the compiler “shut up, I know what I’m doing.”

This just isn’t true. Yes, Julia provides you with protections by default, but when the compiler can convince itself they aren’t necessary, they’re silently dropped. Here’s a case where I’ve not told the compiler @inbounds:

julia> function mysum(A)
           s = zero(eltype(A))
           for a in A
               s += a
           end
           return s
       end
mysum (generic function with 1 method)

julia> A = Float32[1,2,3];

Now check this with @code_native debuginfo=:none mysum(A). It’s not complicated, but to simplify even further here is the loop body:

	vaddss	(%rax,%rdx,4), %xmm0, %xmm0
	incq	%rdx
	cmpq	%rdx, %rcx
	jne	L48

This result has no bounds checking, because Julia’s compiler could prove it’s not necessary. The body of that loop is the same body you’d get from a good C compiler.

In C, do you have a pointer to a structure but want to treat it as a pointer to an array of floats? No problem. Have a pointer to something but know that 57 bytes ahead is a structure of type Foo? Not an issue. C trusts that you know what that pointer points at; just tell it and it will let you do it.

And so will reinterpret(reshape, Float32, A). (The reshape is new in 1.6 and crucial to getting the same performance you’d get from C.)

Threading is just another area where C/C++ doesn’t get in the way with safety checks that may slow you down.

You could also wrap pthreads and use it directly. Last time I played with that, Julia was still in a state where you had to manually make sure all the code you wanted to run inside threads was already compiled—which indeed is not an issue for C, since it can’t run code interactively and compile on the fly. But even there, there really isn’t an in-principle difference between the two, just a pragmatic one.


Excellent, then why are people saying Julia is slower than C/C++? Is it that Julia’s compiler writers just can’t produce a compiler as good as the C/C++ compiler writers? Or maybe it’s just a cop-out for us Julia programmers: we don’t want to spend the time making our code as fast as the C/C++ libraries, so we blame it on the language? Or should we just pass the blame on to LLVM? Maybe it’s a trade-off: compilation time would be too long if we did all the C/C++ optimizations, so we put a limit on what LLVM does?

I really am at a loss for why people are saying Julia is slower than C/C++. What is holding us back, then?

Why do you think Julia is slower than C/C++? I’m not sure the people saying that are correct.


Okay, I’m not sure the community agrees with you. In all the posts I’ve seen where someone complains that C/C++ is faster, the general response is “Well yeah, just a little bit, and Julia does this, this, and this for you!” I don’t see anyone hitting back at these naysayers and telling them to prove it.

Someone who may have some insights about this discussion is @MikaelSlevinsky, who ended up turning FastTransforms.jl into a C library for performance reasons.

I don’t think it’s so much “can’t” as “don’t have sufficient time and resources”.

There are roughly three categories of this (IMO). The first is when Julia is solving a different problem (e.g. an algorithm for UTF-8 strings instead of ASCII). The second is people benchmarking Julia in such a way that they include compile time. The third is benchmarking Julia against a 30-year-old, highly optimized C library (here the answer usually is: if you gave us 1/10th the funding and time, we could probably match it).

There are occasional spots where Julia is straight-up slower (mainly some GC-related stuff), but these are by far the minority.


Maybe some people are saying that, but others are saying the opposite. Why do you just believe the one?

We’ve explained this several times. There are cases where Julia is slower than the competition and others where it’s the fastest implementation known. Notably, Julia can beat some of the most highly tuned libraries out there (MKL) as well as CSV parsers written in C for other languages that have been optimized like crazy. Naively, you’d guess that parsing would be an area of weakness for Julia but the evidence now says otherwise. In work that I haven’t yet pushed to master, ImageFiltering can beat OpenCV by a factor of 2, and OpenCV is again a crazily optimized C++ library with lots of hand-written vectorization. Julia is as fast or faster than C implementations even under some very demanding circumstances, but that achievement does reflect effort and refinement by the developers of those packages.

When Julia doesn’t match C, it’s almost always something that can be overcome. I would guess that most of those cases have nothing to do with unsafe pointers and other C “black magic” and instead have a much more mundane explanation: C dispatches at compile time or by passed pointer, whereas Julia can dispatch at runtime. It takes a little practice to learn how to write code that’s fast even when inference is poor. I personally have never encountered a circumstance where I couldn’t match C. But it’s not a game I obsess about; it’s better to optimize things as much as you need and then move on. If someone else needs even better performance, they can pitch in to help.


Holding us back from what? Taking over the world?


Good enough for jazz
Perhaps a bit off topic: in the Apollo program, NASA bought IBM mainframes and programmed their own OS, to get performance and, probably, predictable time-to-solution for mission-critical apps.
Today people do program in assembler, to target readout of instruments and real-time hardware. But in general we don’t; we use high-level languages. Upthread there was a remark about 98% performance. I say when you get to that level, it is Good enough for Jazz.
The person in the next office or the nearby startup is going to be cranking their results out and getting on with the science or engineering while you footer with the 2%.

footer - a Scots word meaning to meddle or pass time without accomplishing anything meaningful

Good enough for Jazz - instruments do not need to be tuned for concert hall perfection if you’re gonna play some jazz on them


At the risk of derailing this thread further – I have yet to know any Jazz musician that doesn’t take tuning their instrument seriously TBH.