Jeff Bezanson remarks on LLVM "getting slower and slower"

In Ask us anything: Jeff Bezanson and Stefan Karpinski from April 16 (this year, 2020), Jeff Bezanson said that he has only two worries about Julia, and one of them is that “LLVM keeps getting slower and slower” (around 8:45). I tried to find more information about Julia’s problems with LLVM performance (many different parts can be slow), but I can’t find any deeper discussion about this, so I’m posting the question here. I apologize if I simply missed a relevant topic.

10 Likes

this may interest you: https://news.ycombinator.com/item?id=23137345

11 Likes

Also: https://twitter.com/StefanKarpinski/status/1250491627187195904?s=20

13 Likes

I am not sure if this is feasible, but one could try to compile the LLVM IR with a different JIT when compilation speed matters:
https://github.com/vnmakarov/mir
MIR seems to accept LLVM IR as input, and it claims to be about 200 times faster than GCC, so I would expect it to also significantly beat LLVM’s compile-time performance.

9 Likes

In the long run, the main way to substantially reduce compilation time will surely be a combination of (a) caching compiled code and (b) deferring compilation (by interpreting more non-critical code). Fortunately, neither of those strategies is dependent on LLVM performance.
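For what it’s worth, strategy (a) can already be approximated today with PackageCompiler.jl, which caches compiled native code in a custom system image. A minimal sketch (the package choice and output path are just placeholders):

    using PackageCompiler
    # Compile the listed packages ahead of time into a custom sysimage:
    create_sysimage([:Plots]; sysimage_path="cached_sysimage.so")
    # Then launch Julia with that code already compiled:
    #   julia --sysimage cached_sysimage.so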

20 Likes

I thought there was a trade-off between compilation speed and execution speed:

LLVM IR: slow compilation, faster execution
MIR: faster compilation, slower execution

Is this a false trade-off?

4 Likes

You’d hope so. Sadly, that’s not always reality. Large static arrays are a good example: when they get really big, LLVM spends an enormous amount of time producing really terrible compiled code.
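A minimal sketch of what I mean, assuming StaticArrays.jl (the sizes here are arbitrary):

    using StaticArrays
    A = @SMatrix rand(3, 3)    # small: unrolled code, compiles fast, runs fast
    B = @SMatrix rand(24, 24)  # big: the same unrolling blows up at compile time
    @time B * B   # first call: dominated by compilation, can take a long time
    @time B * B   # second call: the method is already compiled, runs quickly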

6 Likes

My thinking is that the 70% runtime speed of the MIR approach would, in some cases, be a reasonable sacrifice for 200 times faster compilation. Think of it as a fast interpreter, as per @stevengj ’s comment.
Julia already produces LLVM IR that could be fed into MIR. I think one has to avoid vector instructions for this to work. The question is how much of Julia’s compiler is LLVM-specific beyond the IR itself and cannot easily be adapted to MIR.
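For anyone who wants to look at the IR such a back end would have to consume, a small illustration:

    using InteractiveUtils  # provides code_llvm / @code_llvm outside the REPL
    f(x) = 2x + 1
    code_llvm(f, (Int,))    # print the LLVM IR Julia generates for f(::Int)
    @code_llvm f(3)         # macro form of the same query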

2 Likes

I’d like to see clean evidence that MIR has 200 times faster compilation w/ 70% of the runtime speed of LLVM.

If this is correct, would it be hard for Julia to offer both options?
MIR: when writing/debugging a program, making plots …
LLVM IR: when you need runtime performance
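Something like this already exists at whole-process granularity via command-line flags, which trade runtime performance for lower compile latency (a sketch, not per-function control):

    julia -O0 --compile=min script.jl   # favor latency while writing/debugging
    julia -O2 script.jl                 # favor runtime performance (-O2 is the default)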

2 Likes

Common Lisp had declarations like (declare (optimize (speed 0) (compilation-speed 3) (debug 3) (safety 3))), either at the function level or at the module level. It worked really well. You could run everything with high debug info by default, then profile and optimize the bottlenecks for speed.

Without those declarations, the compiler’s job is so much more difficult. How can Julia know which code is critical for performance, and which isn’t? It’s not even answerable statically.

I wonder if that tradeoff business is not what sunk Gallium.jl.

9 Likes

v1.5 will actually have that feature, and it’s already being used for Plots.jl as far as I know.

The compiler optimization level can now be set per-module using the experimental macro Base.Experimental.@optlevel n. For code that is not performance-critical, setting this to 0 or 1 can provide significant latency improvements (#34896).
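Usage looks like this (a sketch; the module and function names are made up):

    module NotPerfCritical
        Base.Experimental.@optlevel 0   # minimize optimization work and latency
        describe(x) = string("value: ", x)  # not performance-critical code
    end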

23 Likes

Yes and no.

Bold claim: “optimising compilers are operating in a region between a problem that’s not worth solving (most parts of most programs) and a problem they can’t solve (the really performance critical parts)”

I quote it (the claim comes from a piece pointing to Daniel J. Bernstein’s “The Death of Optimizing Compilers”) in my new thread: Rethinking optimization, lower ok for all Julia code (for e.g. faster startup) as a default? I originally found the link at the Benchmarks Game.

3 Likes

Cedric @cstjean,
I think you make excellent points here.

Specifically about “How can Julia know which code is critical for performance, and which isn’t? It’s not even answerable statically.”

It may not be answerable statically, but it is answerable dynamically. With JIT compilation, “the application code is initially interpreted, but the JVM monitors which sequences of bytecode are frequently executed and translates them to machine code for direct execution on the hardware.” (Just-in-time compilation - Wikipedia)

In other words, a JIT compiler can operate on the Pareto principle: “… for many outcomes, roughly 80% of consequences come from 20% of the causes (the ‘vital few’).[1]” (Pareto principle - Wikipedia)

Maybe LLVM 12 will be better.

https://twitter.com/sheredom/status/1356634699402260481

3 Likes

Thx @ImreSamu,

In support of the point that a JIT compiler should operate on the Pareto principle (“… for many outcomes, roughly 80% of consequences come from 20% of the causes (the ‘vital few’).[1]”, Pareto principle - Wikipedia):

Google engineers contributed a “Machine Function Splitter” code-generation optimization pass to LLVM 12 RC1, which splits functions into hot and cold parts.

They did this based on their research finding that in roughly half of all functions, MORE than 50% of the code bytes are NEVER executed, yet are still loaded into the CPU’s cache.

Google Engineers Propose “Machine Function Splitter” For Faster Performance

4 Likes

Very interesting. I didn’t know that so much compiled code is just wasted time and resources.

In my experience, Julia 1.6 did an excellent job of reducing latency by avoiding compilation of unused or over-optimized code before it even gets to LLVM. At least that is my understanding; I know very little about how the internals work. This blog post is a good place to start on the topic: Analyzing sources…. From the user’s point of view the situation looks pretty good now, but a faster LLVM is always welcome.
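If you want to see that latency for yourself, the classic “time to first plot” measurement makes it visible (a minimal sketch; any heavy package shows the same pattern):

    @time using Plots       # package load time
    @time plot(rand(10))    # first call: dominated by compilation
    @time plot(rand(10))    # second call: already compiled, fast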

2 Likes