Compiler performance regression

Hi,

We are currently developing a Julia package.

Running this demo script (branch “dev”) results in 10x longer compilation in Julia 1.11.3, compared to 1.10.x.

The package relies heavily on automatic differentiation, and one could criticize many things about our implementation (feel free to gloss over the details)

  • suboptimal implementation of corotated reference systems in beam elements
  • ham-fisted use of forward automatic differentiation to compute the Hessian of a scalar function
  • using our homebrew forward automatic differentiation instead of ForwardDiff.jl

but execution performance is not our topic today (the code is a work in progress).

Our package is demanding on the compiler, since our forward diff uses StaticArrays, so loops over partials are (I believe) unrolled. We compound that with metaprogramming to generate multiple methods for a function, and with calls that generate yet more method instances. Heck, that’s why we use Julia!
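To make the latency mechanism concrete, here is a hypothetical, stripped-down sketch (not the package's actual implementation) of a forward-mode dual number whose partials live in a fixed-size tuple. Every distinct number of partials `N` compiles into its own specialized method, with the loop over partials unrolled, so the compiler's workload grows with each new `N` a caller uses:

```julia
# Hypothetical sketch: a dual number with a fixed-size tuple of partials.
# Each arithmetic op below is specialized (and its N-loop unrolled) per N,
# so every new N triggers fresh compilation of the whole call chain.
struct Dual{N,T<:Real}
    value::T
    partials::NTuple{N,T}
end

Base.:+(a::Dual{N,T}, b::Dual{N,T}) where {N,T} =
    Dual(a.value + b.value, ntuple(i -> a.partials[i] + b.partials[i], Val(N)))

# Product rule on the partials.
Base.:*(a::Dual{N,T}, b::Dual{N,T}) where {N,T} =
    Dual(a.value * b.value,
         ntuple(i -> a.partials[i] * b.value + a.value * b.partials[i], Val(N)))

# Seed two variables with unit partials and differentiate x*y + x:
x = Dual(2.0, (1.0, 0.0))
y = Dual(3.0, (0.0, 1.0))
z = x * y + x   # value 8.0, partials (4.0, 2.0) = (∂/∂x, ∂/∂y)
```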

I have two motivations for posting:

  • Report what I believe to be a compiler performance regression.
  • Learn: how do I go about studying what exactly takes the compiler for a ride, so that I can write code that is less heavy on the compiler? The one obvious answer is @code_typed, but it’s going to produce some heavy reading. Are there other strategies?

:slight_smile:

Philippe


What profiling have you done to establish that this is a regression in the compiler rather than something else, like a particular dependency? That would help determine where you should report the regression.

Hi Benny, what made us suspect a compiler regression is that we observe this 10x longer execution time only the first time StaticBeamAnalysis.jl is run. This happens on both Linux x86 and Windows.

In my experience so far, most cases of bad latency are caused by imprecise inference, also called type instability. Make your code type stable, and it will usually get better.
To catch type instabilities, I recommend @code_warntype when writing new functions, and JET.jl’s @report_opt for checking whole codebases.
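As a toy illustration of the kind of instability @code_warntype catches (this example is hypothetical, not from StaticBeamAnalysis.jl): an unannotated struct field forces every downstream use to infer as Any, and parameterizing the struct fixes it.

```julia
using InteractiveUtils   # provides @code_warntype (loaded by default in the REPL)

# Unstable: the unannotated field has abstract type, so `c.stiffness`
# infers as Any everywhere it is used.
struct Config
    stiffness
end
energy(c::Config, u) = 0.5 * c.stiffness * u^2

# @code_warntype flags the unstable parts (shown as ::Any, in red):
@code_warntype energy(Config(10.0), 0.5)

# Stable: parameterize the struct so the field type is concrete.
struct TypedConfig{T<:Real}
    stiffness::T
end
energy(c::TypedConfig, u) = 0.5 * c.stiffness * u^2
```

The same instability shows up in JET.jl's @report_opt output, which scales the check from one function to a whole call tree.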

Once you have type-stable code, or code that is as type-stable as possible, I would:

  • Look for any macros that generate huge amounts of code. Even if the compiler is fast, it’s all too easy to write a macro that expands to 100,000 lines of code and tanks the latency.
  • If your codebase can be meaningfully precompiled, use PrecompileTools.jl on a workload.
  • Remove unnecessary dependencies
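On the precompilation point, a minimal sketch of a PrecompileTools.jl workload, assuming hypothetical entry points `build_model` and `solve` (adapt the names to your package's actual API):

```julia
module StaticBeamAnalysis

using PrecompileTools   # add PrecompileTools.jl to the package's Project.toml

# ... package code ...

@setup_workload begin
    # Code here runs at precompile time but is not itself cached;
    # use it to build small, representative inputs.
    # params = small_demo_parameters()     # hypothetical helper
    @compile_workload begin
        # Calls here are compiled and cached with the package,
        # so the first real run pays much less compile latency.
        # model = build_model(params)      # hypothetical entry point
        # solve(model)                     # hypothetical entry point
    end
end

end # module
```

The workload should exercise the same method instances (argument types) as the real demo script; instances it never hits still compile on first use.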

Hi Jakob,

We have good control over type stability, so I do not think that is the issue here. But indeed, our code generates hefty amounts of machine code, and that is by design (multiple method instances, and unrolled loops over the elements of static arrays).

Precompiling is absolutely relevant. We have dabbled with it a little, but we have more to do there.

:slight_smile: