Compiler Performance


#1

I don’t know if this is the correct category, but here goes:

Since Julia 0.6 was released I have seen major performance issues with Julia. After analyzing the issue a little, I found that compile performance seems to have regressed a lot. This apparently went unnoticed during the transition from 0.5 to 0.6, and I cannot find any deeper discussion of the subject.

I now have some examples that regressed, and I wonder if there is a “compiler benchmark suite” where the compiler performance of 0.5 vs 0.6 vs 0.7 is systematically compared. If so, I could contribute some of the examples I am currently collecting while analyzing my code. Here is a simple one:
Julia 0.5.2:

julia> u=[1]
1-element Array{Int64,1}:
 1

julia> @time u.== 1
  0.023830 seconds (8.52 k allocations: 371.236 KB)
1-element BitArray{1}:
 true

Julia 0.6.3:

julia> u=[1]
1-element Array{Int64,1}:
 1

julia> @time u.== 1
  2.614611 seconds (835.32 k allocations: 44.224 MiB, 0.66% gc time)
1-element BitArray{1}:
 true

That is roughly a 100-fold regression in compile performance.


#2

FWIW, I get

julia> u=[1]
1-element Array{Int64,1}:
 1

julia> @time u.== 1
  1.455042 seconds (165.26 k allocations: 8.545 MiB)
1-element BitArray{1}:
 true

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40 (2017-12-13 18:08 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-5###U CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)

and

julia> u=[1]
1-element Array{Int64,1}:
 1

julia> @time u.== 1
  0.045010 seconds (16.64 k allocations: 996.639 KiB)
1-element BitArray{1}:
 true

julia> versioninfo()
Julia Version 0.7.0-alpha.0
Commit 22590d529d* (2018-05-31 00:07 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i5-5###U CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)

But I don’t know whether this is because of changed makefiles for the sysimg or genuine compile-speed updates.


#3

I have not tested on 0.7, so it would be great if that operation got faster again. Still, I am wondering whether there are systematic tools to check compiler performance and catch regressions.


#4

You should run these experiments with precompile.jl disabled. It is fast on 0.7 because I ran @time when generating that file this time. Or perhaps the new broadcasting API is simply better covered in precompile.jl now…


#5

Yes, indeed, it could be several different things. But that implies we need tooling for finding regressions (either in compile time or in the precompile.jl statements).


#6

I do think that, overall, compiler performance has regressed, and we badly need to work on it. Partly LLVM has been getting slower over time, but whatever the cause we need to address it. 100x is certainly unusual though, and I would also guess that precompile.jl is involved because of that.


#7

Yes, very much. We are very careful about runtime performance (with Nanosoldier) but have nothing comparable running for compile times. Adding a test group for compile times has been on my mind for quite some time, but I haven’t had time to do it, nor am I sure exactly how it is best measured, or whether it is possible to avoid the overhead of restarting Julia for each measurement.
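A minimal sketch of one way to estimate this without special tooling, assuming each measurement runs in a freshly started Julia session: time the first call (compilation plus execution) and subtract the second call (mostly execution). The function `f` here is just a hypothetical stand-in for the expression under test.

```julia
# Hedged sketch: approximate compile time of an expression by comparing
# the first call (which compiles) with the second (already compiled).
# Only meaningful in a freshly started Julia session.
f() = [1] .== 1
t_first  = @elapsed f()   # includes compilation of f and the broadcast
t_second = @elapsed f()   # mostly runtime
println("approx. compile time: ", t_first - t_second, " s")
```

The process-restart overhead mentioned above is hard to avoid with this approach, since anything already compiled in the session hides the cost.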


#8

It’s great that a future focus will be on compiler performance, but I wonder: which compilation pass is so slow? LLVM handles the optimizer and codegen, right? But when I profile this I only see calls in inference. Doesn’t that suggest this might be a performance issue during inference? I am no compiler expert, but those would be different corners of the compiler, no?


#9

The new Broadcasting API is indeed much more precompile-friendly as it no longer creates new anonymous functions for each dot-broadcast expression. Creating new anonymous functions is a pretty bad case requiring lots of recompilation.
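To illustrate the point, here is a hedged sketch using the post-0.7 lowering names `Base.broadcasted` and `Base.materialize`: a fused dot expression used to lower to a broadcast over a freshly created anonymous function, while the new lowering builds a `Broadcasted` tree out of the existing functions, which precompiles much better.

```julia
u = [1, 2, 3]

# Old-style lowering (roughly): every occurrence of `2 .* u .+ 1` created
# a brand-new closure, so nothing compiled for one could be reused.
old = broadcast(x -> 2 * x + 1, u)

# New-style lowering: the same expression becomes a tree built from the
# existing functions `*` and `+`, so compiled methods are shared.
new = Base.materialize(Base.broadcasted(+, Base.broadcasted(*, 2, u), 1))

@assert old == new == [3, 5, 7]
```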

I don’t believe our profiler is set up to track time spent in C++/LLVM. We’ve been calling Keno’s new pure-Julia IR passes the “optimizer” — this is then what gets passed off to C++ to generate LLVM IR, and then LLVM has a number of optimization passes that it runs.


#10

Profile.print takes a C = true keyword which will display C stack frames as well. Not sure how well that will work on a non-debug build though.
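A quick sketch of that suggestion, assuming Julia 0.7+ where `Profile` is a standard library; the `C` keyword of `Profile.print` includes C/C++ frames in the report:

```julia
using Profile

Profile.clear()
@profile for _ in 1:100
    [1] .== 1                 # the expression under investigation
end
Profile.print(C = true)       # include C frames such as libLLVM in the tree
```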


#11

This was a good suggestion. Now I can see, in the leaves of the deep inference calls, many, many calls into libLLVM.


#12

I haven’t tested it systematically, but I get the feeling that on 0.7 the initial pre-compilation that happens with using is way slower, but after that a lot of things actually seem to be a little bit faster. Am I perceiving this correctly? What changed? Is it just pre-compiling more stuff?


#13

The most striking thing I noticed is that I have a package that takes about 100 s to compile (i.e., first run) on Julia 0.6, but only 10–15 s on Julia 0.7.


#14
julia> u=[1]
1-element Array{Int64,1}:
 1

julia> @time u.== 1
  2.463846 seconds (165.28 k allocations: 8.558 MiB)
1-element BitArray{1}:
 true

Julia 0.6.2, Windows 7, 64-bit, Intel i5-3320M 2.60GHz

Compared with Python, I feel the compile time undermines the benefits and elegance of the language, especially when running the code again and again for debugging purposes.


#15

I get similar results in v0.6.3, but note that in current master,

julia> @time u.==1
  0.275278 seconds (642.14 k allocations: 32.539 MiB, 6.35% gc time)

so things will improve soon for everyone.


#16

Here’s alpha:

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-alpha.0 (2018-05-31 00:07 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia-0.7> u=[1];

> @time u .== 1
  0.066827 seconds (16.64 k allocations: 997.857 KiB)

So what’s wrong with master?


#17

Haha! Marvellous!


#18

It was some kind of quirk I cannot reproduce; now it is approximately as fast as your benchmark.


#19

Has anybody encountered inconsistent allocations/timings on recent masters? I have been puzzled by identical code allocating vastly different amounts of memory when run in two different, newly launched REPL sessions. I’ve not managed to pin down the problem or a MWE, but your comment, @Tamas_Papp, sounded familiar…


#20

I think I was running multiple MCMC chains on that machine when I did the benchmark, so it probably was my fault.