How to cut down compile time when inference is not the problem?

Background

In Gridap.jl, we have experienced a significant increase in compile times when moving from Julia 1.5 to Julia 1.6. To illustrate this, consider this example:

using Gridap
function main()
  domain = (0,1,0,1,0,1); cells = (3,3,3); k = 2
  γ = 10; h = 1/3
  model = simplexify(CartesianDiscreteModel(domain,cells))
  reffe = ReferenceFE(lagrangian,Float64,k)
  V = TestFESpace(model,reffe,dirichlet_tags="boundary")
  U = TrialFESpace(V)
  Ω = Triangulation(model)
  Γ = BoundaryTriangulation(model)
  Λ = SkeletonTriangulation(model)
  dΩ = Measure(Ω,2*k)
  dΓ = Measure(Γ,2*k)
  dΛ = Measure(Λ,2*k)
  n_Γ = get_normal_vector(Γ)
  n_Λ = get_normal_vector(Λ)
  a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ +
    ∫( (γ/h)*v*u  - v*(n_Γ⋅∇(u)) - (n_Γ⋅∇(v))*u )dΓ +
    ∫( (γ/h)*jump(v*n_Λ)⋅jump(u*n_Λ) -
       jump(v*n_Λ)⋅mean(∇(u)) -
       mean(∇(v))⋅jump(u*n_Λ) )dΛ
  l(v) = ∫( v )dΩ
  op = AffineFEOperator(a,l,U,V)
  uh = solve(op)
end

Calling

@time main()

takes 92.353941 seconds on Julia 1.5 vs. 221.752225 seconds on Julia 1.6 in a fresh session, a 2.4x increase. This is a major problem for us, since 92 seconds was already a long compilation time.

I believe that type inference is not to blame for the sudden increase. Running this in Julia 1.6:

julia> using SnoopCompile
julia> tinf = @snoopi_deep main()
InferenceTimingNode: 165.115502/226.610818 on InferenceFrameInfo for Core.Compiler.Timings.ROOT() with 1385 direct children

Here 226.610818 seconds is the total compile time and 165.115502 seconds is the time spent in all phases except inference. Inference alone therefore accounts for about 61 seconds, which does not explain the increase.
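To make the arithmetic explicit, here is a small sketch that recomputes the breakdown from the numbers quoted above. It uses only Base Julia; the variable names are mine and purely illustrative.

```julia
# Timings in seconds, copied from the measurements quoted in this post.
t_julia15       = 92.353941    # @time main() on Julia 1.5
t_julia16       = 221.752225   # @time main() on Julia 1.6
t_total         = 226.610818   # second number reported for ROOT by @snoopi_deep
t_non_inference = 165.115502   # first number reported for ROOT (all phases except inference)

# Slowdown between the two Julia versions.
slowdown = t_julia16 / t_julia15

# Time attributable to inference, and its share of the total.
t_inference     = t_total - t_non_inference
inference_share = t_inference / t_total

println("slowdown 1.5 -> 1.6: ", round(slowdown, digits=2), "x")
println("inference time:      ", round(t_inference, digits=2), " s")
println("inference share:     ", round(100 * inference_share, digits=1), " %")
```

Under these numbers, inference is well under half of the total, which is why the remaining phases are the ones I am asking about.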

Question

What actions do we need to take to cut down the compile time that does not come from inference?

SnoopCompile has a very nice tutorial on how to cut down inference time, but how do we cut down the other phases?

We would appreciate any help, since this is a major issue for us!

Thanks!

Good question and I would also like to see any hints to improvements (although that’s probably a really hard problem). We experienced something similar in our hyperbolic PDE solver framework Trixi.jl, cf. this Discourse thread.

One major problem we found was traced back to a call to map:

https://github.com/SciML/SciMLBase.jl/pull/45

Just testing my patched version of Gridap.jl from here yields

Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
 80.101978 seconds (159.74 M allocations: 7.881 GiB, 3.13% gc time)
SingleFieldFEFunction():
 num_cells: 162
 DomainStyle: ReferenceDomain()
 Triangulation: BodyFittedTriangulation()
 Triangulation id: 17548864358240605896

and

Julia Version 1.7.0-rc3
Commit 3348de4ea6 (2021-11-15 08:22 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
 81.666169 seconds (183.65 M allocations: 10.239 GiB, 3.45% gc time, 99.60% compilation time)
SingleFieldFEFunction():
 num_cells: 162
 DomainStyle: ReferenceDomain()
 Triangulation: BodyFittedTriangulation()
 Triangulation id: 4899376070316862521

Checking run-time performance with @btime yields

Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
  127.213 ms (1259619 allocations: 143.91 MiB)
SingleFieldFEFunction():
 num_cells: 162
 DomainStyle: ReferenceDomain()
 Triangulation: BodyFittedTriangulation()
 Triangulation id: 14490416472812666571

and

Julia Version 1.7.0-rc3
Commit 3348de4ea6 (2021-11-15 08:22 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
  123.524 ms (1275432 allocations: 131.42 MiB)
SingleFieldFEFunction():
 num_cells: 162
 DomainStyle: ReferenceDomain()
 Triangulation: BodyFittedTriangulation()
 Triangulation id: 5320699133180677141
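
The two sets of timings measure different things: @time in a fresh session is dominated by compilation (99.60% on 1.7.0-rc3 above), while @btime reports the steady-state run time once everything is compiled. A minimal sketch of that distinction using only Base Julia follows; work is a hypothetical stand-in for main(), not part of Gridap.

```julia
# Hypothetical stand-in for main(); any function that has not been called yet works.
work(x) = sum(abs2, x) / length(x)

x = rand(1000)

# First call: the measured time includes JIT compilation of work for Vector{Float64}.
t_first = @elapsed work(x)

# Second call: the compiled code is reused, so this is essentially pure run time.
t_second = @elapsed work(x)

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

The first call is typically much slower than the second, mirroring the gap between roughly 80 seconds and roughly 125 milliseconds in the outputs above.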