Help reducing compilation and inference time

Is your code in a package?

Yes

A sysimage and PackageCompiler.jl would be a nightmare for our CI/CD pipelines. That is why I am avoiding them.

I understand you don’t want to wait (but I’m curious: is it a large fraction of the simulation time, and how long in total?)

The problem is that it is not only impacting my team but other teams who depend on the results of our simulations. If a co-worker wants to generate some data for a simple test case, they have to wait 6 minutes every time. And yes, due to our complex workflow, it has to be a new Julia session every time.

1 Like

That is not a bad idea. I will try that.

A couple extra thoughts:

You say this is type stable, but you’re passing in a Dict{Symbol,Any}. How are you making this type stable if the input type provides no information (Any) about the actual types being passed in? Are you doing lots of type assertions or hard-coding return types in your code?

It also sounds like compilation is very slow but execution is very fast. Are you doing lots of calculations in type parameters, or using especially large tuples? For calculations in type parameters, this could mean stuff like abusing Val types, or especially large tuples could be things like enormous StaticArrays. These can be very slow to compile and fast on subsequent runs because you’ve essentially pushed your calculations onto the compiler, instead of at runtime.
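To illustrate what I mean (a made-up sketch, not your code): dispatching on a Val{N} built from a runtime value forces a fresh specialization for every distinct N, so trivial runtime work turns into a lot of compile-time work.

# Made-up example: each distinct N compiles a new method instance
compute(::Val{N}) where {N} = N + 1

for n in 1:100
    compute(Val(n))   # 100 separate specializations are inferred and compiled
end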

3 Likes

Inside the get_params_* functions, I am simply populating the param_a struct from params::Dict{Symbol,Any}. I do not do type assertions when getting the values from params. You have a point, that might be a problem.

I am going to do the following, let me know if it makes sense:

struct ParamA
   x::Int
   y::Float64
   z::Char
end

# With type assertions on the Dict lookups
get_param_a(d::Dict{Symbol,Any}) = ParamA(d[:x]::Int, d[:y]::Float64, d[:z]::Char)
1 Like

Currently, I am doing the following:

struct ParamA
   x::Int
   y::Float64
   z::Char
end

# No type assertions
get_param_a(d::Dict{Symbol,Any}) = ParamA(d[:x], d[:y], d[:z])
1 Like

No, I think that’s okay. The conversion to structs will handle the type inference part; I wouldn’t worry about adding the extra type assertions.

1 Like

@sbuercklin I really appreciate your help! Do you know anyone else in the Julia community who might help me solve this issue?

I don’t want to give up just yet.

Can you try to run only run_sim with fake parameters (not taken from the file) and tell us what the timing result is? If that has a low compile time, then the problem is likely in read_params.
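Something along these lines (just a sketch; I’m guessing that run_sim takes a parameter struct like the ParamA from above, so adjust to the real signature):

# Rough sketch, in a fresh Julia session: hand-built parameters instead of read_params
fake_params = ParamA(1, 2.0, 'c')   # made-up values, not read from the file

@time run_sim(fake_params)   # first call: compilation + execution
@time run_sim(fake_params)   # second call: execution only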

1 Like

I can try. However, the param_* structs are quite large, with nested structs. It will take me some time to create this experiment.

1 Like

This thread is a good place to ask for help, but without the actual code all we can do is try to guide your debugging abstractly. gdalle covered the typical suggestions (SnoopCompile, Cthulhu, JET) early on, and I think checking whether run_sim has a slow time-to-first-run on its own seems like a good direction to look in.

But there are many things that could be causing compilation timing problems, and it’s not just that the list is long: which items matter depends on what you’re doing and what else is involved in your code.

It won’t be great from a development perspective, but a last-ditch workaround for users of your code could be to use PrecompileTools.jl to add a precompile workload. This essentially runs main(some_default_data) or run_sim(some_default_data) when your package is precompiled and caches the compiled code. This way, end users only pay the 4 minute compilation cost once, and subsequent loads of the package just pick up the pre-compiled code.

Note, however, that every time you update the project, users will have to rerun the precompilation workload, so during development you won’t be able to take advantage of the precompilation.
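Roughly, such a workload would look something like this (a sketch using PrecompileTools.jl; the package name and parameter values are placeholders):

module MySimPackage

using PrecompileTools

# ... the rest of the package, defining ParamA, run_sim, etc. ...

@setup_workload begin
    # Small but representative placeholder inputs
    default_params = ParamA(1, 2.0, 'c')
    @compile_workload begin
        # Everything executed here is compiled during precompilation and cached
        run_sim(default_params)
    end
end

end # module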

2 Likes

I tried it. It was not as hard as I thought, but unfortunately it did not work.

I agree with @sbuercklin, but it may also be worth trying to spawn julia with -O0 during development. How much time does it take that way?

2 Likes

Probably also worth taking a look at the great walk-through described at How Julia ODE Solve Compile Time Was Reduced From 30 Seconds to 0.1

4 Likes

-O0 shaves down the compilation time significantly. It now takes ~83 seconds for the first run.
The flag is good for development, but we found that adding the -O0 flag and running back-to-back simulations makes the overall runtime slower.

Other than Cthulhu.jl and SnoopCompile.jl, is there a way to debug Julia itself to see exactly which parts of the code take so long to optimize and compile?

Another thing worth noting is that we use the @set macro from Setfield.jl extensively in process_a to set fields in immutable structs, and I noticed that @set produces type-unstable code.

Unfortunately, we cannot change the struct to mutable since it is mapped to a C struct.

It seems to me that in the link I provided they were able to pinpoint the exact sources of the compilation cost by using

using SnoopCompile, ProfileView
tinf = @snoopi_deep main()   # collect the inference profile for your entry point
ProfileView.view(flamegraph(tinf))

after using SnoopCompile.jl, so I think that reproducing what the article does could tell you that.

For @set, it might be worth updating to Accessors.jl, but I don’t think it will solve your issue. You can also try to instantiate new structs manually.
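For example, with the ParamA struct from above (just a sketch):

using Accessors   # successor to Setfield.jl; also provides @set

p = ParamA(1, 2.0, 'c')

p2 = @set p.x = 42            # macro-based update of an immutable struct
p3 = ParamA(42, p.y, p.z)     # manual reconstruction, trivially inferable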

Is your code in a single package? How large is the precompiled .so/.dylib/.dll for that package? (Later I will provide specific instructions for this.)
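For a first rough estimate, something like this should work (a sketch; MySimPackage is a placeholder for your actual package name):

# Rough check of the on-disk precompile cache size for one package
cachedir = joinpath(DEPOT_PATH[1], "compiled", "v$(VERSION.major).$(VERSION.minor)", "MySimPackage")
for f in readdir(cachedir; join=true)
    println(f, "  ", filesize(f) ÷ 1024, " KiB")
end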

You have stated that “The codebase is quite large.” If this is a large codebase, you should probably try to separate it into a few packages, each being its own unit of compilation.

Practically, you want the part of your code that is stable and unchanging in one package with a robust use of PrecompileTools.jl to cache the results of the compilation to disk.

The changing part of your code should be in a separate package which will be recompiled often. You might want to be slightly more relaxed about the use of PrecompileTools.jl here.

I highly suspect that you have not used package-level precompilation to a significant extent, since your main is taking a while to run.

I want to emphasize that the package-level precompilation I am discussing above is not the same as using PackageCompiler.jl or generating a custom system image. However, I would like to hear about how generating a system image breaks your CI and CD workflows. Especially when your code is being sent to other teams, they probably should be using a system image. Ultimately, a system image is the only way to really capture all of the compilation and reload it with maximum efficiency.

6 Likes

After some struggle, I was able to generate a flame graph of the code (compilation + execution). Can someone help me make sense of this graph?

The left cluster with a big spike is all Julia internal calls mainly from abstractinterpretation.jl and typeinfer.jl. The right cluster with a smaller spike is where our main function gets called.

Please let me know if you need additional information regarding this graph. I can zoom in on parts of the graph if they do not contain proprietary code.

2 Likes

As I mentioned in the original post, precompiling is a surface level solution to our workflow and does not solve the fundamental problem.

At this point in the code’s lifecycle, fragmenting it into separate packages is infeasible.

For technical reasons, in our CI/CD pipeline we cannot have precompiled code stored on disk. Therefore, the code has to be compiled from scratch every time a CI/CD job is triggered. Adding precompilation directives will simply shift the compilation time to the package loading phase, and we will end up with the same total runtime.

With regards to system images, we would need to create a system image every time we submit changes to our simulation, which can be dozens of times per day. Creating a system image is notoriously slow and would further exacerbate the job times. Also, for our code, system images will be large and will have to be stored for each change, driving up storage cost.

There must be a way to pinpoint the piece(s) of code that causes Julia to have long compilation times. Then, we can either fix the code or introduce a workaround.