Minimizing long compile time/Precompiling

I have some code for a research project of mine that takes a very long time to compile. Here is a recent example, timing a first call and then a second call:

```
@time generateMnli(Pk, bd22, 0.0)
@time generateMnli(Pk, bd22, 0.0)
434.282438 seconds (502.49 M allocations: 23.516 GiB, 3.86% gc time)
0.393256 seconds (990.98 k allocations: 57.412 MiB, 4.19% gc time)
```

The code is quite lengthy, so a longish compilation time is not unreasonable; however, the current situation makes it very difficult for me to recommend the code to other people. I would like to either significantly reduce the compilation time or find some way to pay this compile cost only once (e.g. when the package is first installed).
The first part of the function looks like this:

```julia
using OffsetArrays

function generateMnli(Pk::Function, BiasDict22::Dict{String,F}, f::Real) where {F<:Real}
    M = OffsetArray{Array{Float64}}(undef, 0:4, 0:8, 0:8)

    b1 = BiasDict22["b1"]
    bη = BiasDict22["bη"]
    b2 = BiasDict22["b2"]
    bK² = BiasDict22["bK2"]
    bδη = BiasDict22["bδη"]
    bη² = BiasDict22["bη2"]
    bKKpara = BiasDict22["bKK∥"]
    bΠ2para = BiasDict22["bΠ2∥"]
    r, xi20 = ξ(Pk,2,0)
    _, xi00 = ξ(Pk,0,0)
    _, xi1m1 = ξ(Pk,1,-1)
    # ... (about 400 more statements of the form M[i,j,k] = @. ... follow)
end
```

where ξ is a wrapper around a function from another package that returns two arrays of Float64 (so each variable, e.g. xi00, is an array of length ~1024). This wrapper is used in quite a few other places, and those don't run into compile problems. The long compile time is due to statements like the following:

```julia
M[0,0,2] = @. (32*(3*f*bη - 5*bΠ2para)*(147*b1*f + 182*bKKpara + 147*f*bδη + 6*f^2*bη + 6*f^2*bη² +
            84*bΠ2para)*xi20^2)/972405. - (32*(3*f*bη - 5*bΠ2para)*(7*bKKpara + 4*f^2*bη + 4*f^2*bη² +
            7*bΠ2para)*xi40^2)/108045. + xi20*((32*(7*bKKpara + 3*(14*b1*f + 14*f*bδη - 8*f^2*bη - 8*f^2*bη² -
                    7*bΠ2para))*(3*f*bη - 5*bΠ2para)*xi00)/99225. + (32*(28*bKKpara - 3*(49*b1*f + 49*f*bδη - 38*f^2*bη -
                    38*f^2*bη² - 42*bΠ2para))*(3*f*bη - 5*bΠ2para)*xi40)/540225.)
```

The above is one of the shorter ones; there are around 400 lines of such statements in the function. Of course, this is probably beyond any reasonable use case, so it's understandable that the compiler struggles, but I'm unsure how to even find out what I could change to help it compile faster. There are various things I could try, but I'm unsure which of them have any chance of working:

- take the dictionary values as arguments instead of the dictionary,
- calculate the ξ functions outside the function and take them as inputs,
- generate the Pk function inside this function to avoid having a function as an input,
- split each term into its own function.
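A minimal sketch of the last option (one small function per term), assuming the scalars and ξ arrays are passed in explicitly. `term_002`, `term_004` and `generate_split` are hypothetical names, their bodies are placeholders rather than the real generated expressions, and a `Dict` keyed by index tuples stands in for the `OffsetArray` so the example has no dependencies. Whether this lowers total compile time in practice is not guaranteed; the idea is that inference and optimization of many small method bodies tend to be cheaper than one ~400-statement body.

```julia
# Each M entry gets its own small function; the bodies here are PLACEHOLDERS,
# not the real expressions from the post.
function term_002(f, bη, bΠ2para, xi00, xi20)
    return @. (3*f*bη - 5*bΠ2para) * xi20 * xi00
end

function term_004(f, b1, xi20, xi40)
    return @. b1 * f * xi20 * xi40
end

# Driver that only calls the per-term functions.
# A Dict keyed by (i, j, k) stands in for the OffsetArray used in the post.
function generate_split(f, b1, bη, bΠ2para, xi00, xi20, xi40)
    M = Dict{NTuple{3,Int},Vector{Float64}}()
    M[(0, 0, 2)] = term_002(f, bη, bΠ2para, xi00, xi20)
    M[(0, 0, 4)] = term_004(f, b1, xi20, xi40)
    return M
end
```

Usage: `generate_split(0.5, 1.0, 2.0, 0.5, ones(4), fill(2.0, 4), fill(3.0, 4))` returns a `Dict` with one vector per index triple.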

Below is the output of @code_warntype generateMnli(Pk, bd22, 0.0), run on a version reduced to essentially what is shown in this post:
https://pastebin.com/a77zu0ve

Another thing that would resolve my issue is being able to pay the compilation cost once and only once. I have tried adding various precompile calls, but none of them seem to work. I have briefly looked at PackageCompiler.jl, and it seems like it would resolve the issue locally, but ideally other people would be able to just download my package without having to set up a system image just for this code. Is something like precompile usable for this case, or should I focus my time on reducing compilation time?
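A sketch of what a precompile directive at the top level of a package module might look like. `MnliSketch`, `pk_default` and `generate_sketch` are hypothetical stand-ins, not the real code. `precompile` is generally most useful when given the concrete argument types of the call you actually make, so it is worth checking that the signature matches the real call. How much gets cached also depends on the Julia version: since Julia 1.9, native code produced during package precompilation is stored in the package image, whereas older versions cached less and still had to generate native code in each new session. PrecompileTools.jl's `@compile_workload`, which precompiles by running a representative call, is a commonly used alternative.

```julia
module MnliSketch

# Hypothetical stand-in for the real Pk function
pk_default(k) = 1 / (1 + k^2)

# Hypothetical, heavily reduced stand-in for generateMnli
function generate_sketch(Pk, bias::Dict{String,Float64}, f::Real)
    b1 = bias["b1"]
    k = range(0.01, 1.0; length = 16)
    return @. b1 * f * Pk(k)
end

# Ask Julia to compile this exact signature when the module is precompiled.
# A different Pk (e.g. a closure) has a different type and is not covered.
precompile(generate_sketch, (typeof(pk_default), Dict{String,Float64}, Float64))

end # module
```

In a real package this directive would sit inside the package's main module file, so it runs once when the package is precompiled after installation.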

Any help would be greatly appreciated, even just ways to inspect the compilation process and find which lines are causing the issue (I think @code_warntype might do this, but I have no idea how to read its output).
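One way to check type stability without reading all of the `@code_warntype` output is `Test.@inferred`, which throws an error if the type of the returned value does not match the return type the compiler inferred for the call. In the `@code_warntype` output itself, the things to look for are variables (and the `Body::` return type) with non-concrete types, which the REPL highlights, usually in red. The two toy functions below are hypothetical. The last two lines relate to the `M` container in the question: `Array{Float64}` leaves the number of dimensions open, so it is not a concrete element type; whether that contributes much to the compile time here is unclear.

```julia
using Test

# Hypothetical toy functions, for illustration only
unstable(x) = x > 0 ? 1.0 : 0     # may return Float64 or Int
stable(x)   = x > 0 ? 1.0 : 0.0   # always returns Float64

@inferred stable(2.0)             # passes and returns 1.0

# @inferred errors when the inferred return type is not the concrete result type
try
    @inferred unstable(2.0)
catch err
    println("unstable: ", err)
end

# Element types of containers: Array{Float64} has no fixed dimension
println(isconcretetype(Array{Float64}))    # false
println(isconcretetype(Vector{Float64}))   # true
```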

It is hard to suggest something specific without an MWE. I would break the code up into smaller functions and use NamedTuples or composite types (struct) to group arguments instead of a Dict; see also Parameters.jl.
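A sketch of the struct suggestion using `Base.@kwdef`, which ships with Julia (Parameters.jl's `@with_kw` offers a similar keyword constructor, plus `@unpack`). `Bias22` and `prefactor` are hypothetical names, and the field names are modelled on the local variables in the question; the one-argument constructor converts the existing `Dict` using its original string keys. Unlike `Dict{String,F}` lookups, each field has a fixed concrete type that the compiler sees directly.

```julia
# Keyword-constructible container for the bias parameters (hypothetical name).
# Fields are Float64 for simplicity; a type parameter could be added instead.
Base.@kwdef struct Bias22
    b1::Float64
    bη::Float64
    b2::Float64
    bK²::Float64
    bδη::Float64
    bη²::Float64
    bKKpara::Float64
    bΠ2para::Float64
end

# Convert the existing Dict (keys as in the question) into the struct
Bias22(d::AbstractDict{String,<:Real}) = Bias22(
    b1 = d["b1"], bη = d["bη"], b2 = d["b2"], bK² = d["bK2"],
    bδη = d["bδη"], bη² = d["bη2"], bKKpara = d["bKK∥"], bΠ2para = d["bΠ2∥"])

# Example consumer; property destructuring needs Julia >= 1.7
function prefactor(p::Bias22, f::Real)
    (; bη, bΠ2para) = p
    return 3 * f * bη - 5 * bΠ2para
end
```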

Also, in the expressions you included, many terms are repeated, e.g. 3*f*bη and xi40^2, as well as various combinations of these. Calculating common subexpressions once and reusing them could help. If this code was generated by some other tool, doing so would require understanding its structure better.
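A sketch of that idea applied to the `M[0,0,2]` statement from the question, with the statement wrapped in a function so both versions can be compared. The function names are hypothetical. The hoisted version computes the scalar brackets once, including the repeated `f^2*bη + f^2*bη²`, and leaves only the array-dependent part in the fused `@.` broadcast; the assumption (not measured) is that a much smaller broadcast expression also means less work for the compiler.

```julia
# The M[0,0,2] statement from the question, unchanged, inside a function
function m002_original(f, b1, bη, bδη, bη², bKKpara, bΠ2para, xi00, xi20, xi40)
    return @. (32*(3*f*bη - 5*bΠ2para)*(147*b1*f + 182*bKKpara + 147*f*bδη + 6*f^2*bη + 6*f^2*bη² +
        84*bΠ2para)*xi20^2)/972405. - (32*(3*f*bη - 5*bΠ2para)*(7*bKKpara + 4*f^2*bη + 4*f^2*bη² +
        7*bΠ2para)*xi40^2)/108045. + xi20*((32*(7*bKKpara + 3*(14*b1*f + 14*f*bδη - 8*f^2*bη - 8*f^2*bη² -
        7*bΠ2para))*(3*f*bη - 5*bΠ2para)*xi00)/99225. + (32*(28*bKKpara - 3*(49*b1*f + 49*f*bδη - 38*f^2*bη -
        38*f^2*bη² - 42*bΠ2para))*(3*f*bη - 5*bΠ2para)*xi40)/540225.)
end

# Same quantity with the scalar subexpressions hoisted out of the broadcast
function m002_hoisted(f, b1, bη, bδη, bη², bKKpara, bΠ2para, xi00, xi20, xi40)
    A   = 32 * (3*f*bη - 5*bΠ2para)   # common factor of every term
    f2η = f^2 * (bη + bη²)            # f^2*bη + f^2*bη² appears in every bracket
    c1 = A * (147*b1*f + 182*bKKpara + 147*f*bδη + 6*f2η + 84*bΠ2para) / 972405
    c2 = A * (7*bKKpara + 4*f2η + 7*bΠ2para) / 108045
    c3 = A * (7*bKKpara + 3*(14*b1*f + 14*f*bδη - 8*f2η - 7*bΠ2para)) / 99225
    c4 = A * (28*bKKpara - 3*(49*b1*f + 49*f*bδη - 38*f2η - 42*bΠ2para)) / 540225
    # Only the array-dependent part remains in the fused broadcast
    return @. c1*xi20^2 - c2*xi40^2 + xi20*(c3*xi00 + c4*xi40)
end
```

The two versions agree up to floating-point rounding, since the hoisted one only regroups the same products.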


The issue with building an MWE is that there are a lot of complicated dependencies in the code, so even if I section off a bit like the one above, getting the xi functions (which could be an important part of the compile time) would be too involved.
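If the ξ dependency is the main obstacle to an MWE, one option is a stub that only mimics ξ's return shape (a tuple of two `Vector{Float64}` of length ~1024). `ξ_stub` is hypothetical and its values are placeholders, not the real transform. Temporarily calling it in place of ξ and comparing the first-call `@time` could also give a rough idea of how much of the compile time comes from the ξ side.

```julia
# Hypothetical stand-in for the ξ wrapper: mimics only the return shape,
# a tuple (r, values) of two Vector{Float64} of length N.
function ξ_stub(Pk, l::Integer, n::Integer; N::Integer = 1024)
    r = collect(range(1.0, 200.0; length = N))
    vals = Pk.(r) ./ r .^ (l + 1)   # placeholder values, NOT the real transform
    return r, vals
end
```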

I will definitely try breaking it up into smaller functions and messing with various alternatives to Dict once I have a little time. I really appreciate you pointing out Parameters.jl, this is exactly what I was using the Dict for and seems much easier to use.

The issue with calculating shared terms is that the expressions are very long and there are far too many terms. Going through them to get any real performance gain would be quite time-consuming, unless you think it would reduce the compile time significantly? I will probably cache the array*array operations eventually, though; somehow I never thought to do that.
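A sketch of caching array products that several statements share, assuming each statement has already been reduced to scalar coefficients times array products. `two_entries` and its coefficients are hypothetical placeholders. The trade-off is a few extra temporary arrays in exchange for fewer repeated elementwise multiplications; the effect on compile time specifically is uncertain.

```julia
# Array products computed once and reused by several M entries.
# c holds five placeholder scalar coefficients.
function two_entries(c, xi00, xi20, xi40)
    c1, c2, c3, c4, c5 = c
    xi20sq   = xi20 .^ 2       # cached products
    xi20xi40 = xi20 .* xi40
    m1 = @. c1*xi20sq + c2*xi20xi40
    m2 = @. c3*xi20sq - c4*xi20xi40 + c5*xi00
    return m1, m2
end
```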

If you can turn it into a for loop (which means getting all the indices correct), you could use CommonSubexpressions.jl.
But if each of these variables is either a scalar or an array of scalars, I bet the compiler would handle common subexpression elimination (CSE) automatically if you put @inbounds @fastmath in front of the loops.
That said, if you have a lot of these giant expressions and don't remember the correct indices and shapes of each variable, converting them to loops would be tedious.
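A sketch of the explicit-loop form for a single entry, assuming the scalar coefficients `c1`..`c4` are computed beforehand; `m002_loop!` is a hypothetical name. CommonSubexpressions.jl's `@cse` is not used here to keep the example dependency-free. Inside the loop the compiler may eliminate the remaining common subexpressions itself, more readily under `@fastmath`, but that is not guaranteed.

```julia
# Explicit loop for an entry like M[0,0,2], with precomputed scalar coefficients
function m002_loop!(out, c1, c2, c3, c4, xi00, xi20, xi40)
    # eachindex with several arrays also checks that their indices match
    @inbounds @fastmath for i in eachindex(out, xi00, xi20, xi40)
        x20 = xi20[i]
        x40 = xi40[i]
        out[i] = c1*x20^2 - c2*x40^2 + x20*(c3*xi00[i] + c4*x40)
    end
    return out
end
```

Usage: `m002_loop!(similar(xi00), c1, c2, c3, c4, xi00, xi20, xi40)` fills and returns a new vector; results can differ from the broadcast version in the last bits because of `@fastmath`.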