Reduce compilation time for long and nested tuples

Details:
I am calling C functions and using C structures from Julia, using Clang.jl to automate the wrapping process. The C code contains multiple arrays of one or more dimensions. To keep a one-to-one mapping between C and Julia structs, Clang.jl converts C arrays into tuples, and multi-dimensional arrays into nested tuples. Here is a toy example:

typedef struct {
    int x[11][22][33];
} s1;

is converted into

struct s1
    x::NTuple{11, NTuple{22, NTuple{33, Cint}}}
end
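
As a quick sanity check of that one-to-one mapping, the sketch below compares the size of the Julia struct with the size of the C array. It only inspects the type and never constructs a value, so it should not trigger the slow compilation discussed in this thread.

```julia
# Julia mirror of the C struct, as generated for the toy example.
struct s1
    x::NTuple{11, NTuple{22, NTuple{33, Cint}}}
end

# Nested NTuples of a bits type are stored inline, like the C array.
println(isbitstype(s1))
println(sizeof(s1) == 11 * 22 * 33 * sizeof(Cint))
```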

Problem:
The problem is that the code takes a long time to compile (~6 minutes on the first run and ~0.2 seconds for subsequent ones). After doing some research, I realized that large (and nested?) tuples take a long time to compile, as discussed in this issue and this one.
To study this issue, I used SnoopCompile.jl to analyze the inference time and Cthulhu.jl to squash type instabilities but, after more time than I care to admit, I realized that inference was hardly the bottleneck of the initial run.
I already had a great discussion with the maintainer of Clang.jl and we came up with some solutions/workarounds. See this discussion for more information.

Additionally, I work with (relatively) large tuples of structs so this could contribute to the lengthy compilation process.

Possible solutions:

  1. Precompile: This is the most obvious one. However, the code that is slowing me down is in the package I am working on. Therefore, every time I change the code I would have to precompile again :frowning:.
  2. Flatten the nested tuples: My theory was that if I create linear tuples instead of nested ones, the compilation time will decrease and I will finally be happy. For example, replace:
struct s1
    x::NTuple{11, NTuple{22, NTuple{33, Cint}}}
end

with

struct s1
    x::NTuple{11 * 22 * 33, Cint}
end

Alas, I only saved 10 seconds out of 6 minutes. Not good. Let’s keep moving.
  3. Create and allocate complex structs using C APIs: This idea was suggested by the maintainer of Clang.jl, @Gnimuc. I am still working on this one, but I am having issues preventing garbage collection of some of the variables passed to C. Results of this approach are still pending.
  4. Create aliases to the structs with arrays: The approach is to create another struct (s1bar for example) with all the arrays converted to pointers. Then, populate s1bar in Julia and pass it to C. Finally, copy the data appropriately from s1bar to s1. s1bar could look like this:

typedef struct {
    int *x; //[11][22][33]
} s1bar;
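
A minimal sketch of how the pointer-based alias from option 4 might look on the Julia side, which also touches the garbage-collection concern from option 3. The names `S1Bar`, `DIMS` and `c_index` are hypothetical and not part of Clang.jl's output, and the actual `ccall` is left out because the C API is not shown here. The buffer is a plain `Vector{Cint}`, so no large tuple type is involved, and `GC.@preserve` keeps it alive while the raw pointer is in use.

```julia
# Hypothetical Julia mirror of the pointer-based C struct `s1bar`.
struct S1Bar
    x::Ptr{Cint}   # points at 11 * 22 * 33 contiguous Cint values
end

const DIMS = (11, 22, 33)

# 1-based linear index of the C element x[i][j][k] (0-based, row-major).
c_index(i, j, k) = (i * DIMS[2] + j) * DIMS[3] + k + 1

buf = zeros(Cint, prod(DIMS))      # Julia-owned storage for the array
buf[c_index(1, 2, 3)] = 42

# Keep `buf` rooted for as long as the raw pointer is being used.
value = GC.@preserve buf begin
    s = S1Bar(pointer(buf))
    # A ccall that receives `s` would go here.
    unsafe_load(s.x, c_index(1, 2, 3))
end
println(value)
```

The same row-major index also applies to the flattened tuple from option 2, since both lay the elements out in C order.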

Questions:

  1. Is there a workaround in Julia?
  2. Can I profile the compilation process? I want to zero in on the functions that take the longest to compile and tackle them somehow.
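
A rough first step toward the second question is to time the first and second call of each suspect function separately, since the first call includes compilation and the second does not. The sketch below uses a stand-in function `f` and a deliberately small tuple (both hypothetical) so that it runs quickly; with the real wrappers the tuple would be much larger.

```julia
f(t) = sum(t)            # stand-in for a function taking a tuple argument
t = ntuple(Cint, 64)     # small tuple of Cint values 1..64

first_call  = @elapsed f(t)   # includes compilation of f for this tuple type
second_call = @elapsed f(t)   # already compiled, run time only
println("first: ", first_call, " s, second: ", second_call, " s")
```

On Julia 1.6 and later, `@time f(t)` additionally reports the share of time spent compiling, which gives a similar picture with less bookkeeping.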

Workarounds:
One of the workarounds I found on the Julia Discourse is to turn off optimizations in Julia: julia -O0. This approach brings the compilation time down to ~1 minute and 30 seconds, at the cost of slower runtime, obviously.
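
If lowering the optimization level for the whole session is too costly, a narrower variant may be to lower it only for the module that holds the generated wrappers, using `Base.Experimental.@optlevel`. This is a sketch under assumptions: the module name `LibWrappers` is hypothetical, and whether this recovers the same saving as `-O0` for this code has not been measured here.

```julia
module LibWrappers  # hypothetical module holding the generated bindings

# Applies to code in this module only; the rest of the session keeps
# the default optimization level.
Base.Experimental.@optlevel 0

struct s1
    x::NTuple{11, NTuple{22, NTuple{33, Cint}}}
end

end # module
```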

Any help would be much appreciated. Thank you in advance.

Gigantic tuples will take a lot of LLVM time. You’d need to build a system image for that on v1.8. There are some things that will help on Julia v1.9 for precompiling this.

Thank you for the reply.

You’d need to build a system image for that on v1.8.

To properly create a system image, should I instantiate the nested tuples from my code and then create the system image?

There are some things that will help on Julia v1.9 for precompiling this.

I tried compiling the Julia master branch (v1.10.0-DEV.96) and ran the code, but the compilation time was the same. Could the v1.9 branch be different?

Did you snoop the compile?

I am not sure what you mean by “snoop the compile” as I am not too familiar with Julia’s internals. I used SnoopCompile.jl by Tim Holy, but as far as I am aware this only snoops the inference process.

Can you please tell me how to snoop the compile or point me to some documentation or tool?