You actually want to time and/or profile the first run. So start a fresh session, and:
using ProfileView
@profile @time using SomePackage
ProfileView.view()
# or profile the first call in one step:
@profview @time themainfunction(x)
Profiling slows things down a bit, though, and I also found ProfileView struggles with really huge load times, so plain @profile can be better. From that you can spot the obvious problems: which code and packages actually take time to load, and so on.
Edit:
And also to look at the functions taking inference time:
using SnoopCompile
tinf = @snoopi_deep themainfunction(x)
fg = flamegraph(tinf)
ProfileView.view(fg)
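If the flame graph alone is hard to act on, SnoopCompile can also summarize what forced fresh inference. A sketch reusing the `tinf` from above; the `inference_triggers` / `accumulate_by_source` names are from SnoopCompile's 2.x API as I recall it, so check the current docs:

```julia
# List the calls that triggered inference that wasn't already cached
itrigs = inference_triggers(tinf)

# Aggregate those triggers by the method responsible,
# to see which methods are the biggest offenders
mtrigs = accumulate_by_source(Method, itrigs)
```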
Then:
using Cthulhu
@descend themainfunction(x)
And look around to find the unstable (red) bits that looked slow to load in ProfileView, and anything else that seemed like a problem. I'm sure there are more sophisticated ways to do this, but it gets you a long way.
As a developer not very familiar with this subject, my question is "what should I write, additionally, to make my packages load faster?" It would be nice to have a simple build flag that says "just precompile this package from all of its own tests, and don't worry about the excessive code".
So I think this is telling me that either I get rid of abstract types, or live with the using time of 15 seconds. The big block in the middle is all type inference…
If you are asking whether I have __precompile__(true) in the top module, the answer is yes; if you mean whether I have targeted precompile instructions in the code, the answer is no.
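For anyone else following along, "targeted instructions" here means explicit `precompile` directives inside the module, which ask Julia to compile a specific method signature during package precompilation. A minimal sketch (the module and function names are placeholders, not from the package being discussed):

```julia
module MyPackage

export themainfunction

# Placeholder for the real entry point
themainfunction(x::Vector{Float64}) = sum(abs2, x)

# Targeted precompile directive: compile this exact method signature
# at precompile time rather than on the first call
precompile(themainfunction, (Vector{Float64},))

end # module
```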
@PetrKryslUCSD that large block isn't your package code at all, but the @require block in ArrayInterface.jl. If you hover over the red lines near the bottom of the block you will find the @require calls.
This has me worried about my ArrayInterface deps now… we really need to stop using Requires.jl and get precompiled glue packages in Base.
Abstract types are fine! Your package code takes no time at all.
The problem is you are using something that uses ArrayInterface.jl and also something in its Requires.jl block, so when your package loads the @require macro code also loads, and that has to precompile from scratch.
Probably your LoopVectorization.jl dependency is causing all of that load time, because it depends on ArrayInterface.jl.
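To illustrate why the time shows up at `using` time: a Requires.jl block like the following defers its code until the triggering package appears in the session, so that code cannot be precompiled ahead of time and compiles from scratch when it fires. This is a hypothetical sketch of the pattern, not the actual ArrayInterface.jl source; the function name is made up:

```julia
module SomeGluePackage

using Requires

# Hypothetical glue function; the @require body below extends it
fast_path(x) = false

function __init__()
    # This block only runs once StaticArrays is loaded, so the method
    # it defines is compiled then, not during package precompilation.
    @require StaticArrays = "90137ffa-7385-5640-81b9-e52037218182" begin
        fast_path(::StaticArrays.StaticArray) = true
    end
end

end # module
```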
Yep. It’s all just LoopVectorization.jl. Profile for `using LoopVectorization`:
What was it you’re profiling there?
I am aware that LoopVectorization causes TTFX issues in several ways. One of the motivations for rewriting it with a totally different approach.
That’s just `using LoopVectorization`. But I found out about it from using FinEtools, where it appears that the @require code in ArrayInterface is actually taking all that time; if you look through the red lines in the stack of calls, you can see the same thing in the LoopVectorization profile as well.
So, based on the discussion here, I added an example calculation after the module in my package's main file. It precompiles fine, but then segfaults with something about deserialization when I try to use it. Am I doing something wrong, or is this a Julia bug that I should report? (This happens on both 1.7 and nightly.) I'm not using __precompile__(true), as the docstring says it's the default.
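One thing worth checking (an assumption about your setup, since the code isn't shown): the example calculation usually needs to live *inside* the module, so the inference results it generates are cached together with the module; top-level code after `end # module` in the same file is a common source of trouble. A sketch, with placeholder names:

```julia
module MyPackage

# Placeholder for the real entry point
themainfunction(x) = sum(abs2, x)

# Precompile workload: runs once while the package is being precompiled,
# inside the module, so the compiled results go into the cache file
let
    themainfunction(rand(10))
end

end # module
```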