precompile when the function has side effects, like opening a window. Maybe we should also design packages so that most functions don't have side effects, as in many functional languages, to make precompilation easier.
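A minimal sketch of that idea: keep the computation pure and push the side effect behind an io::IO argument, so a precompile workload (or a test) can exercise the whole code path against an in-memory IOBuffer instead of opening a window. The names render_lines and display_report are hypothetical, not from any package.

```julia
# Hypothetical example: pure core plus a thin side-effecting shell.

# Pure: builds the lines to show and touches no global state.
render_lines(data) = ["$(k) => $(v)" for (k, v) in data]

# The side effect is isolated here; the destination is an argument, not hard-coded.
function display_report(io::IO, data)
    for line in render_lines(data)
        println(io, line)
    end
    return nothing
end

# A precompile workload could drive the full code path into a harmless sink.
buf = IOBuffer()
display_report(buf, [(:a, 1), (:b, 2)])
report = String(take!(buf))
```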
It tells you something: it sounds like the time is spent in some async tasks? That does make things a little harder, but we would really need to see the flame graph to know. What’s the package?
The package is here, but it's in an exploratory state and the code quality is so bad that I'm a bit ashamed to share it (type instabilities, etc.). The flame graph looks like this:
The long flat task.jl on the right looks suspicious, but I don’t know where it comes from.
The right part comes from the (unused) threads. It should go away when running with a single thread.
Yes, having a simple checklist for package authors would be a good idea. What to think about when writing the code, and what to do as a “compile time optimization pass” after writing the code.
Maybe I was too pessimistic. My impression, after trying to use SnoopCompile on a few of my own packages, is that reducing compilation time requires knowing all kinds of implementation details, like backedges and how precompilation actually works. Besides knowing all this, you then need to carefully analyze invalidations first, then time and analyze inference, and lastly go through your package to find all its APIs and exercise them in precompile statements.
But maybe that was the wrong impression, and it's actually possible to learn a series of quick checks and tweaks that can be done in, say, 30 minutes for a small package. Such a checklist would be worth its weight in gold.
Maybe also copy in some steps from the checklist in the SnoopCompile.jl documentation.
How does one check the effect of precompilation on TTFX? Just time the using statement with and without precompilation?
@PetrKryslUCSD I'm using TTFX like time-to-first-plot: the time to load a package plus the time to do the main thing X that it does, using
My PR above made Blink load in 1 s, but Window() takes 15 s, so it's still not a huge gain.
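To separate the two parts of TTFX in a fresh session, one can time the first call of the main function and compare it with a second call with the same argument types; the difference is roughly the compilation cost. A minimal sketch in plain Julia, where main_task is a hypothetical stand-in for something like Window(). On Julia 1.6 and later, @time additionally reports the percentage of time spent compiling.

```julia
# Minimal sketch: compare the first call (compile + run) with a second call
# (run only). Run in a fresh session; `main_task` is a hypothetical stand-in
# for a package's main entry point.
main_task(v) = sum(abs2, sort(v; rev = true))

data = rand(1_000)
t_first  = @elapsed main_task(data)   # includes compiling main_task(::Vector{Float64})
t_second = @elapsed main_task(data)   # same signature, now already compiled
println("first call: ", t_first, " s, second call: ", t_second, " s")
```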
Yeah, SnoopCompile is more complicated. I mostly just use ProfileView.jl and Cthulhu.jl and give up after that. Anything really bad is pretty obvious in ProfileView. But I would also like to know how to use SnoopCompile better.
So, inspired by this thread, I decided to ProfileView the TTFX for my package, DFTK, which currently stands at a ridiculous 90 s. About 1/5 of that time is in inference, with no reference to package code. Some places I can recognize are functions that contain nested functions, but that is a pattern I use a lot, and I'm not sure how to do without it (or whether it's even the problem). A lot of the advice I see for reducing TTFP is about method redefinitions, but I don't think I redefine many Base functions. I'd appreciate any advice.
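One known way nested functions can hurt inference (not necessarily what is happening in DFTK) is a closure that captures a variable which is later reassigned: Julia then stores the variable in a Core.Box, and its type is no longer inferred. A minimal sketch with hypothetical function names, checked with Base.return_types; the usual workaround is to capture a Ref (or pass the value explicitly) instead of rebinding the variable.

```julia
# `acc` is reassigned inside the closure, so it is captured in a Core.Box
# and its type is not inferred.
function sum_boxed(n)
    acc = 0
    add!(x) = (acc += x)
    for i in 1:n
        add!(i)
    end
    return acc
end

# Same logic, but the closure captures a Ref that is never rebound,
# so the captured type stays concrete (Base.RefValue{Int}).
function sum_ref(n)
    acc = Ref(0)
    add!(x) = (acc[] += x)
    for i in 1:n
        add!(i)
    end
    return acc[]
end

println(Base.return_types(sum_boxed, (Int,)))  # inferred type is not concrete
println(Base.return_types(sum_ref, (Int,)))    # inferred as Int
```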
I wonder if the compiler could read assertions, though, and use that information.
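For what it's worth, Julia's inference already uses type assertions of the form x::T; this sketch does not cover arbitrary @assert conditions. A small illustration with hypothetical function names: reading from a Vector{Any} gives an unknown type, while the assertion narrows it.

```julia
# A type assertion `expr::T` is information inference uses: code after it
# can assume the value has type T (the assertion throws otherwise).
first_plain(v) = v[1]
first_asserted(v) = v[1]::Float64

println(Base.return_types(first_plain, (Vector{Any},)))     # element type unknown
println(Base.return_types(first_asserted, (Vector{Any},)))  # narrowed by the assertion
```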
A question more on topic: can someone give a clear step-by-step description of how they produce these flame graphs and benchmark the TTFX? It is not clear to me how to do that, given that the first execution of the profiler has to be discarded, and the second run is, well, no longer the first run (at least, those are the instructions for the profiler in VS Code).
So let it also solve this issue.
You actually want to time and/or profile the first run. So start a fresh session, and:
using Profile, ProfileView  # Profile provides @profile, ProfileView provides @profview
@profile @time using SomePackage
@profview @time themainfunction(x)
Profiling does slow things down a bit, though. I also found that ProfileView struggles with really huge load times, and plain @profile can be better. From that you can spot obvious problems, see which code and packages actually take time to load, etc.
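If ProfileView struggles with a very long profile, the Profile standard library can print a text report instead. A minimal sketch, where work is a hypothetical stand-in for the main function; maxdepth and mincount only trim the printed tree.

```julia
using Profile   # standard library; provides @profile, Profile.print, Profile.clear

# Hypothetical stand-in for "themainfunction": sums over a generator.
work(n) = sum(sqrt(i) for i in 1:n)

Profile.clear()             # drop samples from earlier runs
@profile @time work(10^7)   # first call: the samples include compiling `work`
Profile.print(maxdepth = 15, mincount = 5)  # text tree; deep or rare frames hidden
```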
And to look at the functions that take inference time:
using SnoopCompile, ProfileView
tinf = @snoopi_deep themainfunction(x)
fg = flamegraph(tinf)
ProfileView.view(fg)
Then look around to find the unstable (red) bits that looked slow to load in ProfileView, and anything else that seemed like a problem. I'm sure there are more sophisticated ways to do this, but it gets you a long way.
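Besides the flame graphs, type instabilities can be checked function by function. A minimal sketch using code_warntype from InteractiveUtils and Base.return_types; unstable and stable are hypothetical names, and the small Union here is a mild case compared with what usually shows up in a profile.

```julia
using InteractiveUtils   # loaded automatically in the REPL, needed in scripts

# Returns an Int for positive input and a Float64 otherwise, so inference
# can only say Union{Float64, Int64}.
unstable(x) = x > 0 ? x : 0.0

# Always returns Float64, so the inferred type is concrete.
stable(x) = x > 0 ? float(x) : 0.0

code_warntype(stdout, unstable, (Int,))   # highlights the non-concrete return type
println(Base.return_types(stable, (Int,)))
```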
I had understood that this can profile the compilation of the profiler as well. Doesn’t it?
@profile was compiled in the base Julia image.
As a developer not very familiar with the subject of “what should I write additionally to make my packages faster to use”, it would be nice to have a simple build flag that says “just precompile this package from all of its own tests, and don't worry about the excess code”.
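Something close to this can already be done by hand: run a representative workload (for example, a trimmed-down version of the tests) at the top level of the module, but only while the package is being precompiled. A minimal sketch, assuming a hypothetical package MyPkg with a function solve; the ccall(:jl_generating_output, ...) check is the same one PrecompileTools.jl uses for its @compile_workload macro, and on Julia 1.9 and later the native code compiled this way is cached as well.

```julia
module MyPkg   # hypothetical package, for illustration only

export solve

solve(v::AbstractVector{<:Real}) = sum(abs2, v) / length(v)

# True only while Julia is generating a precompile cache (e.g. during Pkg
# precompilation); false in an ordinary session, so this block is skipped there.
if ccall(:jl_generating_output, Cint, ()) == 1
    # A tiny "test-like" workload: calling solve here compiles it for these
    # argument types, and the result is stored in the package's cache.
    solve([1.0, 2.0, 3.0])
    solve(1:3)
end

end # module
```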
I don't know how realistic this is, but it's definitely a good idea.
This is the ProfileView output for FinEtools.
So I think this is telling me that I either get rid of abstract types or live with the time of 15 seconds. The big block in the middle is all type inference…
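Removing abstract types altogether is not the only option: a common pattern is to keep abstract types as supertypes but avoid abstractly typed fields by making the struct parametric. A minimal sketch with hypothetical types (not FinEtools' actual ones), comparing what inference can say about each:

```julia
# Field declared with an abstract type: its concrete type is not part of the
# struct's type, so code reading the field cannot be fully inferred.
struct LooseMesh
    coords::AbstractMatrix
end

# Parametric version: the concrete matrix type is a type parameter.
struct TightMesh{M<:AbstractMatrix{Float64}}
    coords::M
end

total(m) = sum(m.coords)

println(Base.return_types(total, (LooseMesh,)))                  # not concrete
println(Base.return_types(total, (TightMesh{Matrix{Float64}},))) # inferred as Float64
```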
Did you try to force precompilation of that piece?
If you are asking whether I have __precompile__(true) in the top module, the answer is yes; if you mean whether I have targeted precompile instructions in the code, the answer is no.
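For reference, __precompile__(true) has been the default since Julia 0.7 and only marks the module as cacheable; targeted instructions are explicit precompile calls for specific signatures. A minimal sketch, with a hypothetical module and element types standing in for the real ones. Calls made through dynamic dispatch inside the precompiled method may still be compiled at run time.

```julia
module FEDemo   # hypothetical module, for illustration only

abstract type AbstractElement end
struct Tri  <: AbstractElement; area::Float64; end
struct Quad <: AbstractElement; area::Float64; end

# Works on a container with an abstract element type (dynamic dispatch inside).
total_area(els::Vector{AbstractElement}) = sum(e -> e.area, els)

# Targeted instruction: ask Julia to infer and compile this exact signature
# when the module is defined (and cached, if it is being precompiled).
precompile(total_area, (Vector{AbstractElement},))

end # module
```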