What are the best practices for identifying performance bottlenecks in large Julia apps (with large function call stacks)? Genie apps are fairly complex, with a request -> response cycle invoking easily over 100 Genie functions (which, in turn, invoke functions in 3rd party packages and Julia itself).
What’s wrong with @profile (with ProfileView, StatProfilerHTML, or PProf for visualization)?
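That is, something along these lines (handler here is a made-up stand-in, not actual Genie code):

```julia
using Profile

# Hypothetical stand-in for the request -> response cycle
handler() = join(string.(1:10^4), ",")

handler()        # warm-up call so JIT compilation isn't sampled
Profile.clear()
@profile for _ in 1:100
    handler()
end
Profile.print()  # or hand the samples to ProfileView / StatProfilerHTML / PProf
```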
Can’t speak for PProf yet, but anything Profile-based is unusable for large call stacks. This is what it looks like for a reasonably complex Genie request (going through router, databases, rendering, web server, etc.):
There are a few tools that I use. I definitely use @profile to get a first glance at where the slowest parts of my code sit. However, I don’t feel this tool is always reliable for showing me everything, and it’s hard to tell from its output where allocations are being generated, and why.
Another of my methods is simply to use the @time macro and find the largest times/allocations in the code, then work on reducing them. I generally try everything, since a few behaviors in Julia seem to differ from other languages. For example, I noticed that in the current version of the parallelization machinery it’s better to allocate a vector beforehand than to assume the compiler will make a single vector for each thread (it’s the same advice given here: https://docs.julialang.org/en/v1/manual/parallel-computing/index.html).
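A minimal sketch of that preallocation advice (the function name is made up for illustration): one accumulator slot is allocated per thread up front, instead of letting each task build temporaries inside the loop.

```julia
using Base.Threads

function threaded_sum(n)
    # Preallocate one accumulator per thread before the parallel loop
    partials = zeros(nthreads())
    @threads for t in 1:nthreads()
        chunk = n ÷ nthreads()
        lo = (t - 1) * chunk + 1
        hi = t == nthreads() ? n : t * chunk
        acc = 0.0                  # scalar accumulator, no per-iteration allocation
        for i in lo:hi
            acc += sqrt(i)
        end
        partials[t] = acc
    end
    return sum(partials)
end

@time threaded_sum(10^6)   # compare time and allocations across variants
```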
There are a few other packages, like Traceur with its @trace macro, that I’ve tried and found useful, but generally being patient with @time, @allocated, and @profile works very well for me.
Yeah, that’s the Recursive Inference Towers of Doom. Usually some go away if you run the code twice. Otherwise it means that some bits of your code are particularly inference-heavy (you can find out which ones by looking at the bottom of the towers) and you have to type-stabilize them.
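To illustrate with a made-up function, @code_warntype is one way to spot the frames that need stabilizing once you’ve located them at the bottom of the towers:

```julia
# Hypothetical unstable function: `s` flips from Int to Float64,
# so inference has to track a Union type through the loop.
function unstable_sum(n)
    s = 0
    for i in 1:n
        s += i / 2
    end
    return s
end

# Stabilized version: start with the type the loop will actually produce.
function stable_sum(n)
    s = 0.0
    for i in 1:n
        s += i / 2
    end
    return s
end

@code_warntype unstable_sum(10)   # highlights the Union-typed variable
```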
Ahhh, sorry, indeed, my profiling script was including JIT compilation, doh!
Much better without it:
ProfileView seems to have stopped working on macOS, as Cairo won’t build. I’ll give PProf a try, though I’m not eager to add Golang as a dependency.
Note that with StatProfilerHTML you can click the links at the bottom, and it will show you the code with the lines taking the most time; it makes it quite easy to find bottlenecks.
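A sketch of that workflow — the StatProfilerHTML call is commented out so the snippet runs with just the stdlib (statprofilehtml() is the package’s exported entry point, and render here is a made-up stand-in):

```julia
using Profile

render() = join(string.(1:10^4), ";")   # hypothetical stand-in for view rendering

render()                       # warm up so JIT compilation isn't sampled
Profile.clear()
@profile for _ in 1:100
    render()
end

# using StatProfilerHTML
# statprofilehtml()            # writes an HTML report; its per-function pages
#                              # link to annotated source with hot lines marked
Profile.print(maxdepth = 10)   # plain-text fallback view of the same samples
```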