Below is a flamegraph from a profiling run I just performed. For context, this is not exactly performance-critical code, but it cannot be arbitrarily slow either. I would like to make sure I am reading/interpreting it correctly, specifically in connection with the following comment from the documentation:
Red bars are problematic only when they account for a sizable fraction of the top of a call stack, as only in such cases are they likely to be the source of a significant performance bottleneck
The red indicates type instabilities. My reading of the graph is that there are quite a few, but because they occur quite “low” in the call stack, all the “actual work” is done in a type-stable way. Therefore those type instabilities likely don’t affect performance too significantly. By contrast, if I had lots of red bars near the “top” of the graph, I would stand to gain a lot by removing those type instabilities.
Do you agree with these statements? If not, I’d be grateful for more comments.
(I’m of course aware that type instabilities are not the only performance problem and there are other things to check too, but this post is just about type instability.) Thank you.
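For reference, here is roughly how such a flamegraph is produced (a minimal sketch assuming ProfileView.jl; `my_workload` is just a placeholder, not my actual code):

```julia
using Profile, ProfileView

my_workload()             # warm-up run so compilation time is not profiled
Profile.clear()
@profile my_workload()    # collect samples
ProfileView.view()        # red frames mark runtime (dynamic) dispatch
```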
My understanding is that what counts is not really the depth of the “red bars” in the call stack. Rather, it’s whether the length of a “red bar” (i.e. the time spent in a function performing dynamic dispatch) is significantly larger than the combined lengths of the bars stacked immediately on top of it (i.e. the time spent in its sub-calls).
If the length of a “red bar” is completely covered by the lengths of the bars stacked on it, most of the time is spent in the functions it calls. However, when a significant fraction of a “red bar” is not covered by any bar on top of it, it might mean that a significant fraction of the time is spent in the dynamic dispatch itself. (It might also mean that the unstable function spends some time computing things on its own.)
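To make that concrete, here is a toy sketch of the “covered” case (the function names and the Dict-based instability are invented for illustration, not taken from your code): `caller` is type-unstable and dispatches dynamically, but `kernel` does essentially all the work, so the red bar for `caller` would be almost entirely covered by the bar for `kernel` stacked on top of it:

```julia
# Toy example: a type-unstable caller whose callee does the heavy lifting.
function kernel(x::Vector{Float64})
    s = 0.0
    for v in x
        s += sin(v)^2          # the “actual work”, fully type-stable
    end
    return s
end

function caller(params::AbstractDict)
    n = params[:n]             # with a Dict{Symbol,Any}, `n` is inferred as Any
    return kernel(rand(n))     # dynamic dispatch here, so `caller` shows up as a red bar,
                               # but almost all of its width is covered by `kernel`
end

caller(Dict{Symbol,Any}(:n => 10^6))
```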
This happens in a few places in your flamegraph. But it does not seem to be too much of an issue: assuming all of these places are actually 100% dynamic-dispatch time, then by fixing your code to be type-stable there, you could expect the circled lengths to shrink to almost nothing, which would be perhaps a 10–15% gain (by my loose guesstimate).
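For what it’s worth, the kind of fix meant here often amounts to making a type concrete at the unstable call site. A hedged sketch, continuing the hypothetical example above, using a type assertion:

```julia
# Hypothetical fix: assert a concrete type early so downstream calls dispatch statically.
function caller_fixed(params::AbstractDict)
    n = params[:n]::Int        # type assertion: `n` is now known to be an Int
    return kernel(rand(n))     # statically dispatched; no red frame for `caller_fixed`
end
```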
But also note that in some cases you can’t really tell the performance gain from fixing type instabilities just by looking at the flamegraph. It may be more than the width of the uncovered red at the top of the graph. In high-performance situations, removing code and reducing cache use in one area may make another area faster.