Sorry, I don’t get it. The flame graph already shows all the problems. Right now you are paying a huge price for a logging call that isn’t even used.
It will also show the allocation problems, highlighted by color.
Really, I mean everything will be in the flame graph. I’m trying to help you help yourself. If you don’t have enough detail, run it for longer. It’s a sampling profiler, so you need to collect enough samples to build a detailed graph.
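To make "run it for longer" concrete, here is a minimal sketch using the `Profile` standard library. `work` is a hypothetical stand-in for your own code, and the repeat count of 200 is an arbitrary assumption; the point is only that looping over the workload inside `@profile` gives the sampler more time to collect backtraces.

```julia
using Profile

# Hypothetical stand-in for the code being profiled.
function work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

work(10)          # run once first so compilation is not what gets sampled
Profile.clear()   # drop samples left over from earlier runs

# Repeat the workload so the sampler has time to collect enough backtraces.
@profile for _ in 1:200
    work(10^6)
end

Profile.print(maxdepth = 12)  # text view of the same data a flame graph draws
```

If the output still looks sparse, raising the repeat count is usually the first thing to try.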
(And yes, avoiding allocations is always a central goal in performance tuning, but in Julia so is avoiding type instability. In contrast, function calls are very low on the list of performance concerns.)
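In case "type instability" is unclear, here is a small sketch. Both functions are hypothetical and made up for illustration. `Base.return_types` is used here just to ask inference what it thinks a call returns; in the REPL, `@code_warntype unstable(1)` shows the same information in more detail.

```julia
# Hypothetical functions, for illustration only.
unstable(x) = x > 0 ? x : 0.0      # Int argument: may return an Int or a Float64
stable(x)   = x > 0 ? x : zero(x)  # always returns the argument's own type

# Ask inference for the return type of each call with an Int argument.
unstable_type = only(Base.return_types(unstable, (Int,)))
stable_type   = only(Base.return_types(stable, (Int,)))

println("unstable(::Int) infers as ", unstable_type)
println("stable(::Int) infers as ", stable_type)
println("stable return type is concrete: ", isconcretetype(stable_type))
```

The first function should infer as a `Union` of two types and the second as a single concrete type. A small union like this is often handled well by the compiler, so treat it as a picture of the concept rather than a performance problem on its own; the cases that hurt are the ones where inference gives up and returns `Any`.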
Also, see: