I am accustomed to seeing profiling data in tools like “py-spy” with “Own Time” and “Total Time” views. Julia doesn’t have this (?), so I wrote my own code for it and have made it into a package. The README explains how it works.
I’m seeking feedback and half expecting someone to tell me this was already possible with Profile or some other package.
(On that note, it does look like Profile in the standard library has been adding code to provide a similar view. This was my impression when reviewing Profile’s code on master at least.)
One of my main pain points is that it takes a loooong time to do the “lookups” that turn the pointers from Julia’s profile buffer into StackFrames. It can take several minutes on my laptop. Profile in the standard library doesn’t seem to have this problem and I’ve been meaning to look into why.
Also, I want to say that I’m quite happy with how profiling in Julia works, and pleased that it allowed me to create this library so easily.
Related packages for reference:
On master you can do pretty much the same thing with
Profile.print(format=:flat, sortedby=:overhead), but indeed earlier Julia versions do not have this capability. So this is a nice addition.
You’re correct that instruction pointer lookup is insanely slow. Make sure you’re not doing redundant work, for instance use
Dict(ip=>Profile.lookup(ip) for ip in unique(data)) so that you look up each pointer only once.
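To make the deduplication tip concrete, here is a minimal sketch. It uses a counting stand-in (`fake_lookup`) for the expensive `Profile.lookup` call, since real profile data isn't reproducible here; the point is that each unique pointer is resolved exactly once and the cache is reused for every sample.

```julia
# Sketch of the tip above: look up each unique instruction pointer once.
# `fake_lookup` is a hypothetical stand-in for the expensive Profile.lookup.
calls = Ref(0)
fake_lookup(ip) = (calls[] += 1; string("frame for ", ip))

data = [0x1, 0x2, 0x1, 0x3, 0x2, 0x1]    # toy stand-in for the raw profile buffer
cache = Dict(ip => fake_lookup(ip) for ip in unique(data))

# Resolve every sample through the cache; no redundant lookups happen here.
frames = [cache[ip] for ip in data]
```

With six samples but only three distinct pointers, `fake_lookup` runs three times instead of six; on real data, where the same hot frames repeat thousands of times, the savings are much larger.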
Thank you very much for the tip. I was not doing that, but I will. I also never knew about the
unique function, which looks very handy.
Will Profile in the standard library ever provide a way to filter StackFrames like I've done? I'd be willing to work on it if you think it's a good idea.
Your filtering looks nice, and I don't know of anyone working on that in Profile or anywhere else. As long as one doesn't break backward compatibility, new features in the standard library are welcome. Since
Profile.print doesn't currently take a function argument, I'd guess there's a good chance you can pull this off, though you'd have to consider what to do for
tree-format printing.
I might also CC @jameson to see if he has any thoughts.
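For reference, the kind of filtering discussed here could look something like the following sketch. The names (`filterframes`, the string keys standing in for StackFrames) are hypothetical and only illustrate the shape of the idea: keep the flat frame-to-count mapping, but restrict it by a user-supplied predicate.

```julia
# Hypothetical sketch of predicate-based frame filtering.
# Keys are descriptive strings standing in for StackFrames; values are sample counts.
frames = Dict("myfunc at demo.jl:3"      => 40,
              "eval at boot.jl:330"      => 5,
              "_fast at reduce.jl:462"   => 10)

# Keep only the frames whose description satisfies the predicate.
filterframes(pred, frames) = Dict(k => v for (k, v) in frames if pred(k))

mine = filterframes(k -> occursin("demo.jl", k), frames)
```

For tree-format printing this gets trickier, since dropping a frame means deciding what to do with its children, which is presumably the open design question mentioned above.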
I like the idea of this package, especially filtering the stack frames. I’m having some trouble reproducing the example, however. Profiling and using owntime() (with and without filters) in Juno gives me
 3% => eval(::Module, ::Any) at boot.jl:330
 2% => _fast at reduce.jl:462 [inlined]
 1% => Type at boot.jl:408 [inlined]
 2% => myfunc() at demo_mycode.jl:3
 1% => myfunc() at demo_mycode.jl:2
Should I expect this sort of output?
Maybe. Try looking at your profiling data with other tools as well to see what's going on. You can still use the tree view in the standard library's Profile package, for example. That should show you where time is being spent.
I would guess you didn't "warm up" your function, and that JIT compilation accounts for most of the samples. Try calling the function several times in a loop and profile that.
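A minimal sketch of the warm-up advice, assuming a toy `myfunc`: call the function once first so the JIT compiles it, then clear old samples and profile repeated calls, so compilation time stays out of the results.

```julia
using Profile

myfunc() = sum(rand(10_000))   # toy function; substitute your own

myfunc()                       # warm up: first call triggers compilation
Profile.clear()                # drop any samples collected so far
@profile for _ in 1:100        # profile only the already-compiled calls
    myfunc()
end
```

After this, the flat or tree views should be dominated by the function's own work rather than by the compiler.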
I did warm up the function, but I think I figured it out: the Juno REPL behaves slightly differently from the regular REPL. I get more reasonable results if I run it in the standard REPL.