As you can see, almost half the time is spent in _include_from_serialized at this line, namely the jl_init_restored_modules C function. What does it do?
From the name, I would assume it has something to do with calling __init__ for any dependent packages. However, I already loaded everything that this package depends on, so the only __init__ needing to be called is for my package - and you can see that happening in the second half of the image.
Does anyone have any insights into what’s happening in the first half of this profile?
Also check for invalidations, there are a ton of them. Does PyCall really need some of those overloads or could it use internal machinery that stays private? Who else outside PyCall, for example, really needs to convert a PyObject to a Symbol? That one overload is responsible for about 700 invalidations alone, including some in Pkg which may need recompilation for any downstream packages that depend on PyCall. It’s also invalidating some Base.Dict methods used by Revise, if you have that loaded, and loading PyCall results in those end up being compiled twice. (You can see that from the @snoopi_deep flamegraph.) (nvm, that actually looks like a profile or flamegraph bug.)
Thanks very much for looking Tim. I actually recently removed the JSON dependency entirely, take a look at the master branch.
I already tried SnoopCompile and eliminated all those invalidations just by commenting stuff out, but that had little impact.
According to @time only 10% of time importing the package is spent in compilation. So I’m confused at what the 50% of time is doing before even getting to __init__. Is this a problem even fixable by the sort of advice in SnoopCompile?
Julia can’t currently cache native code, so anything that’s LLVM-related is not currently fixable except via PackageCompiler. Hopefully that will change someday.
It might, though, on your downstream users. It looks like it invalidates a lot of code in Pkg, among many others, and that might make it slower to load the next package. If you can avoid this, great, but if it would be an imposition then the next step might be to see what code in Base and Pkg can be improved in its inferrability.
Absolutely, yeah it’s certainly on my list to fix those invalidations (I think they can all be avoided) it’s just that a quick experiment seemed to imply that they weren’t responsible for the slow package load time.
The package has a lot of ccalls and some @cfunctions, I wonder if they are particularly slow to generate code for.
I think I’m just going to comment out blocks of code to figure out what is responsible.