What does jl_init_restored_modules do? Why does loading my package take so long?

Here is the ProfileView flamegraph of using PythonCall, which takes about 4.5 seconds:

As you can see, almost half the time is spent in _include_from_serialized at this line, namely the jl_init_restored_modules C function. What does it do?

From the name, I would assume it has something to do with calling __init__ for any dependent packages. However, I already loaded everything that this package depends on, so the only __init__ needing to be called is for my package - and you can see that happening in the second half of the image.

Does anyone have any insights into what’s happening in the first half of this profile?

It’s loading your package. You can pass C=true to see a bit more what is going on.

What do you mean by “loading your package”? The whole trace here is loading my package.

I tried C=true but I can’t say it was especially informative (to me).

I guess my real question is: how do I start to diagnose why this long gap appears in the profile, and therefore figure out how to make it faster?

As @kristoffer.carlsson said,

You can pass C=true to see a bit more what is going on.
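For readers following along, here is a minimal sketch of getting C frames into a profile using only the Profile stdlib. The workload here is a stand-in; in this thread you would replace it with `@eval using PythonCall` in a fresh session:

```julia
using Profile

# Stand-in workload; replace with `@eval using PythonCall` (in a fresh
# session) to profile package loading itself.
workload() = sum(sqrt(i) for i in 1:10^8)
workload()          # run once so we don't mostly profile compiling workload itself

Profile.clear()
@profile workload()

# C=true includes C frames in the report, which is how functions like
# jl_init_restored_modules become visible in the trace.
Profile.print(C = true, maxdepth = 15)
```

ProfileView's flamegraph accepts the same `C=true` idea via its keyword arguments; the text report above is the dependency-free version.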

Also check for invalidations; there are a ton of them. Does PyCall really need some of those overloads, or could it use internal machinery that stays private? Who else outside PyCall, for example, really needs to convert a PyObject to a Symbol? That one overload is responsible for about 700 invalidations alone, including some in Pkg, which may force recompilation for any downstream packages that depend on PyCall. It’s also invalidating some Base.Dict methods used by Revise, if you have that loaded, and loading PyCall results in those being compiled twice. (You can see that from the @snoopi_deep flamegraph.) (Never mind, that actually looks like a profile or flamegraph bug.)
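For anyone unfamiliar with what an invalidation is, here is a self-contained toy (all names made up for the demo). To hunt invalidations in a real package you would use SnoopCompileCore's @snoopr and SnoopCompile's invalidation_trees, as discussed later in the thread:

```julia
# Toy invalidation: `caller` is compiled against the only existing method
# of `callee`; defining a more specific method afterwards invalidates that
# compiled code, forcing recompilation on the next call.
callee(x) = "generic"
caller(x) = callee(x)

first_result = caller("hi")   # compiles caller, specialized on callee(::Any)

# A new, more specific method (analogous to PyCall's Symbol conversion, or
# PythonCall's String(::StaticString)) invalidates existing compiled callers:
callee(x::String) = "specific"

second_result = caller("hi")  # caller was invalidated and is recompiled here
```

When a widely used Base method gets a new, more specific overload like this, every compiled caller of it, in Base, Pkg, or other loaded packages, may need to be recompiled.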


Also, consider submitting a PR to VersionParsing to precompile vparse(::String) (that shows up in the flamegraph too).
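The pattern for such a PR is just a top-level precompile directive in the module. A hypothetical sketch (the module and implementation here are stand-ins, not the real VersionParsing code):

```julia
module ToyVersionParsing

# Stand-in for VersionParsing.vparse; the real function does fuzzier parsing.
vparse(s::String) = VersionNumber(s)

# A top-level precompile directive makes Julia compile this method during
# package precompilation, so the work is cached rather than happening at
# the first runtime call.
precompile(vparse, (String,))

end # module
```

Usage: `ToyVersionParsing.vparse("1.2.3")` returns `v"1.2.3"`, with the method already compiled when the package loads.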


Hah, but I was analyzing PyCall rather than PythonCall; I didn’t realize PythonCall was a separate package, very sorry. Still, the general advice is relevant:

  • you have about 670 invalidations, almost all from String(x::PythonCall.Utils.StaticString).
  • a couple of judicious precompilations in both your package and JSON.jl could save a fair amount of time.

Thanks very much for looking, Tim. I actually removed the JSON dependency entirely recently; take a look at the master branch.

I already tried SnoopCompile and eliminated all those invalidations just by commenting stuff out, but that had little impact.

According to @time, only 10% of the time spent importing the package goes to compilation. So I’m confused about what the other 50% of the time is doing before even getting to __init__. Is this even a problem fixable by the sort of advice in SnoopCompile?

Julia can’t currently cache native code, so anything LLVM-related is not fixable except via PackageCompiler. Hopefully that will change someday.

It might matter for your downstream users, though. It looks like it invalidates a lot of code in Pkg, among many other places, and that might make it slower to load the next package. If you can avoid this, great; but if it would be an imposition, then the next step might be to see what code in Base and Pkg can be improved in its inferrability.

Absolutely, yeah, it’s certainly on my list to fix those invalidations (I think they can all be avoided); it’s just that a quick experiment seemed to imply that they weren’t responsible for the slow package load time.

The package has a lot of ccalls and some @cfunctions, I wonder if they are particularly slow to generate code for.
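Each @cfunction does force native code generation for the wrapped function when it is created, and that LLVM work can't be cached in the precompile file. A minimal example of the pattern (names made up):

```julia
# @cfunction compiles a native, C-callable wrapper for a Julia function.
# Packages wrapping C APIs (like CPython's) create many of these, and each
# one requires LLVM codegen at load time.
increment(x::Cint)::Cint = x + Cint(1)

cb = @cfunction(increment, Cint, (Cint,))

# Round-trip through the C function pointer to check it works:
result = ccall(cb, Cint, (Cint,), Cint(41))
```

ccalls themselves are cheap wrappers, but a package that creates many @cfunction pointers at load time pays codegen cost for each one.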

I think I’m just going to comment out blocks of code to figure out what is responsible.