Taking TTFX seriously: Can we make common packages faster to load and use?


You actually want to time and/or profile the first run. So start a fresh session, and:

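using Profile, ProfileView
@profile @time using SomePackage     # profile package loading
ProfileView.view()                   # view the load-time profile
@profview @time themainfunction(x)   # profile and view the first call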

Profiling does slow things down a bit, though. I also found that ProfileView struggles with really huge load times, and plain @profile can work better. From the profile you can spot obvious problems, see which code and packages actually take time to load, etc.
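To illustrate why it is the first run that matters, here is a minimal sketch using a hypothetical stand-in for themainfunction. It times the first and second call of a freshly defined function in the same session: the first timing includes compiling the method for the given argument types, the second only runs the already-compiled code. (On Julia 1.6 and later, @time also reports what share of the time was compilation.)

```julia
# Hypothetical stand-in for `themainfunction`; any newly defined method is
# compiled the first time it is called with a new combination of argument types.
themainfunction(x) = sum(sqrt, x) / length(x)

x = rand(1000)

t_first  = @elapsed themainfunction(x)  # includes compiling themainfunction(::Vector{Float64})
t_second = @elapsed themainfunction(x)  # reuses the compiled code

println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

In practice you would time the real entry point in a fresh session, since anything already compiled earlier in the session hides the cost.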


Also, to look at which functions take inference time:

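using SnoopCompile, ProfileView
tinf = @snoopi_deep themainfunction(x)   # inference timings for the first call
fg = flamegraph(tinf)
ProfileView.view(fg)                     # display the inference flame graph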


using Cthulhu
@descend themainfunction(x)

Then look around for the type-unstable (red) bits in the code that looked slow to load in ProfileView, and anything else that seems like a problem. I’m sure there are more sophisticated ways to do this, but it gets you a long way.


My understanding was that this also profiles the compilation of the profiler itself. Doesn’t it?

I assumed @profile was already compiled into the base Julia system image.


As a developer not very familiar with the question of “what extra code should I write so that my packages load faster”, I would find it nice to have a simple build flag that says “just precompile this package by running all of its own tests, and don’t worry about the excess code”.


I don’t know how realistic this is, but it’s definitely a good idea.

This is the ProfileView output for FinEtools.

So I think this is telling me that I either get rid of abstract types or live with a using time of 15 seconds. The big block in the middle is all type inference…


Did you try to force precompilation of that piece?

If you are asking whether I have __precompile__(true) in the top module, the answer is yes; if you mean whether I have targeted precompile instructions in the code, the answer is no.

Try calling that code at using time (i.e. from the top level of the package module) and see whether it precompiles.
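A minimal sketch of what calling code at using time can look like, assuming a hypothetical package module MyPkg whose function assemble stands in for whatever the profile points to. It shows two common options: an explicit precompile directive for a concrete signature, and a small representative workload run at the top level of the module. Both sit inside the module body, so they run when the package is precompiled. How much of the resulting code actually ends up in the cache (especially for methods belonging to other packages) depends on the Julia version, so treat this as a starting point rather than a guarantee.

```julia
# `MyPkg` and `assemble` are hypothetical names standing in for a real package
# and the function the profile points to.
module MyPkg

export assemble

# Stand-in for the expensive-to-compile code.
function assemble(xs::AbstractVector{<:Real})
    s = zero(eltype(xs))
    for x in xs
        s += x^2
    end
    return s
end

# Option 1: an explicit precompile directive for a concrete signature
# you expect callers to use.
precompile(assemble, (Vector{Float64},))

# Option 2: run a small, representative workload at load time. It must sit
# inside the module: when the module is a package being precompiled, this call
# runs during precompilation and the compiled code can be cached.
# In a plain script it simply compiles the method once when the module loads.
let
    assemble([1.0, 2.0, 3.0])
end

end # module

using .MyPkg
assemble([1.0, 2.0, 3.0])  # 1 + 4 + 9 = 14.0
```

With this module placed in a package’s src/MyPkg.jl, the hope is that the first assemble call after using MyPkg needs less compilation; re-running the profile is the way to check.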

Sorry: which code?

The function the profile is pointing to.

Sorry for the confusion. I ran:

using ProfileView
@profview @time using FinEtools

There is no “main” function per se.

@PetrKryslUCSD that large block isn’t your package code at all; it’s the @require block in ArrayInterface.jl. If you hover over the red lines near the bottom of the block, you will find the Requires calls.

This has me worried about my own ArrayInterface dependencies now… We really need to stop using Requires.jl and get precompiled glue packages in Base.


I know. Thanks! That was really what prompted my question: if I use any abstract types, such as the arrays, is there nothing I can do?

Abstract types are fine! Your package code takes no time at all.

The problem is that you are using something that depends on ArrayInterface.jl and also on something in its Requires.jl block, so when your package loads, the code inside the @require block also loads, and that code has to be compiled from scratch.

Probably your LoopVectorization.jl dependency is causing all of that load time, because it depends on ArrayInterface.jl.

Yep. It’s all just LoopVectorization.jl. Profile for using LoopVectorization:

@Elrod any idea what’s going on here? I can’t get ArrayInterface/Requires to do this with other packages.


What is it you’re profiling there?
I am aware that LoopVectorization causes TTFX issues in several ways; that is one of the motivations for rewriting it with a totally different approach.


That’s just using LoopVectorization. But I found out about it from using FinEtools, where it appears that the @require code in ArrayInterface is actually taking all that time. If you look through the red lines in the call stack, you can see the same thing in LoopVectorization.


I’ll look into this a bit tomorrow. Looks like there should be some low hanging fruit.


So, based on the discussion here, I added an example calculation after the module in my package’s main file. It precompiles fine, but then segfaults with a message about deserialization when I try to use it. Am I doing something wrong, or is this a Julia bug that I should report (this happens on both 1.7 and nightly)? I’m not using __precompile__(true), since the docstring says it’s the default; is that right?