Taking TTFX seriously: Can we make common packages faster to load and use

Raf · January 20, 2022, 5:16pm

So much work has been done to improve julia compilation and remove invalidations. Things really feel a lot snappier to use.

But there is still the problem of packages that were written without much thought to startup time, and TTFX - the time-to-first whatever a package does most, is sometimes really slow.

Some common problems I see are (reorganised to be in order of importance):

Lack of type stability slowing down compilation.
Not calling precompile for common methods, or just running them if that is possible (see Chrisses comment below).
Using Requires.jl so there is no code precompilation for the required code.
Not checking or fixing compilation time of dependencies

Or simply that TTFX profiling has never been done, or was difficult in the specific context.

I have been as guilty of these things as anyone else, and I’m slowly trying to work on it in my own packages. But it seems there are problems in some widely used packages, and fixing these could really improve the experience of using Julia. We could all probably learn a bit more about how people are improving load times across the ecosystem.

An example: Blink.jl and `Window()`

I’m not singling anyone out, Blink is super useful. It’s just something I happen to use for a few things and would really like if it loaded quickly.

Blink.jl seems to be the default container for a Julia desktop web app. But it’s really slow to get started. using Blink takes 5 seconds to load on my laptop, and loading an empty window with Blink.Window() takes 11-15 seconds to finish the first time!! In comparison, vscode only takes 2 seconds to load in total, also using Electron.

https://github.com/JuliaGizmos/Blink.jl/issues/288

It turns out startup time is largely due to Requires.jl blocks in WebIO.jl. It seems to be precompilation and JSON parsing in Blink.Window().

Moving code out of requires in WebIO.jl gets the blink load time down to 1 second! Yes, a 5x improvement just from that.

But Blink.Window() is harder, a lot of compilation in HTTP.jl and Mux.jl, slow JSON parsing in JSON.parse, and various other things I’m not across. Maybe JSON3.jl would be faster? Maybe we can precompile all of this somehow? Maybe some profiling work on HTTP.jl would help? I’m not sure, but probably someone here has some ideas.

Can we do this collectively?

It could be faster and more fun to make a collective effort of speeding up startup time for common packages, also making the tricks more widely known. A bit like the performance workshopping we see here - can we get competitive about how fast we can make a package load and perform its most common operation?

We could also get together a list of packages that we all use that could be faster? and make it happen? or just have a common thread here like “TTFX: Blink.jl Window()”.

Thoughts on this generally? Or ideas for Blink?

Raf · January 20, 2022, 11:31pm

PR to WebIO fixing the startup time to Blink.jl:

https://github.com/JuliaGizmos/WebIO.jl/pull/478

Before

After

using Blink is 1 second. But Window() is still really slow. Would be great if that was a few seconds.

ChrisRackauckas · January 21, 2022, 2:26am

If anyone hasn’t read the issue https://github.com/SciML/DifferentialEquations.jl/issues/786 I would recommend doing so. We worked through and documented a lot of starting time issues in DifferentialEquations.jl and go things about an order of magnitude better.

That seems to be overblown. The right way to profile it exists now:

github.com/JuliaPackaging/Requires.jl

Add debug log for monitoring `@require` time

JuliaPackaging:master ← IanButterworth:ib/debug_load

opened 03:32PM - 11 Jan 22 UTC

IanButterworth

+8 -4

Adds a debug log so you can monitor `@require`s that get triggered and the time …spent running code ```julia julia> ENV["JULIA_DEBUG"]="Requires" "Requires" julia> using Foo ┌ Debug: Requires conditionally ran code in 0.917808094 seconds: `Ratios` detected `FixedPointNumbers` └ @ Requires ~/.julia/packages/Ratios/xLeZh/src/Ratios.jl:123 ┌ Debug: Requires conditionally ran code in 0.531145239 seconds: `ArrayInterface` detected `SuiteSparse` └ @ Requires ~/.julia/packages/ArrayInterface/mI7ab/src/ArrayInterface.jl:675 ┌ Debug: Requires conditionally ran code in 0.002734456 seconds: `ArrayInterface` detected `Adapt` └ @ Requires ~/.julia/packages/ArrayInterface/mI7ab/src/ArrayInterface.jl:760 ┌ Debug: Requires conditionally ran code in 0.029623663 seconds: `ArrayInterface` detected `StaticArrays` └ @ Requires ~/.julia/packages/ArrayInterface/mI7ab/src/ArrayInterface.jl:690 ┌ Debug: Requires conditionally ran code in 0.006599821 seconds: `ArrayInterface` detected `OffsetArrays` └ @ Requires ~/.julia/packages/ArrayInterface/mI7ab/src/ArrayInterface.jl:1116 ┌ Debug: Requires conditionally ran code in 0.022674606 seconds: `HDF5` detected `FileIO` └ @ Requires ~/.julia/packages/HDF5/pIJra/src/HDF5.jl:2133 ┌ Debug: Requires conditionally ran code in 0.002288416 seconds: `RandomNumbers` detected `Random123` └ @ Requires ~/.julia/packages/RandomNumbers/3pD1N/src/RandomNumbers.jl:38 ┌ Debug: Requires conditionally ran code in 1.708197842 seconds: `CUDA` detected `SpecialFunctions` └ @ Requires ~/.julia/packages/CUDA/sCev8/src/initialization.jl:35 ┌ Debug: Requires conditionally ran code in 0.0126559 seconds: `ArrayInterface` detected `Adapt` └ @ Requires ~/.julia/packages/ArrayInterface/mI7ab/src/ArrayInterface.jl:796 ┌ Debug: Requires conditionally ran code in 0.029985851 seconds: `ArrayInterface` detected `CUDA` └ @ Requires ~/.julia/packages/ArrayInterface/mI7ab/src/ArrayInterface.jl:795 ┌ Debug: Requires conditionally ran code in 2.06503522 seconds: `Zygote` detected `CUDA` └ @ Requires ~/.julia/packages/Zygote/umM0L/src/lib/broadcast.jl:251 ┌ Debug: Requires conditionally ran code in 0.211960078 seconds: `Zygote` detected `Distances` └ @ Requires ~/.julia/packages/Zygote/umM0L/src/Zygote.jl:45 ┌ Debug: Requires conditionally ran code in 0.258361828 seconds: `Zygote` detected `LogExpFunctions` └ @ Requires ~/.julia/packages/Zygote/umM0L/src/Zygote.jl:46 ┌ Debug: Requires conditionally ran code in 0.003038184 seconds: `Zygote` detected `Colors` └ @ Requires ~/.julia/packages/Zygote/umM0L/src/Zygote.jl:60 ┌ Debug: Requires conditionally ran code in 0.284711345 seconds: `ImageMorphology` detected `ImageMetadata` └ @ Requires ~/.julia/packages/ImageMorphology/6oqcS/src/ImageMorphology.jl:65 ┌ Debug: Requires conditionally ran code in 0.002487405 seconds: `RegisterCore` detected `ImageMetadata` └ @ Requires ~/.julia/packages/RegisterCore/YuSsR/src/RegisterCore.jl:397 julia> ``` cc. @ChrisRackauckas I think you were also after something like this

You can see that DifferentialEquations.jl had like <1% in Requires.jl time. Some packages may have more, but I think most of the big Requires offenders seem to have been fixed, judging by the fact that DiffEq pulls in a lot of the ecosystem.

This is not recommended. Instead, one should do simple calls in using time, maybe in a let block so the results don’t leak. See https://github.com/SciML/OrdinaryDiffEq.jl/blob/v6.4.2/src/OrdinaryDiffEq.jl#L185-L214 as an example. This way you get something that’s always up-to-date.

Raf · January 21, 2022, 2:52am

That seems to be overblown.

Its 80% of package load time in the example, so maybe not. And using Requires does exactly mean that the code behind the requires block is not precompiled. But thanks for the link.

This is not recommended. Instead.

Its not always possible to actually call functions, but feel free to add a PR that calls Blink.Window() to improve compile time

But I’ll alter the text to include both methods.

ChrisRackauckas · January 21, 2022, 3:06am

Definitely not all of the time, but using precompile should probably the exception rather than the rule. There are some pretty clear maintenance reasons to prefer never using precompile unless you have to.

Sure it can happen, but I dug through a ton of packages and just didn’t find it to be the case for most package load times. In fact, StaticArrays ended up contributing more to using time than Requires for pretty much any package I tried (which is what led to When should a package move to the standard library or system image? StaticArrays, what is it?). So there’s always exceptions, but I don’t think the exceptions should be highlighted as the norm.

Raf · January 21, 2022, 4:16am

I think we are probably optimizing different kinds of packages, and that is reflected in the approach and the problems we encounter.

The places where I have still found classic Julia load times are in things like file IO tools like HDF5.jl and NCDatasets.jl, and some other visualisation things like Blink.Window(), Interact.slider(x). These all still have pretty slow TTFX. Largely they would have to use precompile instead of running functions directly, because of IO. There are other package (e.g. Makie.plot and CSV.File) that are also pretty slow, but for harder to solve reasons.

Maybe it would be more productive to make a list of packages that are still much slower than they need to be, but a lot of people use. Key examples for me are:

Blink.jl
Interact.jl
HDF5.jl

Raf · January 21, 2022, 4:38am

FWIW, precompile statements - especially where there is IO:
https://github.com/timholy/FlameGraphs.jl/blob/master/src/FlameGraphs.jl#L23

ChrisRackauckas · January 21, 2022, 9:26am

GUIs and IO need to do some weird stuff, yes.

jlapeyre · January 21, 2022, 7:32pm

I put some effort into this every now and then, reading about packages for analyzing, and techniques, trying things. I have tried both precompile and running code during precompilation. I have never seen any benefit. In fact, running somefunction(...) in a let block as above made TTFX slower in one case. That is, running somefunction(...) the first time after using the module took longer and allocated more. I know there are ways to investigate to find what to call or precompile, etc. But, it’s fairly complicated.

So in my case, it’s definitely not “written without much thought to startup time”. It’s that reducing start up time takes a lot of learning and a lot of time.

Krastanov · January 21, 2022, 7:49pm

If I could use this to share my own “I tried but the results were bad / made no sense” story: I tried to add let block with a “typical code use” statement (and nothing else, no change to the rest of the library), and that made the runtime (not compiletime) of the library worse: Removing precompilation leads to lower allocations!? Some methods started to allocate instead of being allocation free (again, without changing anything in their source code).

I thought that should not be possible, and I still can not figure out what is so special about this fairly boring simulation code, that led to such weird behavior.

Raf · January 21, 2022, 8:30pm

I’ve also had this experience. In the example here, precompiling Window for Blink.jl does essentially nothing. But I’m pretty sure that’s because its not type stable very far, and the methods it will call don’t actually get compiled.

With NCDatasets.jl I experienced this too, but then fixed the type stability of the objects so all the fields were concrete. That alone improved compile time, and afterwards adding precompile also helped because it compiled much further down. But getting the type stability was actually the most important part.

@jlapeyre yes I could have phrased that better. Its totally true that sometimes you can do a lot of work on compilation and get no benefit from it. So the situation is more that either nothing has been tried, or improving anything is actually quite difficult.

ChrisRackauckas · January 21, 2022, 9:33pm

This could be an instance of https://github.com/JuliaLang/julia/issues/35800 triggered by having Polyester and other JuliaSIMD tools in the stack. We’ve been talking about this inference bug for a bit, it’s a rough one but hopefully it will get resolved soon.

Raf · January 21, 2022, 9:41pm

Its such an awful bug, we wasted a lot of time on it with Accessors.jl, Revise compilation can fix it too giving you hope that you actually fixed something, when you didn’t.

Raf · January 21, 2022, 10:17pm

Also @jlapeyre, with this post I was hoping we could start sharing problems like you describe, and workshop them here like we do with performance issues. We can all dev a package and run @profview using SomePackage without too much hassle. It’s also easy to push a branch with changes to compare and add to.

Krastanov · January 22, 2022, 5:05pm

Exploring the linked issues, led me to a (seemingly esoteric) thing that is being done with kwargs to help with type inference.

Instead of function f(...; kw::Bool=false) people seem to be doing function f(...;kw::Val{T}=Val(false)) where T.

I think I am seeing things like this in these OrdinaryDiffEq and Polyester changes:

https://github.com/SciML/OrdinaryDiffEq.jl/pull/1473/files#diff-8ce813ea8d7f370bc91b5ac1526a80f7fd354be769bafdd6b12b7368f3ae90a9L395

https://github.com/JuliaSIMD/Polyester.jl/commit/5e6ae4c2ae009b507bbcf90ff2c2b9b7d5e94559#diff-a523f7f63af3c48f517501d7e392926093393f7d033fb208f477568ec30bec38L72

Why is this necessary? Why is it better than saying the keyword will be of type Bool?

EDIT: discourse seems to be stripping out the anchors from the links above making it difficult to tell which lines I am talking about. Here are the SciML and Polyester links with anchors.

ChrisRackauckas · January 22, 2022, 5:54pm

That’s to force specialization. Functions and DataTypes do not always specialize when passed into a function.

Raf · January 22, 2022, 9:34pm

Here’s a fun find in the Blink.jl (and Interact.jl) TTFX story: JSON.jl serialization seems to be responsible for half the TTFX of most JuliaGizmos packages! Swapping to JSON3.jl has huge TTFX gains for Interact.jl, and should for WebIO.jl/Blink.jl:

https://github.com/JuliaGizmos/AssetRegistry.jl/pull/15
https://github.com/JuliaGizmos/WebIO.jl/issues/479

TsurHerman · January 23, 2022, 8:56am

There is an inherent problem with Julia design as it is now.
I use it for anything that is a little bit more involved… and the developer experience gets worse and worse with huge start-up time recompilations, and the methods to speed that like concrete typing
style: struct A{T1,T2,T3} makes the error messages I get hideous(Flux anyone?)

I used to circumvent that problem by using my port of package compiler , but recently all incremental builds fails for anything that is more than a text book example.

I pointed out in the past in this forum, that multiple dispatch should work upside-down to what is now… a dispatch table for a function should be determined by the context of the caller. thus making all binary code cacheable.

I would also in my dream language investigate Thorin - AnyDSL which uses Continuation Passing Style under the hood , which I think is the holy grail for language like Julia.
Instead of propagating type instability during inference … resolve the instability in the point of contact and carry on compiling a type stable code.

jakobnissen · January 23, 2022, 2:58pm

I’m not sure I believe there is any way forward on this issue in general, except for the language itself to improve.

Part of the promise of Julia is that you can have a high-level programming language that was also fast, as long as you internalize some idioms. If the social convention become that library authors need to profile the precise inference/compilation of the package and adjust the package accordingly, then at least to me, Julia is no longer particularly convenient in the first place, and its advantage over a static language becomes less clear.

Realistically, it just won’t be possible to create a social norm around this kind of inference whack-a-mole if it is as difficult or annoying as it is now. If there was a simple checklist to improve TTFX similar to the Julia performance tips, then maybe there could be some limited traction

Of course, nothing prevents enthusiastic individual authors to take deep dives into the compilation process and improve the TTPX of their own package, especially if they’ve authored a widely used package with large latency. But this kind of individual effort for specific packages is not the same as a blanket effort affecting the whole ecosystem

By all means, do make PRs on individual packages where the latency annoys you. I just don’t see how we can meaningfully make a collective effort on this issue.

Raf · January 23, 2022, 3:25pm

Isn’t that an unnecessarily depressing take? The whack mole that you are talking about isn’t needed for many packages to be much better than they are now. There are simple improvements available in 100s of packages that have very little to do with the compiler nuances you and @TsurHerman discuss.

That’s what I’m trying to get at here. Half of the improvements are just easy, low hanging fruit, but often also things the compiler will always struggle to optimise.

For example, Interact.jl has 450 stars. Its super useful. And you can get the TTFX down by 80% in half a days work, fixing most of the JuliaGizmos packages at the same time. Probably by 90% with a few more hours.
https://github.com/piever/Widgets.jl/pull/48

This stuff isn’t hard, its just basic profiling and type stability, with ProfileView and Cthulhu. We just have to do it, and waiting for the compiler to fix everything isn’t going to work. It wont.

I never expected to avoid this in package dev. I’ve worked on R/C++ packages and these things are trivial compared to the amount of work you need to put in to make that fast. The promise of julia is that you can do whatever you want in a script. Expecting that in package dev is a big ask.

Making it collective involves improving awareness of known TTFX pitfalls, the need to profile, and normalising that its ok to ask for help on TTFX problems like it is for performance optimisation.

I don’t know why you think we have to slave away solo and not share the experience, as I’m trying to do here, and as @ChrisRackauckas has also been doing really well from his experience.

Topic		Replies	Views
Any way to speed up loading large precompiled packages? General Usage precompilation , ttfp , ttfx	35	3143	May 15, 2023
Startup time of 1000 packages – 53% slower in Julia 1.12 vs 1.10 Internals & Design performance , ttfx , latency	50	3153	May 4, 2025
Ways to make slow/sluggish REPL/interactive development experience faster? Performance repl , ttfp	35	5515	July 23, 2019
Improving loading time of own package General Usage	3	399	August 29, 2022
Using (Web) packages together is much slower, e.g. 0.2 sec + 0.7 sec = 7.8 sec General Usage	4	631	December 6, 2021

Taking TTFX seriously: Can we make common packages faster to load and use

An example: Blink.jl and Window()

Can we do this collectively?

Related topics

An example: Blink.jl and `Window()`