If I could use this to share my own "I tried, but the results were bad / made no sense" story: I added a let block with a "typical use" statement (and nothing else, no change to the rest of the library), and that made the runtime (not the compile time) of the library worse. Removing precompilation led to fewer allocations!? Some methods started to allocate instead of being allocation-free (again, without any change to their source code).
I thought that should not be possible, and I still cannot figure out what is so special about this fairly boring simulation code that led to such weird behavior.
I’ve also had this experience. In the example here, precompiling Window for Blink.jl does essentially nothing. But I’m pretty sure that’s because it’s not type stable very far down, so the methods it will call don’t actually get compiled.
I experienced this with NCDatasets.jl too, but then fixed the type stability of the objects so that all the fields were concrete. That alone improved compile time, and afterwards adding precompile also helped, because compilation reached much further down the call chain. But fixing the type stability was the most important part.
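A minimal sketch of that kind of fix (hypothetical types, not NCDatasets.jl's actual ones): abstractly typed fields defeat inference, while parameterized fields keep it concrete.

```julia
# Problematic: `d.data` only infers as AbstractArray, so everything
# downstream compiles against an abstract type.
struct DatasetLoose
    data::AbstractArray
end

# Fixed: each instance carries a concrete field type in its parameter.
struct Dataset{A<:AbstractArray}
    data::A
end

sum_data(d) = sum(d.data)

sum_data(Dataset(rand(10)))       # fully inferred, precompiles deep
sum_data(DatasetLoose(rand(10)))  # dynamic dispatch at every field access
```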
@jlapeyre yes, I could have phrased that better. It’s totally true that sometimes you can do a lot of work on compilation and get no benefit from it. So the situation is more that either nothing has been tried, or improving anything is actually quite difficult.
This could be an instance of https://github.com/JuliaLang/julia/issues/35800 triggered by having Polyester and other JuliaSIMD tools in the stack. We’ve been talking about this inference bug for a while; it’s a rough one, but hopefully it will get resolved soon.
It’s such an awful bug; we wasted a lot of time on it with Accessors.jl. Recompilation with Revise can also make it disappear, giving you hope that you actually fixed something when you didn’t.
Also @jlapeyre, with this post I was hoping we could start sharing problems like the ones you describe, and workshop them here like we do with performance issues. We can all dev a package and run @profview using SomePackage without too much hassle. It’s also easy to push a branch with changes to compare and add to.
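For anyone unfamiliar with that workflow, it is roughly this (SomePackage is a placeholder):

```julia
# In the package manager, get an editable checkout:
#   pkg> dev SomePackage
# Then, in a *fresh* session (so nothing is compiled yet), profile loading:
using ProfileView
@profview using SomePackage   # flame graph of load/compile time
```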
Why is this necessary? Why is it better than saying the keyword will be of type Bool?
EDIT: Discourse seems to be stripping the anchors out of the links above, making it difficult to tell which lines I am talking about. Here are the SciML and Polyester links with anchors.
Here’s a fun find in the Blink.jl (and Interact.jl) TTFX story: JSON.jl serialization seems to be responsible for half the TTFX of most JuliaGizmos packages! Swapping to JSON3.jl gives huge TTFX gains for Interact.jl, and should for WebIO.jl/Blink.jl.
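For context, the swap itself is small, since the two packages cover this use case with similar surface APIs. A rough sketch (not the actual PR):

```julia
using JSON3

msg = Dict("type" => "command", "args" => [1, 2, 3])

# Before, with JSON.jl:  payload = JSON.json(msg); JSON.parse(payload)
# After, with JSON3.jl:
payload = JSON3.write(msg)      # serialize to a String
obj = JSON3.read(payload)       # lazy, read-only JSON3.Object
```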
There is an inherent problem with Julia’s design as it stands.
I use it for anything that is a little bit more involved… and the developer experience gets worse and worse: huge start-up times, recompilations, and the methods for speeding those up, like the concrete-typing style struct A{T1,T2,T3}, make the error messages I get hideous (Flux, anyone?).
I used to circumvent that problem by using my port of PackageCompiler, but recently all incremental builds fail for anything that is more than a textbook example.
I have pointed out in this forum before that multiple dispatch should work upside-down compared to how it does now: the dispatch table for a function should be determined by the context of the caller, thus making all binary code cacheable.
In my dream language I would also investigate Thorin/AnyDSL, which uses Continuation Passing Style under the hood, and which I think is the holy grail for a language like Julia.
Instead of propagating type instability during inference… resolve the instability at the point of contact and carry on compiling type-stable code.
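For what it’s worth, something close to this can already be done by hand with the function-barrier idiom (all names below are made up for illustration):

```julia
# `load_column` is type-unstable: the element type is only known at runtime.
load_column(i) = i == 1 ? rand(Int, 100) : rand(Float64, 100)

# Unstable caller: `col` infers as an abstract type, so everything
# downstream of it compiles against Union/Any.
function process_unstable(i)
    col = load_column(i)
    s = zero(eltype(col))
    for x in col
        s += x
    end
    return s
end

# Function barrier: dispatch on `col`'s concrete runtime type once at the
# "point of contact"; the kernel then compiles as fully type-stable code.
sum_kernel(col::AbstractVector{T}) where {T} = sum(col)

process_stable(i) = sum_kernel(load_column(i))
```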
I’m not sure I believe there is any way forward on this issue in general, except for the language itself to improve.
Part of the promise of Julia is that you can have a high-level programming language that is also fast, as long as you internalize some idioms. If the social convention becomes that library authors need to profile the precise inference/compilation behavior of their package and adjust it accordingly, then, at least to me, Julia is no longer particularly convenient in the first place, and its advantage over a static language becomes less clear.
Realistically, it just won’t be possible to create a social norm around this kind of inference whack-a-mole if it remains as difficult and annoying as it is now. If there were a simple checklist for improving TTFX, similar to the Julia performance tips, then maybe there could be some limited traction.
Of course, nothing prevents enthusiastic individual authors from taking deep dives into the compilation process and improving the TTFX of their own packages, especially if they’ve authored a widely used package with large latency. But this kind of individual effort on specific packages is not the same as a blanket effort affecting the whole ecosystem.
By all means, do make PRs on individual packages where the latency annoys you. I just don’t see how we can meaningfully make a collective effort on this issue.
Isn’t that an unnecessarily depressing take? The whack-a-mole you are talking about isn’t needed for many packages to be much better than they are now. There are simple improvements available in hundreds of packages that have very little to do with the compiler nuances you and @TsurHerman discuss.
That’s what I’m trying to get at here. Half of the improvements are easy, low-hanging fruit, but often also things the compiler will always struggle to optimise.
For example, Interact.jl has 450 stars. It’s super useful. And you can get the TTFX down by 80% in half a day’s work, fixing most of the JuliaGizmos packages at the same time. Probably by 90% with a few more hours: https://github.com/piever/Widgets.jl/pull/48
This stuff isn’t hard; it’s just basic profiling and type stability with ProfileView and Cthulhu. We just have to do it, and waiting for the compiler to fix everything isn’t going to work. It won’t.
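Concretely, the loop is something like this (the toy function is hypothetical):

```julia
using ProfileView, Cthulhu

# A deliberately type-unstable toy to practice on.
unstable(x) = x > 0 ? 1 : 1.0      # returns Int or Float64
work(n) = sum(unstable, randn(n))

@profview work(10_000_000)  # flame graph: the wide (and red) bars are the sinks
@descend work(10)           # interactive inference view: look for ::Any and
                            # Union-typed annotations, then fix those call sites
```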
I never expected to avoid this in package dev. I’ve worked on R/C++ packages, and these things are trivial compared to the amount of work you need to put in to make that fast. The promise of Julia is that you can do whatever you want in a script. Expecting that in package dev is a big ask.
Making it collective involves improving awareness of known TTFX pitfalls and the need to profile, and normalising that it’s OK to ask for help with TTFX problems, like it is for performance optimisation.
I don’t know why you think we have to slave away solo and not share the experience, as I’m trying to do here, and as @ChrisRackauckas has also been doing really well from his experience.
Yes, a lot of the problem (of course not all of it) is that people don’t profile (or even time) things like package load time or first-call latency.
Just as an example: https://github.com/JuliaMath/SpecialFunctions.jl/issues/310. This was supposed to add a “small lightweight core dependency”, and it ended up making the package 4-5x slower to load. It wasn’t until someone complained that ForwardDiff.jl was getting slow to load that I ran @profile using ForwardDiff, and it was immediately obvious what was going on. If this is how we treat packages that pretty much everyone transitively depends on, no wonder things are slow.
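As an aside, on Julia 1.8+ there is also InteractiveUtils.@time_imports, which makes this kind of regression visible without a full profile (the output below is illustrative, not real numbers):

```julia
using InteractiveUtils
@time_imports using ForwardDiff
#     ...
#   412.3 ms  SpecialFunctions   # a slow transitive dependency stands out
#    55.1 ms  ForwardDiff
```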
On the more pessimistic side of things, Julia nightly is currently getting slower to import things (which I am certain is related to improvements in other parts of Julia, and there are probably already awesome volunteers working on reversing this slowdown):
```
# julia 1.7.1
@time using Makie
  7.797519 seconds (13.29 M allocations: 947.065 MiB, 5.52% gc time, 11.28% compilation time)

# julia nightly 1.8.0-DEV.1275 (2022-01-11)
@time using Makie
  9.354482 seconds (14.44 M allocations: 994.084 MiB, 4.89% gc time, 21.24% compilation time)

# julia with the UNFINISHED CodeInstance caching work by Tim Holy
# 1.8.0-DEV.1372 (2022-01-22)
# teh/relocatable_cis/7aac48347e (fork: 1 commits, 1 day)
# (this last comparison does not mean much as this is not finished yet)
@time using Makie
  9.647164 seconds (14.26 M allocations: 1001.128 MiB, 4.79% gc time, 19.08% compilation time)
```
But… I honestly don’t know why you would focus on that. Compiler performance varies month to month for various reasons, but it has generally gotten a lot better over the last few years.
As you can see from the various examples posted, it’s not always the compiler’s responsibility to fix things for us; there are 5x performance gains sitting around if you look. Maybe not in Makie, but I would guess in at least half of the packages you would say are slow to load.
If we act like it’s only the compiler’s responsibility to fix everything, packages will inevitably remain slow to load despite the best efforts of the compiler team.
My bad if what I said came across as “the compiler is bad” or “this slowdown makes package profiling unworthy of attention” (or maybe the issue is that I derailed the conversation). I eagerly work on making my code easier to infer/compile. However, I do believe the slowdown in the example above is just as important to address as instilling a culture of “inference profiling” among us casual package developers. And from listening to JuliaCon talks and reading Hacker News comments, I do know that the core devs are taking that seriously (I was not trying to insinuate the opposite).
I didn’t think you meant that… I mean of course that needs to be fixed as well and is good to know about.
But after it is fixed there will still be slow packages around that someone still needs to profile. That’s all we really have influence on unless we actually work on the compiler.
Can we hack together some list of best practices, which could become a page in the docs? Turning the initial list into instructions or alternatively a “debug guide”:
How to write packages with good startup times
Write type-stable code whose return types can be inferred.
Avoid using Requires.jl, because the code it encloses cannot be precompiled.
If possible, put widely used calls into a let block in the package’s main source file, so they get compiled during precompilation (see the sketch after this list).
Use precompile for functions with side effects, which cannot safely be run at precompile time; for side-effect-free functions, prefer actually calling them, since that compiles further down the call chain.
Avoid unnecessary dependencies.
Use ProfileView.@profview using YourPackage to visualize where loading time goes.
…
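To make the let block and precompile items concrete, here is a minimal sketch with made-up functions (make_table, summarize, write_output are hypothetical), placed at the bottom of a package’s main module:

```julia
module MyPackage

make_table(m) = (data = m, sums = vec(sum(m; dims = 1)))   # hypothetical API
summarize(tbl) = extrema(tbl.sums)
write_output(path, m) = open(io -> write(io, string(m)), path, "w")

# Side-effect-free "typical use" calls: executing them at precompile time
# compiles the whole call graph, further than `precompile` alone reaches.
let
    tbl = make_table(rand(4, 3))
    summarize(tbl)
end

# For functions with side effects (I/O, global state), which must not run
# during precompilation, fall back to explicit signatures:
precompile(write_output, (String, Matrix{Float64}))

end # module
```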
I copied the list over here, where you can freely edit it and make corrections if necessary, as I’m by no means an authority on this topic. Maybe we can turn this into a PR?
Regarding the last point: I just ran @profview on one of my packages, and the output is really hard to interpret. Are there any specific things to look out for? Currently most of the time is spent in task.jl, poptask (80%) and loading.jl, _tryrequire_from_serialized (20%) — which tells me nothing, does it?