https://github.com/JuliaLang/julia/pull/42016
How much is this going to help?
Yes, a lot of the problem (though of course not all of it) is that people don't profile (or even time) e.g. package load time or first-call latency.
Just as an example: ChainRulesCore dependency causes 4x load time regression · Issue #310 · JuliaMath/SpecialFunctions.jl · GitHub. This was supposed to add a "small lightweight core dependency" and it ended up making the package 4-5x slower to load. It wasn't until someone complained that ForwardDiff.jl was getting slow to load that I did a `@profile using ForwardDiff`,
and it was immediately obvious what was going on. If this is how we treat packages that are pretty much transitively depended on by everyone, no wonder things are slow.
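For anyone who wants to reproduce that workflow, here is a minimal sketch (assuming ProfileView.jl is installed; any package can stand in for ForwardDiff):

```julia
# Run in a fresh Julia session: a package can only be loaded (and hence
# its load time profiled) once per process.
using Profile, ProfileView

@profile using ForwardDiff  # profile the package load itself
ProfileView.view()          # flame graph; wide frames point at expensive dependencies
```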
Another one is lazily allocate the threaded buffers and allocate them on the thread that will access it by KristofferC · Pull Request #704 · JuliaWeb/HTTP.jl · GitHub. A single `@profile`
and some easy rewrites, and HTTP.jl is 4x faster to load.
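To illustrate the kind of rewrite that PR does, here is a sketch of the general pattern (not HTTP.jl's actual code):

```julia
# Eager: pays the allocation cost for every thread at load time.
const BUFFERS = [IOBuffer() for _ in 1:Threads.nthreads()]

# Lazy: allocate each buffer on first use, on the thread that accesses it.
const LAZY_BUFFERS = Vector{Union{Nothing,IOBuffer}}(nothing, Threads.nthreads())

function buffer_for_current_thread()
    tid = Threads.threadid()
    buf = LAZY_BUFFERS[tid]
    buf === nothing || return buf
    return LAZY_BUFFERS[tid] = IOBuffer()  # each thread only touches its own slot
end
```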
So we can come quite far by people actually caring and measuring and putting in a little bit of work.
On the more pessimistic side of things, Julia nightly is currently getting slower to import things (which I am certain is related to improvements in other parts of julia, and there are probably already awesome volunteers working on reversing this slowdown):
# julia 1.7.1
@time using Makie
7.797519 seconds (13.29 M allocations: 947.065 MiB, 5.52% gc time, 11.28% compilation time)
# julia nightly 1.8.0-DEV.1275 (2022-01-11)
@time using Makie
9.354482 seconds (14.44 M allocations: 994.084 MiB, 4.89% gc time, 21.24% compilation time)
# julia with the UNFINISHED CodeInstance caching work by Tim Holy
# 1.8.0-DEV.1372 (2022-01-22)
# teh/relocatable_cis/7aac48347e (fork: 1 commits, 1 day)
# (this last comparison does not mean much as this is not finished yet)
@time using Makie
9.647164 seconds (14.26 M allocations: 1001.128 MiB, 4.79% gc time, 19.08% compilation time)
On
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: AMD Ryzen 7 1700 Eight-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.0 (ORCJIT, znver1)
But… I honestly don't know why you would focus on that. Compiler performance varies month to month for various things but has generally gotten a lot better in the last few years.
As you can see from the various posted examples, it's not always the compiler's responsibility to fix things for us; there are 5x performance gains sitting around if you look. Maybe not in Makie, but I'd guess in at least half of the packages you would say are slow to load.
If we act like it's only the compiler's responsibility to fix everything, packages will inevitably remain slow to load despite the best efforts of the compiler team.
No, this one is a problem. Hopefully we can fix it before 1.8.
My bad if what I said was perceived as "the compiler is bad" or "this slowdown makes package profiling unworthy of attention" (or maybe the issue is that I derailed the conversation). I eagerly work on making my code easier to infer/compile. However, I do believe the above example of a slowdown is just as important to address as instilling a culture of "inference profiling" among us casual package developers. And from listening to JuliaCon talks and reading Hacker News comments, I do know that core devs are taking that seriously (I was not trying to insinuate the opposite).
I didn't think you meant that… I mean, of course that needs to be fixed as well and is good to know about.
But after it is fixed there will still be slow packages around that someone still needs to profile. That's all we really have influence on unless we actually work on the compiler.
Getting good MWEs of compile-time regressions is really useful. Whenever you find one, make an issue. We can't fix regressions we don't know about.
Can we hack together some list of best practices, which could become a page in the docs? Turning the initial list into instructions, or alternatively a "debug guide":
- Use `precompile` for functions with side effects; avoid it for functions without.
- Run `ProfileView.@profview using YourPackage` to visualize the loading process.
I copied the list over here, where you can freely edit it and make corrections if necessary, as I'm by no means an authority on this topic. Maybe we can turn this into a PR?
Regarding the last point: I just `@profview`ed one of my packages and the output is really hard to interpret. Are there any specific things to look out for? Currently, most of the time is spent in `task.jl`, `poptask` (80%) and `loading.jl`, `_tryrequire_from_serialized` (20%), which tells me nothing, does it?
Use `precompile` when the function has side effects, like opening a window. Maybe we should also design packages so that most functions don't have side effects, like in many functional languages, to make precompilation easier.
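A minimal sketch of that directive (hypothetical package and function names):

```julia
module YourPackage

# Side-effect-free helper: simply calling it in a precompile workload is
# often enough, so an explicit directive may be unnecessary.
double(x) = 2x

# A function with a side effect (printing). We don't want to *run* it at
# precompile time, so we ask the compiler to compile it without calling it:
report(io::IO, x) = println(io, "result: ", double(x))

precompile(report, (IOBuffer, Int))  # compile for these argument types

end
```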
It tells you something… it sounds like the time is in some async tasks? That does make things a little harder. But we would really need to see the flame graph to know. What's the package?
The package is here, but it's in an explorative state and the code quality is so bad that I'm a bit ashamed to share it (type instability etc.). The flame graph looks like this:
The long flat `task.jl` span on the right looks suspicious, but I don't know where it comes from.
The right part comes from the (unused) threads. It should go away with a single thread.
Yes, having a simple checklist for package authors would be a good idea. What to think about when writing the code, and what to do as a "compile-time optimization pass" after writing the code.
Maybe I was too pessimistic. My impression after having tried to use SnoopCompile on a few of my own packages is that reducing compilation time requires you to know all kinds of implementation details like backedges, and how precompilation actually works. And that you, besides having to know all this stuff, then need to carefully go through and analyze first invalidations, then time and analyze inference, and lastly go through your package to find all APIs and exercise them in precompile statements.
But maybe that was a wrong impression, and it's actually possible to learn a series of quick checks/tweaks that can be done in, say, 30 minutes for a small package. Such a checklist would be worth gold.
Maybe also copy some steps in from this checklist in SnoopCompile: SnoopCompile.jl · SnoopCompile
How does one check the effect of precompilation on TTFX? Just time `using` with and without?
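Essentially yes, with the caveat that each measurement needs a fresh process, since nothing in a session can be loaded or compiled "for the first time" twice. A sketch with hypothetical names:

```julia
# In a brand-new Julia session (e.g. started with `julia --startup-file=no`):
@time using MyPackage        # load time
@time MyPackage.do_thing(1)  # first call: compilation + run time (TTFX)
@time MyPackage.do_thing(1)  # second call: run time only, for comparison
```

Run that once on a version of the package with the precompile statements and once without, and compare.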
@PetrKryslUCSD I'm using TTFX like time-to-first-plot: the time to load a package plus the time for it to do its main thing X, measured with `@time thefunction(x)`.
My PR above made Blink load in 1 s, but `Window()` takes 15 s, so it's still not a huge gain.
Yeah SnoopCompile is more complicated. I mostly just use ProfileView.jl and Cthulhu.jl and give up after that. Anything really bad is pretty obvious in ProfileView. But I would also like to know how to use SnoopCompile better.
So, inspired by this thread I decided to ProfileView the TTFX for my package, DFTK, which currently stands at a ridiculous 90 s. About 1/5 of that time is in inference, without reference to package code. Some places I can recognize are functions that have nested functions, but that is a pattern I use a lot and I'm not sure how to do without it (and not sure if that's even the problem). A lot of the advice I see for reducing TTFP is about method redefinitions, but I don't think I redefine many base functions. I'd appreciate any advice.
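One way to see where that inference time goes is SnoopCompile's deep inference snooping. A sketch, assuming SnoopCompile v2's `@snoopi_deep`; `my_workload()` is a placeholder for whatever DFTK call costs the 90 s:

```julia
# Fresh session, so the workload really is the first call.
using SnoopCompileCore
tinf = @snoopi_deep my_workload()    # hypothetical representative workload

using SnoopCompile, ProfileView
ProfileView.view(flamegraph(tinf))   # flame graph of where inference spends time
itrigs = inference_triggers(tinf)    # calls that forced fresh inference at runtime
```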
I wonder if the compiler could read assertions, though, and use that information.
A question more on the subject: can someone provide a clear step-by-step of what they are doing to produce these flame graphs and benchmark the TTFX? It is not clear to me how to do that, given that the first execution of the profile has to be discarded, and on the second run, well, it is not the first run anymore (at least those are the instructions for the one from VSCode).
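The trick for TTFX is the opposite of the usual runtime-profiling advice: do not warm anything up. In a fresh session, profile the first execution directly, compilation included (a sketch with a hypothetical package):

```julia
# Fresh session; ProfileView itself has to load first, which is unavoidable.
using ProfileView

@profview using MyPackage         # flame graph of the package load
@profview MyPackage.do_thing(1)   # flame graph of the first call, compilation included
```

The "discard the first run" instruction only applies when you want steady-state runtime; for TTFX the first run is exactly what you want to capture.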
So let it also solve this issue.