In the last few weeks we’ve managed to merge several latency-related PRs. Let’s take a moment to reflect on where we are. These are times for display(plot(rand(10))) (first time) on my system:
Ok, not amazing, not instantaneous — but I am still pretty happy with this progress. We can do more. We are actively thinking about multi-threaded codegen, lazier JITing, smarter invalidations, and more micro-optimizations.
I also just quietly initiated a minor “typocalypse” (https://github.com/JuliaLang/julia/pull/36208). This is the long-promised and overly-dramatically-named event in which we intentionally make some inferred types less precise. All it does is reduce the maximum number of method matches type inference recurses into at each call site from 4 to 3. PkgEval and the test suite revealed that the impact of this is fairly minimal, but you might still see some @inferred tests start to fail. I really think we need to do this, though, and here is why:
Compiler performance is very sensitive to this parameter. Reducing it to 2 or 1 is much better still for latency, but probably too dramatic a change to our inferred types to do all at once. Increasing it to 5 or 6 can easily cause builds to run nearly forever.
This is a really bad parameter to depend on to get the types or performance you need. If you load a package that adds another method, or somebody splits one method into two in a package upgrade, poof, there goes your type. So it’s much better to find these cases and instead use type declarations or other code rearrangements to get the same effect. We can probably also improve inference precision in other ways (e.g. https://github.com/JuliaLang/julia/pull/36366), and those other ways are usually more efficient.
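To make the “type declarations” suggestion concrete, here is a minimal sketch (the function names and methods are illustrative, not from any real package): when a call goes through a non-concrete argument and the callee has more methods than the limit, an explicit type assertion pins the result down regardless of how many methods inference has to consider.

```julia
# Hypothetical: `f` has 4 methods, all returning Int. A call through an
# `Any`-typed value can infer as `Any` once the method-match limit is
# exceeded; the `::Int` assertion restores a precise type without
# depending on the inference parameter.
f(x::Int) = x + 1
f(x::Float64) = round(Int, x)
f(x::String) = length(x)
f(x::Symbol) = 0

function total(xs::Vector{Any})
    s = 0
    for x in xs
        s += f(x)::Int   # assert the result is Int no matter which method ran
    end
    return s
end

total(Any[1, 2.4, "abc", :a])   # 2 + 2 + 3 + 0 = 7
```

The assertion also turns an accidental non-Int return into a loud error at the call site, rather than silently degraded inference downstream.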
As usual, file issues if you hit problems and we can try to mitigate them.
Would you or someone else mind clarifying a bit on what is meant by “maximum number of method matches type inference recurs into at each call site”? This isn’t the same as union splitting, right?
Is this the optimization whose limit changed?
julia> using BenchmarkTools  # for @btime
julia> struct A end; struct B end; struct C end; struct D end; struct E end;
julia> f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4;
julia> let R = Ref{Union{A,B,C,D,E}}(A())
@btime f($R[])
end
2.250 ns (0 allocations: 0 bytes)
1
julia> f(::E) = 5;
julia> let R = Ref{Union{A,B,C,D,E}}(A())
@btime f($R[])
end
13.666 ns (0 allocations: 0 bytes)
1
Do you have any thoughts on the feasibility of making parameters like this, the union-splitting limit, the tuple limit, etc. locally modifiable, i.e. either in a block with some macro invocation or at the module level? Would the new compiler-pass machinery make such an approach more possible?
Thank you and all the other compiler devs for your hard work!
@Mason Yeah, my understanding is also that it’s not union-splitting. This is more about using the method signatures existing in the method table in inference. So, I don’t think Union{A,B,C,D,E} is required for showing the difference in the behavior:
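A minimal sketch of what that demonstration might look like (hypothetical names; assumes a Julia where the method-match limit is 3, i.e. after the PR). With an `Any`-typed container there is no Union at all, yet inference still succeeds or fails purely based on how many methods exist:

```julia
# Demonstrate the method-match limit with `Base.return_types` instead of
# timing; the argument type is just `Any`, no Union involved.
struct A end; struct B end; struct C end; struct D end
g(::A) = 1; g(::B) = 2; g(::C) = 3

h(r) = g(r[])   # r[] infers as Any for a Ref{Any}

# With 3 matching methods (at the limit), inference can still enumerate
# them and conclude the result is an Int:
Base.return_types(h, (Base.RefValue{Any},))

g(::D) = 4      # a 4th method pushes past the limit

# Now inference gives up on the call and it infers as Any:
Base.return_types(h, (Base.RefValue{Any},))
```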
Your work in this area is extremely appreciated!!!
Could you report numbers on what the “typocalypse” change does to TTFP? You reported the improvement from 19.8 to 10.9, but we are eager to hear what the “typocalypse” brings us.
Clarifying question: this should not affect code with concrete types (as inputs to inference) and type-stable functions, correct? It is only about giving up on heroic efforts the compiler makes to infer types just because the current state of the method table allows it.
When an argument’s inferred type is a Union, that triggers a different path that splits the signature before looking up method matches. That’s limited by a separate parameter union_splitting, which is still 4, so we will still convert up to 4 union cases into branches. If the argument type is Any and there are 4 methods, we will not convert it to branches. Sometimes that’s also referred to as “union splitting” but it’s kind of a misnomer since there is no union.
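To illustrate the contrast (a hypothetical sketch, assuming the union-splitting limit is still 4): the same four methods infer fine through a 4-member Union, but not through `Any`.

```julia
# Contrast union splitting with the method-match limit.
struct A end; struct B end; struct C end; struct D end
f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4

u(r) = f(r[])

# Union-typed element: the signature is split into up to 4 concrete cases
# before method lookup, so inference still returns Int:
Base.return_types(u, (Base.RefValue{Union{A,B,C,D}},))

# Any-typed element with 4 matching methods: past the method-match limit,
# so inference returns Any:
Base.return_types(u, (Base.RefValue{Any},))
```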
Yes I think there will eventually be something like that. Keno and others have been working on making the compiler less stateful so it’s easier to run with different settings.
In performance-sensitive code I imagine you’ll want concrete types for everything, and in that case this should have no effect.
Sorry if this is obvious, but I have very little knowledge about this. I am concerned about some functions in my package SatelliteToolbox.jl. I have, for example, a function called rECEFtoECI that has a lot of definitions like:
It shouldn’t be, if the Val types are always constant. With that many methods, you were probably over the threshold already anyways, so I can’t really imagine this change making a difference here.
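A hypothetical sketch of why constant `Val` arguments are unaffected (the function and frame names below are illustrative, not SatelliteToolbox’s actual API): when the `Val` is a literal at the call site, dispatch resolves at compile time to a single method, so the method-match limit never comes into play.

```julia
# Illustrative stand-in for a function with many Val-based methods.
rotate(::Val{:ITRF}, ::Val{:GCRF}, x) = 2x
rotate(::Val{:GCRF}, ::Val{:ITRF}, x) = x / 2
rotate(::Val{:PEF},  ::Val{:TOD},  x) = x + 1

# With literal Val arguments the target method is known statically,
# so inference sees exactly one method and a concrete result type:
convert_frame(x) = rotate(Val(:ITRF), Val(:GCRF), x)

Base.return_types(convert_frame, (Float64,))
```

If the `Val` value were computed at runtime instead of written as a literal, dispatch would become dynamic and the method count would matter again.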
Oh! Thanks! I think I understand now. Every time a function like this is called, the parameters are constants. Anyway, I saw that the PR was merged, so I can test using the nightly builds.