In the last few weeks we’ve managed to merge several latency-related PRs. Let’s take a moment to reflect on where we are. These are times for display(plot(rand(10))) (first time) on my system:
Ok, not amazing, not instantaneous — but I am still pretty happy with this progress. We can do more. We are actively thinking about multi-threaded codegen, lazier JITing, smarter invalidations, and more micro-optimizations.
I also just quietly initiated a minor “typocalypse” (https://github.com/JuliaLang/julia/pull/36208). This is the long-promised and overly-dramatically-named event in which we intentionally make some inferred types less precise. All it does is reduce the maximum number of method matches type inference recurses into at each call site from 4 to 3. PkgEval and the test suite revealed that the impact of this is fairly minimal, but you might still see some @inferred tests start to fail. I really think we need to do this, though, and here is why:
Compiler performance is very sensitive to this parameter. Reducing it to 2 or 1 is much better still for latency, but probably too dramatic a change to our inferred types to do all at once. Increasing it to 5 or 6 can easily cause builds to run nearly forever.
This is a really bad parameter to depend on to get the types or performance you need. If you load a package that adds another method, or somebody splits one method into two in a package upgrade, poof, there goes your type. So it’s much better to find these cases and instead use type declarations or other code rearrangements to get the same effect. We can probably also improve inference precision in other ways (e.g. https://github.com/JuliaLang/julia/pull/36366), and those other ways are usually more efficient.
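To make the “type declarations” suggestion concrete, here is a minimal sketch (the function names and methods are illustrative, not from any real package): when a call goes through a non-concrete argument and the callee has more methods than the limit, an explicit type assertion pins the result down regardless of how many methods inference has to consider.

```julia
# Hypothetical: `f` has 4 methods, all returning Int. A call through an
# `Any`-typed value can infer as `Any` once the method-match limit is
# exceeded; the `::Int` assertion restores a precise type without
# depending on the inference parameter.
f(x::Int) = x + 1
f(x::Float64) = round(Int, x)
f(x::String) = length(x)
f(x::Symbol) = 0

function total(xs::Vector{Any})
    s = 0
    for x in xs
        s += f(x)::Int   # assert the result is Int no matter which method ran
    end
    return s
end

total(Any[1, 2.4, "abc", :a])   # 2 + 2 + 3 + 0 = 7
```

The assertion also turns an accidental non-Int return into a loud error at the call site, rather than silently degraded inference downstream.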
As usual, file issues if you hit problems and we can try to mitigate them.
Would you or someone else mind clarifying a bit on what is meant by “maximum number of method matches type inference recurs into at each call site”? This isn’t the same as union splitting, right?
Is this the optimization whose limit changed?
julia> using BenchmarkTools  # for @btime
julia> struct A end; struct B end; struct C end; struct D end; struct E end;
julia> f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4;
julia> let R = Ref{Union{A,B,C,D,E}}(A())
@btime f($R[])
end
2.250 ns (0 allocations: 0 bytes)
1
julia> f(::E) = 5;
julia> let R = Ref{Union{A,B,C,D,E}}(A())
@btime f($R[])
end
13.666 ns (0 allocations: 0 bytes)
1
Do you have any thoughts on the feasibility of making parameters like this, the union-splitting limit, the tuple limit, etc. locally modifiable, i.e. either in a block with some macro invocation or at the module level? Would the new compiler-pass machinery make such an approach more possible?
Thank you and all the other compiler devs for your hard work!
@Mason Yeah, my understanding is also that it’s not union-splitting. This is more about using the method signatures existing in the method table in inference. So, I don’t think Union{A,B,C,D,E} is required for showing the difference in the behavior:
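A minimal sketch of what that demonstration might look like (hypothetical names; assumes a Julia where the method-match limit is 3, i.e. after the PR). With an `Any`-typed container there is no Union at all, yet inference still succeeds or fails purely based on how many methods exist:

```julia
# Demonstrate the method-match limit with `Base.return_types` instead of
# timing; the argument type is just `Any`, no Union involved.
struct A end; struct B end; struct C end; struct D end
g(::A) = 1; g(::B) = 2; g(::C) = 3

h(r) = g(r[])   # r[] infers as Any for a Ref{Any}

# With 3 matching methods (at the limit), inference can still enumerate
# them and conclude the result is an Int:
Base.return_types(h, (Base.RefValue{Any},))

g(::D) = 4      # a 4th method pushes past the limit

# Now inference gives up on the call and it infers as Any:
Base.return_types(h, (Base.RefValue{Any},))
```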
Your work in this area is extremely appreciated!!!
Could you report numbers on what the “typocalypse” change does to TTFP? You reported the improvement from 19.8 to 10.9, but we are eager to hear what the “typocalypse” brings us.
Clarifying question: this should not affect code with concrete types (as inputs to inference) and type-stable functions, correct? It is only about giving up on heroic efforts the compiler makes to infer types just because the current state of the method table allows it.
When an argument’s inferred type is a Union, that triggers a different path that splits the signature before looking up method matches. That’s limited by a separate parameter union_splitting, which is still 4, so we will still convert up to 4 union cases into branches. If the argument type is Any and there are 4 methods, we will not convert it to branches. Sometimes that’s also referred to as “union splitting” but it’s kind of a misnomer since there is no union.
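To illustrate the contrast (a hypothetical sketch, assuming the union-splitting limit is still 4): the same four methods infer fine through a 4-member Union, but not through `Any`.

```julia
# Contrast union splitting with the method-match limit.
struct A end; struct B end; struct C end; struct D end
f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4

u(r) = f(r[])

# Union-typed element: the signature is split into up to 4 concrete cases
# before method lookup, so inference still returns Int:
Base.return_types(u, (Base.RefValue{Union{A,B,C,D}},))

# Any-typed element with 4 matching methods: past the method-match limit,
# so inference returns Any:
Base.return_types(u, (Base.RefValue{Any},))
```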
Yes I think there will eventually be something like that. Keno and others have been working on making the compiler less stateful so it’s easier to run with different settings.
In performance-sensitive code I imagine you’ll want concrete types for everything, and in that case this should have no effect.
Sorry if this is obvious, but I have very little knowledge about this. I am concerned about some functions in my package SatelliteToolbox.jl. I have, for example, a function called rECEFtoECI that has a lot of definitions like:
It shouldn’t be, if the Val types are always constant. With that many methods, you were probably over the threshold already anyways, so I can’t really imagine this change making a difference here.
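A hypothetical sketch of why constant `Val` arguments are unaffected (the function and frame names below are illustrative, not SatelliteToolbox’s actual API): when the `Val` is a literal at the call site, dispatch resolves at compile time to a single method, so the method-match limit never comes into play.

```julia
# Illustrative stand-in for a function with many Val-based methods.
rotate(::Val{:ITRF}, ::Val{:GCRF}, x) = 2x
rotate(::Val{:GCRF}, ::Val{:ITRF}, x) = x / 2
rotate(::Val{:PEF},  ::Val{:TOD},  x) = x + 1

# With literal Val arguments the target method is known statically,
# so inference sees exactly one method and a concrete result type:
convert_frame(x) = rotate(Val(:ITRF), Val(:GCRF), x)

Base.return_types(convert_frame, (Float64,))
```

If the `Val` value were computed at runtime instead of written as a literal, dispatch would become dynamic and the method count would matter again.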
Oh! Thanks! I think I understand now. Every time a function like this is called, the parameters are constants. Anyway, I saw that the PR was merged, so I can test using the nightly builds.