Update on latency and the typocalypse

In the last few weeks we’ve managed to merge several latency-related PRs. Let’s take a moment to reflect on where we are. These are times for display(plot(rand(10))) (first time) on my system:

v1.4: 19.8 seconds
v1.5: 12.4 seconds
master: 10.9 seconds

Ok, not amazing, not instantaneous — but I am still pretty happy with this progress. We can do more. We are actively thinking about multi-threaded codegen, lazier JITing, smarter invalidations, and more micro-optimizations.

I also just quietly initiated a minor “typocalypse” (https://github.com/JuliaLang/julia/pull/36208). This is the long-promised and overly-dramatically-named event in which we intentionally make some inferred types less precise. All it does is reduce the maximum number of method matches that type inference will recurse into at each call site from 4 to 3. PkgEval and the test suite revealed that the impact of this is fairly minimal, but you might still see some @inferred tests start to fail. I really think we need to do this, though, and here is why:

  • Compiler performance is very sensitive to this parameter. Reducing it to 2 or 1 is much better still for latency, but probably too dramatic a change to our inferred types to do all at once. Increasing it to 5 or 6 can easily cause builds to start to run nearly forever.
  • This is a really bad parameter to depend on to get the types or performance you need. If you load a package that adds another method, or somebody decides to split one method into two in a package upgrade, poof, there goes your type. So it’s much better to find these cases and instead use type declarations or other code rearrangements to get the same effect (see the sketch right after this list). We can probably also improve inference precision in other ways (e.g. https://github.com/JuliaLang/julia/pull/36366), and those other ways are usually more efficient.
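
To make that second point concrete, here is a minimal sketch (the names W, X, Y, Z, tag, and describe are made up for illustration): a call whose argument is only known as Any and that has four matching methods is now inferred as Any, and a return-type declaration at the call site is one way to get a concrete type back without depending on the method count.

struct W end; struct X end; struct Y end; struct Z end

tag(::W) = 1; tag(::X) = 2; tag(::Y) = 3; tag(::Z) = 4   # four applicable methods

# The argument below is only known to be `Any`, so inference has to consult
# the method table. With the limit now at 3, four matches is too many and
# the call is inferred as `Any`:
describe(r::Ref{Any}) = tag(r[])

# One fix: declare the result type at the call site. The dispatch is still
# dynamic, but callers of `describe_fixed` see a concrete `Int` again, and
# this no longer depends on how many `tag` methods happen to be loaded:
describe_fixed(r::Ref{Any}) = tag(r[])::Int

The assertion does not remove the dynamic dispatch itself; it only restores inferability for the caller.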

As usual, file issues if you hit problems and we can try to mitigate them.

70 Likes

Would you or someone else mind clarifying a bit what is meant by “maximum number of method matches that type inference will recurse into at each call site”? This isn’t the same as union splitting, right?

Is this the optimization whose limit changed?

julia> using BenchmarkTools  # provides @btime

julia> struct A end; struct B end; struct C end; struct D end; struct E end;

julia> f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4;

julia> let R = Ref{Union{A,B,C,D,E}}(A())
           @btime f($R[])
       end
  2.250 ns (0 allocations: 0 bytes)
1

julia> f(::E) = 5;

julia> let R = Ref{Union{A,B,C,D,E}}(A())
           @btime f($R[])
       end
  13.666 ns (0 allocations: 0 bytes)
1
6 Likes

Do you have any thoughts on the feasibility of having parameters like this, the union-splitting limit, the tuple limit, etc. be locally modifiable, i.e. either within a block via some macro invocation, or at the module level? Would the new compiler-pass machinery make such an approach more feasible?

Thank you and all the other compiler devs for your hard work!

3 Likes

@Mason Yeah, my understanding is also that it’s not union splitting. This is more about how inference uses the method signatures that exist in the method table. So, I don’t think Union{A,B,C,D,E} is required to show the difference in behavior:

julia> VERSION
v"1.5.0-beta1.0"

julia> struct A end; struct B end; struct C end; struct D end; struct E end;

julia> f(::A) = 1; f(::B) = 2; f(::C) = 3; f(::D) = 4;

julia> g(x::Ref) = f(x[]);

julia> @code_warntype g(Ref{Any}())
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Base.RefValue{Any}

Body::Int64
1 ─ %1 = Base.getindex(x)::Any
│   %2 = Main.f(%1)::Int64
└──      return %2

julia> f(::E) = 5;

julia> @code_warntype g(Ref{Any}())
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Base.RefValue{Any}

Body::Any
1 ─ %1 = Base.getindex(x)::Any
│   %2 = Main.f(%1)::Any
└──      return %2

But after #36208:

julia> VERSION
v"1.6.0-DEV.280"

julia> struct A end; struct B end; struct C end; struct D end; struct E end;

julia> f(::A) = 1; f(::B) = 2; f(::C) = 3;  # no f(::D) yet

julia> g(x::Ref) = f(x[]);

julia> @code_warntype g(Ref{Any}())
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Base.RefValue{Any}

Body::Int64
1 ─ %1 = Base.getindex(x)::Any
│   %2 = Main.f(%1)::Int64
└──      return %2

julia> f(::D) = 4;  # now the fourth definition

julia> @code_warntype g(Ref{Any}())
Variables
  #self#::Core.Compiler.Const(g, false)
  x::Base.RefValue{Any}

Body::Any
1 ─ %1 = Base.getindex(x)::Any
│   %2 = Main.f(%1)::Any
└──      return %2

So, the type instability happens when adding f(::D) rather than f(::E) after the typocalypse.

11 Likes

Does this have implications for writing performance-sensitive code? Should we try to limit the number of dispatch targets?

1 Like

Your work in this area is extremely appreciated!!!

Could you report numbers on what the “typocalypse” change does to TTFP (time to first plot)? You reported the improvement from 19.8 to 10.9 seconds, but we are eager to hear what the “typocalypse” brings us :slight_smile:

2 Likes

Clarifying question: this should not affect code with concrete types (as inputs to inference) and type-stable functions, correct? It is only about giving up on heroic efforts the compiler makes to infer types just because the current state of the method table allows it.

7 Likes

If I read the message correctly, “master” should include the typocalypse; at least, the branch was pushed two hours before this post.

1 Like

When an argument’s inferred type is a Union, that triggers a different path that splits the signature before looking up method matches. That’s limited by a separate parameter union_splitting, which is still 4, so we will still convert up to 4 union cases into branches. If the argument type is Any and there are 4 methods, we will not convert it to branches. Sometimes that’s also referred to as “union splitting” but it’s kind of a misnomer since there is no union.
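
Roughly, in code (a minimal sketch with made-up names; the limits are as described above):

struct P end; struct Q end; struct R end; struct S end

val(::P) = 1; val(::Q) = 2; val(::R) = 3; val(::S) = 4   # four methods

# Union-typed argument: inference splits the signature into (at most 4)
# concrete cases before looking up methods, so each case matches exactly
# one method and the call should still infer as Int:
from_union(r::Ref{Union{P,Q,R,S}}) = val(r[])

# Any-typed argument: there is no union to split, so inference asks the
# method table how many methods could match; four is over the new limit
# of 3, so the call is inferred as Any:
from_any(r::Ref{Any}) = val(r[])

This mirrors the difference between the Ref{Union{...}} benchmark and the Ref{Any} @code_warntype examples earlier in the thread.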

Yes I think there will eventually be something like that. Keno and others have been working on making the compiler less stateful so it’s easier to run with different settings.

In performance-sensitive code I imagine you’ll want concrete types for everything, and in that case this should have no effect.
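
For example (a small sketch with made-up names), when the argument types are concrete there is exactly one matching method at each call site, so this limit never comes into play:

# Several methods, but the call site below only ever sees a concrete type:
area(x::Int)     = x * x
area(x::Float64) = x * x
area(x::String)  = length(x)
area(x::Symbol)  = 0

# `v` is a Vector{Int}, so every element is a concrete Int, exactly one
# `area` method matches, and the method-count limit is never consulted:
total(v::Vector{Int}) = sum(area, v)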

26 Likes

Sorry if this is already clear, but I have very little knowledge about this. I am concerned about some functions in my package SatelliteToolbox.jl. I have, for example, a function called rECItoECI that has a lot of definitions like:

rECItoECI(T::T_ROT, ::Val{:GCRF}, ::Val{:J2000}, JD_UTC::Number)
rECItoECI(T::T_ROT, ::Val{:GCRF}, ::Val{:MOD}, JD_UTC::Number)
rECItoECI(T::T_ROT, ::Val{:GCRF}, ::Val{:TOD}, JD_UTC::Number)
rECItoECI(T::T_ROT, ::Val{:GCRF}, ::Val{:TEME}, JD_UTC::Number)
rECItoECI(T::T_ROT, ::Val{:J2000}, ::Val{:GCRF}, JD_UTC::Number)
rECItoECI(T::T_ROT, ::Val{:J2000}, ::Val{:MOD}, JD_UTC::Number)
...

Will it be affected by this change?

It shouldn’t be, if the Val types are always constant. With that many methods, you were probably over the threshold already anyway, so I can’t really imagine this change making a difference here.
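
A small sketch of what “the Val types are always constant” buys (rotate here is a made-up, simplified stand-in for the real rECItoECI signatures):

# Simplified stand-in for the rECItoECI-style methods:
rotate(::Val{:GCRF},  ::Val{:J2000}) = 1
rotate(::Val{:GCRF},  ::Val{:MOD})   = 2
rotate(::Val{:J2000}, ::Val{:GCRF})  = 3
rotate(::Val{:J2000}, ::Val{:MOD})   = 4
rotate(::Val{:TOD},   ::Val{:TEME})  = 5

# The Val arguments are literal constants, so their concrete types
# (Val{:GCRF}, Val{:MOD}) are known to inference, exactly one method
# matches, and the method-count limit never enters the picture:
well_inferred() = rotate(Val(:GCRF), Val(:MOD))

# If the frames were runtime values instead, the Val types would not be
# known, many methods could match, and the call would be dynamic:
poorly_inferred(from::Symbol, to::Symbol) = rotate(Val(from), Val(to))

In other words, as long as the frame arguments are written as literal Val constants at the call site, the number of methods should not matter for inference.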

1 Like

Oh! Thanks! I think I understand now: every time a function like this is called, those parameters are constants. Anyway, I saw that the PR was merged, so I can test using the nightly builds.