Elimination of (unnecessary) runtime dispatch allocations

Avoiding runtime dispatch is perhaps the #1 optimization advice in Julia forums.
As I understand it, the reason is that when Julia performs a runtime dispatch, the compiler does not necessarily know the return type of the function being called at compile time. It therefore has to allocate space for the return value on the heap ("boxing" it).

Hence, common advice is to avoid runtime dispatch. However, this is not always possible - probably the most frequent example is processing a vector of heterogeneous elements. E.g. a GUI library drawing a list of screen components, a ray tracer processing a list of light sources, a financial library pricing a list of financial instruments, etc…

The minimal viable example is this:

using BenchmarkTools
using Test
using JET

abstract type AbstractFoo end
struct Foo1 <: AbstractFoo end
struct Foo2 <: AbstractFoo end
struct Foo3 <: AbstractFoo end
struct Foo4 <: AbstractFoo end
struct Foo5 <: AbstractFoo end

f(::Foo1) = rand()
f(::Foo2) = rand()
f(::Foo3) = rand()
f(::Foo4) = rand()
f(::Foo5) = rand()

g(arr) = f(first(arr))
foos = AbstractFoo[Foo1()]

@btime g($foos) # 25.978 ns (1 allocation: 16 bytes)
@inferred g(foos) # return type Float64 does not match inferred return type Any
@report_opt g(foos) # runtime dispatch detected: f(%1::AbstractFoo)::Any

In the above example f always returns a Float64 and there is a valid method of f for each possible instance of AbstractFoo; yet the compiler prepares for any return type, leading to an (unnecessary) allocation.
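For contrast, here is a small sketch (an illustration of where inference currently draws the line, not a proposed fix) showing that when the element type is a small concrete Union rather than the abstract type, union splitting lets the compiler prove the return type:

```julia
using Test

abstract type AbstractFoo end
struct Foo1 <: AbstractFoo end
struct Foo2 <: AbstractFoo end

f(::Foo1) = rand()
f(::Foo2) = rand()

g(arr) = f(first(arr))

# Element type is a small concrete Union, not the abstract AbstractFoo,
# so the compiler union-splits the call to f and infers Float64.
foos_union = Union{Foo1,Foo2}[Foo1()]
@inferred g(foos_union)  # passes - no boxed return value
```

The same `@inferred` check fails for `AbstractFoo[Foo1()]`, which is exactly the gap this thread is about.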

I’m not here to ask for specific compiler optimizations - but if I were - this would be at the top of my list. IMHO Julia can do (almost?) anything C++/Rust can do - but easier and just as fast - except for this! And this is a very common (and very valid) programming paradigm.

This topic has been raised before (see related), yet I hope more attention may be beneficial or provide further information. My current capacity only allows for shedding light on the issue - not solving it…

And please note, I’m not looking for workarounds like enums, sumtypes, ManualDispatch, TypeSortedCollections, etc… Been there, done that with varying levels of success…



Here’s a more relevant issue:

In short, there’s a tradeoff. These are method table optimizations — they’re dependent upon the state of the method table. And thus, when you add more methods, you end up with method table invalidations. And functions that already have lots of methods are generally the ones most likely to have even more defined later.

So, yes, you can increase max_methods and rebuild Julia if you want. But you’ll pay for it in other places — namely, higher compilation times and more invalidations.


Thx Matt - yes, I know the invalidation argument - and I know the Julia community has spent probably hundreds of man-years in a combined effort to reduce invalidations and reduce TTFX - to great success!

And reducing invalidations/TTFX is good - but in this case it results in suboptimal “production” code. I.e. it favors fast REPLs over PackageCompiler-built, shipped production applications. At the risk of derailing the discussion, my take is that a huge effort has been put into optimizing Julia for interactive use - and less so for “production” use?

But in regards to your suggestion: instead of increasing max_methods, perhaps one way could be to have the compiler only “give up” once the number of potential return types at a call site exceeds a small threshold (e.g. 2)?

The answer is that it’s a tradeoff. See the table in the linked issue evaluating different algorithms and the tradeoffs therein. There are obviously different winners and losers in any tradeoff, and there are obviously different workloads that’ll win or lose in any given choice. Note that with max_methods=10, Julia takes so long to build it effectively never builds.

I wasn’t seriously suggesting increasing max_methods in a fork of Julia; it’s just the knob that’s responsible for this behavior. Perhaps max_methods could be higher under --compile=all? :person_shrugging:

Fixing the boxing of immutable isbits arguments during dynamic dispatch (mutable arguments are already heap-allocated) seems feasible. As the PR linked from the GH issue you shared shows, though, it’d require major internal plumbing work. I suspect you could count the number of people in this community able to do such work on one hand, but it would be nice to see.

Fixing boxing of return values feels much harder, because Julia lacks function-level return type declarations like C++/Rust/Swift/etc. max_methods can work by brute-force devirtualizing the dispatch tree, but that’s like using a screwdriver to hammer in a nail. IMO it’s less a problem of prioritizing “interactivity” over “production” and more a novel language design problem (nobody has figured out how to combine declared function return types + multiple dispatch + a type system as complex as Julia’s, AFAIK).

What gives me hope is the ongoing work for compiled binary generation. If that effort explores techniques such as sealed method tables, the experience gained could drive future work on dynamic dispatch stability.

Note that there’s an escape hatch for changing max_methods if the code requires: julia/base/experimental.jl at 96866cb8f5c8f28d96c2b9e4eb1ec4f3a00a705b · JuliaLang/julia · GitHub
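A sketch of how that escape hatch could be applied to the MWE from the top of the thread (using the per-function form of Base.Experimental.@max_methods; as an experimental API, its exact behavior may change between Julia versions):

```julia
using Test

abstract type AbstractFoo end
struct Foo1 <: AbstractFoo end
struct Foo2 <: AbstractFoo end
struct Foo3 <: AbstractFoo end
struct Foo4 <: AbstractFoo end

# Allow inference to consider up to 4 matching methods at a call site for
# this one generic function (the global default is 3, so 4 methods would
# normally make inference bail out to Any).
Base.Experimental.@max_methods 4 function f end

f(::Foo1) = rand()
f(::Foo2) = rand()
f(::Foo3) = rand()
f(::Foo4) = rand()

g(arr) = f(first(arr))
foos = AbstractFoo[Foo1()]
@inferred g(foos)  # now infers Float64, since all 4 methods return Float64
```

Because the setting is scoped to f, the rest of the session keeps the default max_methods and the associated invalidation/compile-time tradeoffs.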



What does @max_methods(n::Int, fdef::Expr) require to move outside from the Experimental module?


This is absolutely brilliant! Using this, the allocation disappears and the @btime drops 75%. This will solve - if not all - then a lot of my worries. Thank you Keno!!


Sorry for chiming in late.

In a similar situation, I once thought that it would be great if one could give additional return type information to the compiler. In the setting of the OP, I’m thinking of something like

@declare_return_type f(::AbstractFoo)::Float64

Such a statement would promise that f will always return a Float64 whenever the argument is of type AbstractFoo. The compiler would take this into account for type inference and throw an error if some method for f breaks the promise.

Because of union splitting, even a non-concrete return type like

@declare_return_type f(::AbstractFoo)::Union{Float32,Float64}

would speed up things.

However, I don’t know much about the Julia internals. Would something like that be feasible at all?

Check out this conversation


The thread you’ve mentioned seems to discuss the idea from the perspective of enforcing interfaces, not as a means to avoid runtime dispatch. Has the latter been considered anywhere?

Well, if you used the declaration mechanism to declare that the function always returns a certain type, then all implementations would have to return that type.

But also, I think you can already do what you’re talking about with FunctionWrappers.jl, no?

Edit: without enforcing the return type for the function, there is a tradeoff decision that the compiler has to make:

  1. See that all currently defined methods return the same type and specialize the function assuming that type… but risk that the compiled code is invalidated if a new method is defined (meaning it will need to be recompiled)
  2. Ignore that information and keep the code generic and slower
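The risk in option 1 can be made concrete with a small sketch (made-up function names) using Base.return_types to query what inference concludes:

```julia
# caller makes a dynamic call to h; inference's result for it depends on
# which methods of h exist at the time.
caller(x) = h(x)

h(::Int) = 1.0
Base.return_types(caller, Tuple{Any})  # [Float64] - one method, one return type

h(::String) = "hi"                     # a new method appears later...
Base.return_types(caller, Tuple{Any})  # [Union{Float64, String}] - the earlier
                                       # specialization of caller is invalidated
```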

I could also do

ff(x::AbstractFoo) = f(x)::Float64

Such an ff is not faster than f, but type-stable. Compared to

const fw = FunctionWrapper{Float64, Tuple{AbstractFoo}}(f)

you don’t lose time when the argument type is known:

julia> @b ff(Foo1())
3.165 ns

julia> @b fw(Foo1())
24.891 ns (1 allocs: 16 bytes)

(I hope I’m understanding FunctionWrapper correctly here.)

Still, it would be nice to be able to do this without an umbrella function like ff.

I had an approach in mind where you cannot opt out of a return type contract. My hope would be that this avoids method invalidations.

1 Like

If I’m reading you correctly, this is exactly what I was proposing in that thread. @declare would create an outer function with type assertions and @implement would create methods of an inner function with different kinds of logic. With function sealing (preventing new methods of a function from being created) this would avoid recompilation.

But @ChrisRackauckas pointed out that autodiff tracing needs to be able to create new dispatches of methods for their symbolic types, so implementing something like this in Base would break core parts of the ecosystem.

Returning to this thought, I suppose it’s possible that if you only @declared for narrower abstract types (AbstractFoo), and had a way to seal a function from new methods dispatching on subtypes of that type, then it would still leave the door open to dispatching other types.

It wouldn’t make the function return fully stable, but it would make map(f, ::Vector{AbstractFoo}) stable!
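That last point can be sketched with the ff wrapper from earlier in the thread: each call still dispatches at runtime, but the asserted return type makes the result container concretely typed.

```julia
abstract type AbstractFoo end
struct Foo1 <: AbstractFoo end
struct Foo2 <: AbstractFoo end
f(::Foo1) = rand()
f(::Foo2) = rand()

# Wrapper with a return-type assertion, as proposed earlier in the thread.
ff(x::AbstractFoo) = f(x)::Float64

foos = AbstractFoo[Foo1(), Foo2()]
map(ff, foos)  # a Vector{Float64}, not a Vector{Any}
```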