I am developing a package for generating structural vibration data, which I hope to release to the community once it is finished. In the course of development, I am faced with a dilemma between multiple dispatch and branching.
I have implemented several structures to define a problem to be solved, such as struct XXProblem ... end. Currently I have about 20 such structures. I have also defined some structures to implement some solvers (like Newmark’s method). Each of these structures is currently used to multiple dispatch a solve function.
My question is: from a performance perspective, is it better to use multiple dispatch or branching? To illustrate my question, here are two possible implementations of the solve function:
# Multiple dispatch
function solve(prob::XXProblem, u0, params, alg::SolverA)
...
end
function solve(prob::XXProblem, u0, params, alg::SolverB)
...
end
# Branching
function solve(prob::XXProblem, u0, params, alg::AbstractSolver)
    if alg isa SolverA
        sol = _solveA(prob, u0, params)
    else
        sol = _solveB(prob, u0, params)
    end
    return sol
end
I know I can implement both and use BenchmarkTools to answer this question. But before I do, I wonder if any of you have any experience with this question.
Another related question: what do you recommend precompiling with PrecompileTools? All the solve methods? Only the version that takes longest to compile?
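For context, here is what a precompilation workload could look like; this is only a sketch, where the `XXProblem` constructor arguments and the `u0`/`params` setup are placeholders, and `@setup_workload`/`@compile_workload` are the PrecompileTools macros:

```julia
using PrecompileTools

# Runs once when the package is precompiled; every call made inside
# @compile_workload is compiled and cached in the package image.
@setup_workload begin
    prob = XXProblem(...)    # placeholder setup code, not itself cached
    u0, params = ..., ...
    @compile_workload begin
        # one representative call per solver you want precompiled
        solve(prob, u0, params, SolverA())
        solve(prob, u0, params, SolverB())
    end
end
```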
It depends on whether the solver is statically known at the call site. I would expect the branching option to be faster if and only if you would otherwise need to do a dynamic dispatch. Unless solve is a very fast operation, I also wouldn’t expect it to matter in practice.
@GunnarFarneback Thank you for your answer. Actually, the solver is known at the call site. So if I understand your answer properly, in this case both implementations should be equally fast, in terms of both runtime and precompilation performance. Am I right?
Yes, for both options the dispatch or branching should be optimized away in the static case. I don’t have enough insight to say anything about precompilation considerations.
You should consider how long a solve invocation is supposed to take.
If you call that in a tight loop and the runtime is counted in nanoseconds, then performance considerations matter.
If the solve function itself contains a long loop, like an ODE or PDE solver, and the runtime is counted in 10s of microseconds to days, then – who cares if you waste 10ns on an unneeded dynamic multi-dispatch, and who cares about type stability? The function barrier heals all of that.
Don’t over-optimize cold code; instead, spend time hand-crafting each assembly instruction in your hot loops.
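A minimal, self-contained sketch of that function-barrier effect (the solver types and the `stepcost` work function are made up for illustration):

```julia
abstract type AbstractSolver end
struct SolverA <: AbstractSolver end
struct SolverB <: AbstractSolver end

stepcost(::SolverA, i) = 1.0 / i
stepcost(::SolverB, i) = 1.0 / i^2

# `solve` is the function barrier: inside it the type of `alg` is
# concrete, so the long loop compiles to fully specialized code.
function solve(alg::AbstractSolver, n)
    acc = 0.0
    for i in 1:n          # stand-in for a loop that runs for seconds or days
        acc += stepcost(alg, i)
    end
    return acc
end

# Type-unstable call site: this costs one runtime dispatch on entry to
# `solve` (on the order of nanoseconds), which is negligible next to
# the work done inside the barrier.
function simulate(flag, n)
    alg = flag ? SolverA() : SolverB()
    return solve(alg, n)
end
```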
I’d like to point out that from Julia’s perspective both of these are essentially equivalent, so your question indicates that you haven’t fully understood how Julia’s dispatch works. I’ll try to explain the question you actually wanted to ask.
Using dispatch
abstract type AbstractSolver end

struct SolverA <: AbstractSolver
    # fields
end
function solveA(::SolverA) end

struct SolverB <: AbstractSolver
    # fields
end
function solveB(::SolverB) end

## dispatch function
solve(s::SolverA) = solveA(s)
solve(s::SolverB) = solveB(s)
function this_is_fast()
    solver = SolverA()
    solve(solver)
end

function this_is_also_fast(solver::AbstractSolver)
    # other stuff
    solve(solver)
end

function this_is_slow() # conceptually slow; for technical reasons actually fast
    solver = rand(Bool) ? SolverA() : SolverB()
    solve(solver)
end
The special thing about Julia is that every time you call a function, Julia first determines the method to call (this uses multiple dispatch) and then compiles a version of that method specialized for the exact argument types (this is called method specialization). Please note that type annotations in the signature of a function definition are thus used purely for dispatch and have no other performance implications, since argument types are always specialized upon. (Terms and conditions apply.)
With this in mind look at the examples:
In this_is_fast Julia can deduce exactly which type of solver you are using and thus the dispatch happens entirely at compile time and nothing will be left of it.
In this_is_also_fast the same thing happens, because any time you call the function, Julia will call a version specialized for the concrete solver type. Inside that specialization the exact type of solver is also known, and thus the dispatch can be compiled away.
this_is_slow on the other hand creates a situation where Julia can’t know the type of solver at compile time and this means that Julia needs to check it at runtime and then find the correct method to call also at runtime (called a runtime dispatch). This is slow and probably what you should/could be worried about. (Technical aside: This example is so simple that Julia will just convert it to branches due to an optimization called union splitting)
If you now reread your code, you will realize that due to method specialization Julia always knows the exact type of alg anyway, and will thus resolve all the branches at compile time. This, however, doesn’t help you if you call solve from a place where the type is not statically known.
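One way to check this yourself is Test.@inferred (or @code_warntype), which errors when the return type — and hence the dispatch — cannot be resolved at compile time. A self-contained sketch with dummy solvers:

```julia
using Test

abstract type AbstractSolver end
struct SolverA <: AbstractSolver end
struct SolverB <: AbstractSolver end

solve(::SolverA) = 1.0
solve(::SolverB) = 2.0

# The solver type is statically known here: dispatch is compiled away,
# so the concrete return type Float64 can be inferred.
statically_known() = solve(SolverA())
@inferred statically_known()

# A two-element Union: union splitting lets inference succeed anyway,
# since both branches return Float64.
small_union(flag) = solve(flag ? SolverA() : SolverB())
@inferred small_union(true)
```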
Using branches
There is a way around this however but you need to change the types in question a bit. Types are essentially the information on which the compiler operates when performing optimizations (such as removing the dispatch, when all types are known). When you are worried about having runtime dispatches, then you need to reduce the number of types such that it is always clear which type is present. In the current example, you could e.g. do
struct CombinedSolver
    type::Symbol
    solverA::Union{SolverA,Nothing}
    solverB::Union{SolverB,Nothing}
end
function solve_branch(cs::CombinedSolver)
    if cs.type == :SolverA
        s = cs.solverA
        isnothing(s) && error("You messed up!")
        solveA(s)
    elseif cs.type == :SolverB
        s = cs.solverB
        isnothing(s) && error("You messed up!")
        solveB(s)
    else
        error("You messed up (but differently)!")
    end
end
By wrapping all possible solvers into a single struct CombinedSolver, there is only ever one specialization of solve_branch. Julia should also be smart enough to deduce the type of s inside each of the branches. So we have succeeded in avoiding every possible runtime dispatch.
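Putting the pieces together, here is a runnable sketch of the wrapper in use; the convenience constructors and the trivial solveA/solveB bodies are mine, added for illustration:

```julia
abstract type AbstractSolver end
struct SolverA <: AbstractSolver end
struct SolverB <: AbstractSolver end
solveA(::SolverA) = :A
solveB(::SolverB) = :B

struct CombinedSolver
    type::Symbol
    solverA::Union{SolverA,Nothing}
    solverB::Union{SolverB,Nothing}
end

# Convenience constructors so only the matching field is populated.
CombinedSolver(s::SolverA) = CombinedSolver(:SolverA, s, nothing)
CombinedSolver(s::SolverB) = CombinedSolver(:SolverB, nothing, s)

solve_branch(cs::CombinedSolver) =
    cs.type == :SolverA ? solveA(cs.solverA) : solveB(cs.solverB)

# A mixed collection is now concretely typed (Vector{CombinedSolver}),
# so iterating and calling solve_branch involves no runtime dispatch.
solvers = [CombinedSolver(SolverA()), CombinedSolver(SolverB())]
results = [solve_branch(cs) for cs in solvers]
```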
Note: There are packages doing this kind of stuff mostly automatically like LightSumTypes.jl
@abraemer Thank you for this detailed and clear explanation. I think I understood how to use multiple dispatch, but as you said, I had some misconceptions about the internals. Thanks again for clarifying.