I am developing a package for generating structural vibration data, which I hope to release to the community once it is finished. In the course of development, I am faced with a dilemma between multiple dispatch and branching.
I have implemented several structures to define a problem to be solved, such as struct XXProblem ... end. Currently I have about 20 such structures. I have also defined some structures to implement some solvers (like Newmark’s method). Each of these structures is currently used to multiple dispatch a solve function.
My question is: from a performance perspective, is it better to use multiple dispatch or branching? To illustrate my question, here are two possible implementations of the solve function:
# Multiple dispatch
function solve(prob::XXProblem, u0, params, alg::SolverA)
...
end
function solve(prob::XXProblem, u0, params, alg::SolverB)
...
end
# Branching
function solve(prob::XXProblem, u0, params, alg::AbstractSolver)
    if alg isa SolverA
        sol = _solveA(prob, u0, params)
    else
        sol = _solveB(prob, u0, params)
    end
    return sol
end
I know I can implement both and use BenchmarkTools to answer this question. But before I do, I wonder if any of you have any experience with this question.
Another related question: what do you recommend precompiling with PrecompileTools? All the solve methods? Only the version that takes longest to compile?
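For context, here is what a precompilation workload could look like; this is only a sketch, where the `XXProblem` constructor arguments and the `u0`/`params` setup are placeholders, and `@setup_workload`/`@compile_workload` are the PrecompileTools macros:

```julia
using PrecompileTools

# Runs once when the package is precompiled; every call made inside
# @compile_workload is compiled and cached in the package image.
@setup_workload begin
    prob = XXProblem(...)    # placeholder setup code, not itself cached
    u0, params = ..., ...
    @compile_workload begin
        # one representative call per solver you want precompiled
        solve(prob, u0, params, SolverA())
        solve(prob, u0, params, SolverB())
    end
end
```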
It depends on whether the solver is statically known at the call site. I would expect the branching option to be faster if and only if you would otherwise need to do a dynamic dispatch. Unless solve is a very fast operation, I also wouldn’t expect it to matter in practice.
@GunnarFarneback Thank you for your answer. Actually, the solver is known at the call site. So if I understand your answer properly, in this case both implementations should be equally fast, in terms of both runtime and precompilation performance. Am I right?
Yes, for both options the dispatch or branching should be optimized away in the static case. I don’t have enough insight to say anything about precompilation considerations.
You should consider how long a solve invocation is supposed to take.
If you call that in a tight loop and the runtime is counted in nanoseconds, then performance considerations matter.
If the solve function itself contains a long loop, like an ODE or PDE solver, and the runtime is counted in 10s of microseconds to days, then – who cares if you waste 10ns on an unneeded dynamic multi-dispatch, and who cares about type stability? The function barrier heals all of that.
Don’t over-optimize cold code; instead, spend time hand-crafting each assembly instruction in your hot loops.
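A minimal, self-contained sketch of that function-barrier effect (the solver types and the `stepcost` work function are made up for illustration):

```julia
abstract type AbstractSolver end
struct SolverA <: AbstractSolver end
struct SolverB <: AbstractSolver end

stepcost(::SolverA, i) = 1.0 / i
stepcost(::SolverB, i) = 1.0 / i^2

# `solve` is the function barrier: inside it the type of `alg` is
# concrete, so the long loop compiles to fully specialized code.
function solve(alg::AbstractSolver, n)
    acc = 0.0
    for i in 1:n          # stand-in for a loop that runs for seconds or days
        acc += stepcost(alg, i)
    end
    return acc
end

# Type-unstable call site: this costs one runtime dispatch on entry to
# `solve` (on the order of nanoseconds), which is negligible next to
# the work done inside the barrier.
function simulate(flag, n)
    alg = flag ? SolverA() : SolverB()
    return solve(alg, n)
end
```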
I’d like to point out that from Julia’s perspective both of these are essentially equivalent, so your question indicates that you haven’t fully understood how Julia’s dispatch works. I’ll try to explain the question you actually wanted to ask.
Using dispatch
abstract type AbstractSolver end

struct SolverA <: AbstractSolver
    # fields
end
function solveA(::SolverA) end

struct SolverB <: AbstractSolver
    # fields
end
function solveB(::SolverB) end

## dispatch function
solve(s::SolverA) = solveA(s)
solve(s::SolverB) = solveB(s)
function this_is_fast()
    solver = SolverA()
    solve(solver)
end

function this_is_also_fast(solver::AbstractSolver)
    # other stuff
    solve(solver)
end

function this_is_slow() # conceptually slow; for technical reasons actually fast
    solver = rand(Bool) ? SolverA() : SolverB()
    solve(solver)
end
The special thing about Julia is that every time you call a function, Julia first determines the method to call (this uses multiple dispatch) and then compiles a version of that method specialized for the exact argument types (this is called method specialization). Please note that type annotations in the signature of a function definition are thus used purely for dispatch and have no other performance implications, since argument types are always specialized upon. (Terms and conditions apply.)
With this in mind look at the examples:
In this_is_fast Julia can deduce exactly which type of solver you are using and thus the dispatch happens entirely at compile time and nothing will be left of it.
In this_is_also_fast the same thing happens, because any time you call the function, Julia will call a version specialized for the concrete solver type. Inside that specialization the exact type of solver is also known, and thus the dispatch can be compiled away.
this_is_slow on the other hand creates a situation where Julia can’t know the type of solver at compile time and this means that Julia needs to check it at runtime and then find the correct method to call also at runtime (called a runtime dispatch). This is slow and probably what you should/could be worried about. (Technical aside: This example is so simple that Julia will just convert it to branches due to an optimization called union splitting)
If you now reread your code, you will realize that due to method specialization Julia always knows the exact type of alg anyway, and will thus resolve all the branches at compile time. This, however, doesn’t help you if you call solve from a place where the type is not statically known.
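One way to check this yourself is Test.@inferred (or @code_warntype), which errors when the return type — and hence the dispatch — cannot be resolved at compile time. A self-contained sketch with dummy solvers:

```julia
using Test

abstract type AbstractSolver end
struct SolverA <: AbstractSolver end
struct SolverB <: AbstractSolver end

solve(::SolverA) = 1.0
solve(::SolverB) = 2.0

# The solver type is statically known here: dispatch is compiled away,
# so the concrete return type Float64 can be inferred.
statically_known() = solve(SolverA())
@inferred statically_known()

# A two-element Union: union splitting lets inference succeed anyway,
# since both branches return Float64.
small_union(flag) = solve(flag ? SolverA() : SolverB())
@inferred small_union(true)
```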
Using branches
There is a way around this however but you need to change the types in question a bit. Types are essentially the information on which the compiler operates when performing optimizations (such as removing the dispatch, when all types are known). When you are worried about having runtime dispatches, then you need to reduce the number of types such that it is always clear which type is present. In the current example, you could e.g. do
struct CombinedSolver
    type::Symbol
    solverA::Union{SolverA,Nothing}
    solverB::Union{SolverB,Nothing}
end
function solve_branch(cs::CombinedSolver)
    if cs.type == :SolverA
        s = cs.solverA
        isnothing(s) && error("You messed up!")
        solveA(s)
    elseif cs.type == :SolverB
        s = cs.solverB
        isnothing(s) && error("You messed up!")
        solveB(s)
    else
        error("You messed up (but differently)!")
    end
end
By wrapping all possible solvers into a single struct CombinedSolver, there is only ever one specialization of solve_branch. Julia should also be smart enough to deduce the type of s inside each of the branches. So we have succeeded in avoiding every possible runtime dispatch.
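Putting the pieces together, here is a runnable sketch of the wrapper in use; the convenience constructors and the trivial solveA/solveB bodies are mine, added for illustration:

```julia
abstract type AbstractSolver end
struct SolverA <: AbstractSolver end
struct SolverB <: AbstractSolver end
solveA(::SolverA) = :A
solveB(::SolverB) = :B

struct CombinedSolver
    type::Symbol
    solverA::Union{SolverA,Nothing}
    solverB::Union{SolverB,Nothing}
end

# Convenience constructors so only the matching field is populated.
CombinedSolver(s::SolverA) = CombinedSolver(:SolverA, s, nothing)
CombinedSolver(s::SolverB) = CombinedSolver(:SolverB, nothing, s)

solve_branch(cs::CombinedSolver) =
    cs.type == :SolverA ? solveA(cs.solverA) : solveB(cs.solverB)

# A mixed collection is now concretely typed (Vector{CombinedSolver}),
# so iterating and calling solve_branch involves no runtime dispatch.
solvers = [CombinedSolver(SolverA()), CombinedSolver(SolverB())]
results = [solve_branch(cs) for cs in solvers]
```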
Note: There are packages doing this kind of stuff mostly automatically like LightSumTypes.jl
@abraemer Thank you for this detailed and clear explanation. I think I understood how to use multiple dispatch, but as you said, I had some misconceptions about the internals. Thanks again for clarifying.