Consider the following code:
abstract type Op end
struct Add{T} <: Op
    foo::T
end
struct Mul{T} <: Op
    foo::T
end
op(o::Add, value) = o.foo + value
op(o::Mul, value) = o.foo * value
abstract type AbstractCalculator end
# O is any tuple of Ops (Tuple{Op} alone would only match a 1-tuple)
struct Calculator{O<:Tuple{Vararg{Op}}} <: AbstractCalculator
    ops::O
end
c = Calculator((Add(5), Mul(2)))
function run_op(calculator::C, op_index, value) where {C<:Calculator}
    o = calculator.ops[op_index]
    op(o, value)
end
using BenchmarkTools
@benchmark run_op(c, 1, 3.5)
@code_warntype run_op(c, 1, 3.5)
function run_op_manual(calculator::Calculator, op_index, value)
    if op_index == 1
        return op(calculator.ops[1], value)
    elseif op_index == 2
        return op(calculator.ops[2], value)
    else
        error("boink")
    end
end
@benchmark run_op_manual(c, 1, 3.5)
@code_warntype run_op_manual(c, 1, 3.5)
When calling `run_op`, the variable `o` is naturally type-unstable and causes heap allocations, since it can take one of two types depending on the value of the `op_index` argument, not just its type. This seems reasonable enough to me. Alternatively, a manually “unrolled” version, as in `run_op_manual`, is type-stable, allocates less, and is faster as a consequence.

Of course, the manual version only works for this specific `Calculator` type - if there is a different number of `ops`, or they have different types, the second version will fail, either explicitly or by returning wrong results. Nevertheless, all the information required to build this version seems to be contained in the `Calculator` type signature, suggesting that something similar could be built at compile time for every signature that is called (perhaps through metaprogramming?). Am I right in concluding that? If so, what would be the best way to do it?
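
To make the idea concrete, here is a minimal sketch of what such a compile-time unrolled version might look like using a `@generated` function. The name `run_op_generated` and the error message are my own assumptions for illustration, and the block repeats the type definitions (without `AbstractCalculator`) so that it runs on its own. This is only one possible approach, not necessarily the best one.

```julia
abstract type Op end

struct Add{T} <: Op
    foo::T
end

struct Mul{T} <: Op
    foo::T
end

op(o::Add, value) = o.foo + value
op(o::Mul, value) = o.foo * value

struct Calculator{O<:Tuple{Vararg{Op}}}
    ops::O
end

# Hypothetical helper (not from the code above): builds the same chain of
# branches as run_op_manual, but for whatever tuple type O the calculator has.
@generated function run_op_generated(calculator::Calculator{O}, op_index, value) where {O}
    n = fieldcount(O)  # number of ops, known from the type alone
    # Start with the fallback branch, then wrap it in one test per op,
    # working from the last op back to the first.
    ex = :(error("op_index out of range"))
    for i in n:-1:1
        ex = :(op_index == $i ? op(calculator.ops[$i], value) : $ex)
    end
    return ex
end

c = Calculator((Add(5), Mul(2)))
run_op_generated(c, 1, 3.5)  # 5 + 3.5 == 8.5
run_op_generated(c, 2, 3.5)  # 2 * 3.5 == 7.0
```

Because every branch indexes the tuple with a literal, the compiler knows the concrete type of each op in each branch. Whether the result is fully type-stable still depends on all branches returning the same type, so it seems worth checking with `@code_warntype` for each `Calculator` signature of interest.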