Your benchmark is still suboptimal: it's type unstable due to the `/` operation (which produces a floating-point number).
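To see the instability concretely, here is a minimal check (using integer division, which is the type-stable alternative for Ints):

```julia
# `/` on two Ints always returns a Float64, so an Int accumulator that
# mixes `/` with `+`, `-` and `*` becomes type unstable.
@assert typeof(7 / 2) == Float64
# integer division (`div`, or the `÷` operator) stays in Int
@assert typeof(7 ÷ 2) == Int
```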
Here is the correct benchmark. I compare four cases here:
1. A sequence of jumps: encode each opcode as a consecutive integer (we don't use symbols here, since symbols require additional loads to perform the equality test).
2. FunctionWrappers, which uses Julia's cfunction. It has some overhead to check whether the input types match the function signature.
3. Raw function pointers, which have no extra overhead at all.
4. Native dynamic dispatch.
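As a side note on case 2: a FunctionWrapper pins a function to one concrete signature and calls it through a pointer. A minimal usage sketch, assuming FunctionWrappers.jl is installed:

```julia
using FunctionWrappers: FunctionWrapper

# wrap `+` as a concretely typed (Int, Int) -> Int callable; calls go
# through a cfunction-style pointer after a signature check
fw = FunctionWrapper{Int, Tuple{Int, Int}}(+)
@assert fw(4, 7) == 11
```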
Some comments on the code: it consists of three parts. The first part is a fix for a bug in FunctionWrappers.jl: it seems that on Julia > 1.6, LLVM doesn't inline the llvmcall to assume, which causes a ~10 ns slowdown. The second part is an implementation of case 3. The third part is the actual benchmark code. Case 1 uses if-elseif-else to match each opcode manually, while cases 2-4 directly call op(a,b) to evaluate the result.
To execute the code, just copy benchmark.jl below into a file and include it.
The result is (tested on an old slow computer):
bundle of jump: 12.935 ns (0 allocations: 0 bytes)
function wrapper: 48.559 ns (0 allocations: 0 bytes)
raw pointer: 17.155 ns (0 allocations: 0 bytes)
dynamic dispatch: 362.976 ns (0 allocations: 0 bytes)
So using a function pointer is only slightly slower than the inlined call, FunctionWrapper is 2x-3x slower than both, and dynamic dispatch is 20x-30x slower (none of these results are unexpected). Still, the slowdown is not a great deal if dispatch doesn't happen often.
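The dynamic dispatch penalty mostly comes from the container: `Function` is an abstract type, so each `op(a,b)` in the loop has to resolve the concrete method at runtime. A quick way to check:

```julia
# Vector{Function} has an abstract element type, so calls through its
# elements cannot be devirtualized and pay full dynamic dispatch
ops = Function[+, -, *]
@assert !isconcretetype(eltype(ops))
# each individual function does have its own concrete singleton type
@assert isconcretetype(typeof(+))
```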
Also, the benchmark could be made more accurate by writing assembly code directly to bypass LLVM's optimizations (for example, we could manually construct a linear jump table to test whether using one is beneficial). But for now this benchmark result should be enough.
benchmark.jl
# *** Part 1 ***
# A monkey patch for FunctionWrappers' assume function
# assume must have the alwaysinline attribute, otherwise it doesn't get inlined
using FunctionWrappers
import FunctionWrappers.FunctionWrapper
if VERSION >= v"1.6.0-DEV.663"
    @inline function assume(v::Bool)
        Base.llvmcall(
            ("""
             declare void @llvm.assume(i1)
             define void @fw_assume(i8) alwaysinline
             {
                 %v = trunc i8 %0 to i1
                 call void @llvm.assume(i1 %v)
                 ret void
             }
             """, "fw_assume"), Cvoid, Tuple{Bool}, v)
    end
else
    @inline function assume(v::Bool)
        Base.llvmcall(("declare void @llvm.assume(i1)",
            """
            %v = trunc i8 %0 to i1
            call void @llvm.assume(i1 %v)
            ret void
            """), Cvoid, Tuple{Bool}, v)
    end
end
# method redefinition
@inline FunctionWrappers.assume(v::Bool) = Main.assume(v)
# *** Part 2, A simple wrapper around function pointers, to make life easier ***
# ArithBiOp{T} is a binary function with type T x T -> T
struct ArithBiOp{T}
    fptr::Ptr{Nothing}
end
# get the function pointer by looking up the code instance
function get_biop_ptr(f, ::Type{T}) where T
    # trigger compilation of the function
    m = which(f, (T, T)).specializations[1]
    if !isdefined(m, :cache)
        precompile(f, (T, T))
    end
    @assert isdefined(m, :cache)
    # get the function pointer
    ptr = m.cache.specptr
    @assert ptr != C_NULL
    return ArithBiOp{T}(ptr)
end
# unsafely call the function by following the calling convention
# this only works if the inputs are trivial enough, so we don't need to worry about the GC
@inline function (op::ArithBiOp{T})(i1::T, i2::T) where T
    unsafe_call(op, i1, i2)
end
# assume is used to bypass null pointer checking.
@inline @generated function unsafe_call(op::ArithBiOp{T}, i1::T, i2::T) where T
    :(fptr = op.fptr; assume(fptr != C_NULL); ccall(fptr, $T, ($T, $T), i1, i2))
end
# *** Part 3: set up the benchmark ***
# we don't use division here, since it's type unstable
encode_dict = Dict{Symbol,Int}([:ADD=>0,:SUBS=>1,:MUL=>2])
# symbol inputs
sym_ops = Symbol[:ADD, :ADD, :MUL, :ADD, :SUBS, :SUBS, :MUL, :MUL]
# encode inputs
int_ops = Int[encode_dict[i] for i in sym_ops]
# function inputs
f_ops = Function[+, +, *, +, -, -, *, *]
# raw function pointers from Julia's generic function
ptr_ops = [get_biop_ptr(i,Int64) for i in f_ops]
# function wrappers
funcwrap_ops = [FunctionWrapper{Int, Tuple{Int, Int}}(op) for op in f_ops]
function condjump(ops, a, b)
    s = zero(typeof(a))
    for op in ops
        if op == 0
            s += +(a, b)
        elseif op == 1
            s += -(a, b)
        elseif op == 2
            s += *(a, b)
        end
    end
    return s
end
function direct_eval(ops, a, b)
    s = zero(typeof(a))
    for op in ops
        s += op(a, b)
    end
    return s
end
using BenchmarkTools
a=4
b=7
print("bundle of jump:");@btime condjump($int_ops,$a,$b);
print("function wrapper:");@btime direct_eval($funcwrap_ops,$a,$b);
print("raw pointer:");@btime direct_eval($ptr_ops,$a,$b);
print("dynamic dispatch:");@btime direct_eval($f_ops,$a,$b);