Interestingly, I got very different results with this sample: manual dispatch is not performing better, and on master it is sometimes even worse than fully dynamic dispatch (not counting the dubious 1 ns result).
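For readers unfamiliar with the pattern being benchmarked: a manual union split replaces a dynamically dispatched call with explicit isa branches, so each branch calls the function on a concretely typed value. The names below (`process`, `manual_split`) are illustrative only, not part of demo.jl or ManualDispatch.jl:

```julia
# Illustrative methods; the fallback branch still uses runtime dispatch.
process(x::Int) = x + 1
process(x::Float64) = x + 1.0

function manual_split(xs)
    s = 0.0
    for x in xs
        if x isa Int              # manual union split: explicit type checks
            s += process(x)       # statically dispatched, inlinable call
        elseif x isa Float64
            s += process(x)
        else
            s += process(x)       # fallback: runtime dispatch
        end
    end
    return s
end

manual_split(Any[1, 2.0, 3])      # mixed container, as in the benchmarks
```

The benchmark question is whether these explicit branches actually beat letting `process(x)` dispatch at runtime.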
Julia Version 1.5.2 (2020-09-23), Official https://julialang.org/ release
julia> include("demo.jl")
Manual union split with array of Int 1.118 μs (0 allocations: 0 bytes)
Runtime dispatch with array of Int 1.117 μs (0 allocations: 0 bytes)
Manual union split with array of Float64 1.118 μs (0 allocations: 0 bytes)
Runtime dispatch with array of Float64 1.117 μs (0 allocations: 0 bytes)
Manual union split with mixed array 1.118 μs (0 allocations: 0 bytes)
Runtime dispatch with mixed array 1.117 μs (0 allocations: 0 bytes)
Manual union split with typed mixed array 1.123 ns (0 allocations: 0 bytes)
Runtime dispatch with typed mixed array 232.700 ns (0 allocations: 0 bytes)
julia>
Julia Version 1.6.0-DEV.1063 (2020-09-26), Commit 93bbe0833b (1 day old master)
julia> include("demo.jl")
[ Info: Precompiling ManualDispatch [666be00d-f264-4d02-8d64-dd271fd33b9e]
Manual union split with array of Int 673.535 ns (0 allocations: 0 bytes)
Runtime dispatch with array of Int 459.122 ns (0 allocations: 0 bytes)
Manual union split with array of Float64 672.891 ns (0 allocations: 0 bytes)
Runtime dispatch with array of Float64 563.735 ns (0 allocations: 0 bytes)
Manual union split with mixed array 1.117 μs (0 allocations: 0 bytes)
Runtime dispatch with mixed array 895.978 ns (0 allocations: 0 bytes)
Manual union split with typed mixed array 1.123 ns (0 allocations: 0 bytes)
Runtime dispatch with typed mixed array 228.789 ns (0 allocations: 0 bytes)
julia> versioninfo()
Julia Version 1.6.0-DEV.1063
Commit 93bbe0833b (2020-09-26 14:27 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-10.0.1 (ORCJIT, skylake)
It seems that for simple cases, dynamic dispatch is the fastest. Actually, that was the reason why I used parametric types in my JIT demo, and ManualDispatch.jl works like a charm on that:
using BenchmarkTools
using ManualDispatch
const NUM_TYPES = 10
const QUEUE_LENGTH = 100
abstract type A{T} end
struct C1{T} <: A{T} end
struct C2{T} <: A{T} end
const c1_count = Ref(0)
const c2_count = Ref(0)
reset() = (c1_count[] = 0; c2_count[] = 0)
count_subtypes(a::A) = nothing
count_subtypes(c1::C1) = (c1_count[] = c1_count[] + 1; nothing)
count_subtypes(c2::C2) = (c2_count[] = c2_count[] + 1; nothing)
function createq(alpha = 0.7, ql = QUEUE_LENGTH, nt = NUM_TYPES)
@assert ql >= nt
return [rand() < alpha ? C1{Val(i % nt)}() : C2{Val(i % nt)}() for i = 1:ql]
end
function ex_union_split1(x)
for y in x
@unionsplit((C1, C2), count_subtypes(y))
end
end
function ex_union_split2(x)
for y in x
@unionsplit((C1{Val(0)}, C1{Val(1)}, C2{Val(0)}, C2{Val(1)}), count_subtypes(y))
end
end
function ex_runtime_dispatch(x)
for y in x
count_subtypes(y)
end
end
print("Manual 'union split' $(typeof(createq())) with (C1, C2)")
@btime ex_union_split1(x) setup=(reset(); x=createq())
print("Manual 'union split' $(typeof(createq())) with (C1{Val(0)}, C1{Val(1)}, C2{Val(0)}, C2{Val(1)})")
@btime ex_union_split2(x) setup=(reset(); x=createq())
print("Runtime dispatch")
@btime ex_runtime_dispatch(x) setup=(reset(); x=createq())
Julia Version 1.6.0-DEV.1063 (2020-09-26), Commit 93bbe0833b (1 day old master)
julia> include("parametricdemo.jl")
Manual 'union split' Vector{A} with (C1, C2) 87.778 ns (0 allocations: 0 bytes)
Manual 'union split' Vector{A} with (C1{Val(0)}, C1{Val(1)}, C2{Val(0)}, C2{Val(1)}) 5.699 μs (0 allocations: 0 bytes)
Runtime dispatch 7.034 μs (0 allocations: 0 bytes)
The (C1{Val(0)}, C1{Val(1)}, C2{Val(0)}, C2{Val(1)}) example also tries to show that what is happening here is potentially more than a union split.
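To sketch that last point: when the branches test fully concrete parametric types, each branch hands the compiler a concrete type, so the call can be devirtualized and inlined; a branch on an abstract type still leaves a dynamic dispatch inside. The snippet below repeats the demo's type definitions to be self-contained, but `f` and `split_concrete` are illustrative names, not from the demo:

```julia
abstract type A{T} end
struct C1{T} <: A{T} end
struct C2{T} <: A{T} end

f(::C1) = 1
f(::C2) = 2

function split_concrete(xs)
    s = 0
    for x in xs
        # Branching on concrete types: each call below is statically resolved.
        if x isa C1{Val(0)}
            s += f(x)
        elseif x isa C2{Val(0)}
            s += f(x)
        else
            s += f(x)  # unhandled concrete types: dynamic dispatch
        end
    end
    return s
end

split_concrete(A[C1{Val(0)}(), C2{Val(0)}(), C1{Val(1)}()])
```

With NUM_TYPES parametric variants in play, listing every concrete type in the split is what makes it more than a plain two-way union split.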