I was reading through the (extremely helpful!) “performance tips” part of the manual. In the “dangers of abusing multiple dispatch…” section there is a link to “run-time benchmarks comparing (1) type dispatch, (2) dictionary lookup, and (3) a “switch” statement.” The discussion at that link is helpful, and I particularly like that the link goes straight to a comment that starts “Your best bet is always to benchmark.”
On the other hand, it is probably (?) undesirable that the manual points to (1) a mailing list that isn’t used anymore and (2) a benchmark that involves pre-v1.0 code (e.g., immutable Container1{T} val::T end). I thought that the simplest possible change to the documentation would be to change the relevant sentence from “… can be found on the mailing list” to something like “… can be found on Discourse” (along with updating the relevant links).
With that in mind and in case it’s helpful: below I directly rewrite Tim Holy’s example with current language syntax, and then show the results.
The context here is working through an array of parametric containers that hold different types, performing some operation on each element, and combining the results. The struct and function definitions needed for each approach:
The type-dispatch solution
struct Container1{T}
    val::T
end

# one inc method per payload type; dispatch picks the right one at run time
inc(::Int) = 1
inc(::Float64) = 2
inc(::UInt8) = 3

function evaluate_dispatch(vec)
    s = 0
    for item in vec
        s += inc(item.val)
    end
    s
end
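As an aside that is not part of the original example: a quick way to see why this version allocates is @code_warntype. The vector’s element type is the abstract Container1, so item.val is inferred as Any and each inc call becomes a dynamic dispatch with an Any return type, which is where the allocation in the benchmark below comes from.

using InteractiveUtils  # for @code_warntype (loaded automatically in the REPL)

vec1 = [Container1(1), Container1(1.0), Container1(0x01)]  # eltype is Container1, not a concrete type
@code_warntype evaluate_dispatch(vec1)  # inc(item.val) and s are inferred as Any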
The dictionary solution
struct Container2
    code::Symbol
end

function evaluate_dictionary(vec, dict)
    s = 0
    for item in vec
        s += dict[item.code]
    end
    s
end
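A quick usage sketch (mine, not from the original post; the values just mirror the benchmark setup below). One detail that matters for the timings: Dict(:Int => 1, …) is inferred as Dict{Symbol, Int}, so dict[item.code] is type stable and the loop stays allocation-free.

vec2 = [Container2(:Int), Container2(:Float64), Container2(:UInt8)]
dict = Dict(:Int => 1, :Float64 => 2, :UInt8 => 3)  # a concretely typed Dict{Symbol, Int}
evaluate_dictionary(vec2, dict)  # 1 + 2 + 3 == 6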
The switch solution
function evaluate_switch(vec)
    s = 0
    for item in vec
        if item.code == :Int
            s += 1
        elseif item.code == :Float64
            s += 2
        elseif item.code == :UInt8
            s += 3
        else
            error("Unrecognized code")
        end
    end
    s
end
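And a small usage check for the switch version (again mine, not from the original post); any code Symbol other than the three handled above hits the error branch:

vec2 = [Container2(:Int), Container2(:Float64), Container2(:UInt8)]
evaluate_switch(vec2)                 # 6
evaluate_switch([Container2(:Bool)])  # throws ErrorException("Unrecognized code")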
We’ll profile each of these functions using BenchmarkTools:
Benchmarking code
using BenchmarkTools

function compare_approaches()
    vec1 = [Container1(1), Container1(1.0), Container1(0x01)]
    vec2 = [Container2(:Int), Container2(:Float64), Container2(:UInt8)]
    dict = Dict(:Int => 1, :Float64 => 2, :UInt8 => 3)

    dispatch_benchmark = @benchmark evaluate_dispatch($vec1)
    dictionary_benchmark = @benchmark evaluate_dictionary($vec2, $dict)
    switch_benchmark = @benchmark evaluate_switch($vec2)

    return (
        dispatch = dispatch_benchmark,
        dictionary = dictionary_benchmark,
        switch = switch_benchmark,
    )
end
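compare_approaches() returns a NamedTuple of BenchmarkTools.Trial objects, so the individual results can be inspected or compared afterwards; for example (a sketch of how I looked at them, not required for the numbers below):

results = compare_approaches()
results.dispatch                       # the full Trial for the type-dispatch version
minimum(results.dispatch).time         # best-case time in nanoseconds
judge(minimum(results.switch), minimum(results.dispatch))  # TrialJudgement comparing switch vs dispatch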
The bottom line turns out to be that, for this very simple function, the switch statement is the fastest, the dictionary is intermediate, and the dispatch solution is the slowest. The most important performance difference is that the type-dispatch solution requires an allocation, whereas both the dictionary and switch solutions are allocation-free.
Very roughly (on my laptop, using Julia v1.11.5, all the usual caveats): the switch is about a factor of two faster than the dictionary, and about an order of magnitude faster than the type-dispatch approach. My results from the compare_approaches() function above are here:
Dispatch benchmark
BenchmarkTools.Trial: 10000 samples with 993 evaluations per sample.
Range (min … max): 35.257 ns … 5.888 μs ┊ GC (min … max): 0.00% … 98.72%
Time (median): 43.763 ns ┊ GC (median): 0.00%
Time (mean ± σ): 52.043 ns ± 65.339 ns ┊ GC (mean ± σ): 1.27% ± 1.34%
(histogram: log(frequency) by time, 35.3 ns … 118 ns)
Memory estimate: 16 bytes, allocs estimate: 1.
Dictionary benchmark
BenchmarkTools.Trial: 10000 samples with 999 evaluations per sample.
Range (min … max): 9.163 ns … 286.502 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 9.642 ns ┊ GC (median): 0.00%
Time (mean ± σ): 12.478 ns ± 5.994 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
(histogram: log(frequency) by time, 9.16 ns … 27.3 ns)
Memory estimate: 0 bytes, allocs estimate: 0.
Switch benchmark
BenchmarkTools.Trial: 10000 samples with 1000 evaluations per sample.
Range (min … max): 4.078 ns … 205.657 ns ┊ GC (min … max): 0.00% … 0.00%
Time (median): 4.356 ns ┊ GC (median): 0.00%
Time (mean ± σ): 5.861 ns ± 4.363 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
(histogram: log(frequency) by time, 4.08 ns … 15.9 ns)
Memory estimate: 0 bytes, allocs estimate: 0.
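Finally, the allocation claim can be double-checked directly with BenchmarkTools’ @ballocated (this check is mine, not part of the original example); the dispatch version reports the same 16 bytes per call as the Trial above:

using BenchmarkTools

vec1 = [Container1(1), Container1(1.0), Container1(0x01)]
vec2 = [Container2(:Int), Container2(:Float64), Container2(:UInt8)]
dict = Dict(:Int => 1, :Float64 => 2, :UInt8 => 3)

@ballocated evaluate_dispatch($vec1)          # 16 (bytes) on my machine
@ballocated evaluate_dictionary($vec2, $dict) # 0
@ballocated evaluate_switch($vec2)            # 0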