OK, so I thought it'd be a great idea to demonstrate to students in their very first lesson the performance benefits of Julia's top notion: multiple dispatch. So I wrote the program shown below. The only problem is, the results (even after calling it several times to bypass compilation) seem to indicate that the code with dispatch runs about four times slower than the code that substitutes a conditional for the dispatching. What am I doing wrong?!
Thanks,
Niall.
abstract type Species end
struct Elephant <: Species end
struct Wolf <: Species end
struct Deer <: Species end
struct Bush <: Species end
struct GroundPlant <: Species end
function interactnodispatch(a::Species, b::Species)
    if a isa Elephant && b isa Deer ||
            a isa Deer && b isa Elephant
        "Competition!"
    elseif a isa Elephant && b isa Bush ||
            a isa Bush && b isa Elephant
        "Eating!"
    elseif a isa Elephant && b isa GroundPlant ||
            a isa GroundPlant && b isa Elephant
        "Trample!"
    elseif a isa Deer && b isa GroundPlant ||
            a isa GroundPlant && b isa Deer
        "Eating!"
    elseif a isa Wolf && b isa Deer ||
            a isa Deer && b isa Wolf
        "Eating!"
    elseif a isa Bush && b isa GroundPlant ||
            a isa GroundPlant && b isa Bush
        "Competition!"
    else
        "No interaction!"
    end
end
interactdispatch(a::Species, b::Species) = "No interaction!"
interactdispatch(a::Elephant, b::Deer) = "Competition!"
interactdispatch(a::Deer, b::Elephant) = "Competition!"
interactdispatch(a::Elephant, b::Bush) = "Eating!"
interactdispatch(a::Elephant, b::GroundPlant) = "Trample!"
interactdispatch(a::GroundPlant, b::Elephant) = "Trample!"
interactdispatch(a::Deer, b::GroundPlant) = "Eating!"
interactdispatch(a::GroundPlant, b::Deer) = "Eating!"
interactdispatch(a::Wolf, b::Deer) = "Eating!"
interactdispatch(a::Deer, b::Wolf) = "Eating!"
interactdispatch(a::Bush, b::GroundPlant) = "Competition!"
interactdispatch(a::GroundPlant, b::Bush) = "Competition!"
function testnodispatch(species::Vector, n::Int)
    testtime = 0.0
    for _ in 1:n
        testtime += @elapsed interactnodispatch(rand(species), rand(species))
    end
    testtime
end

function testdispatch(species::Vector, n::Int)
    testtime = 0.0
    for _ in 1:n
        testtime += @elapsed interactdispatch(rand(species), rand(species))
    end
    testtime
end

function test(n::Int=10000000)
    species = [Elephant(), Deer(), Wolf(), Bush(), GroundPlant()]
    print("Test without dispatch: ")
    println(testnodispatch(species, n))
    print("Test with dispatch: ")
    println(testdispatch(species, n))
end
I would, btw, expect both functions to have the same performance. The compiler will compile specialized methods for each set of inputs, removing the unused branches, meaning that in the compiled code there will be no branches or conditions.
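If that expectation is right, it should be visible in the generated code. A quick way to check it (my own sketch, not part of the benchmark above) is to look at what the compiler produces for one fixed pair of concrete argument types:

using InteractiveUtils   # provides @code_typed (loaded automatically in the REPL)

# For fully known argument types, one would expect both versions to collapse
# to returning a constant string, with no runtime branching left over.
@code_typed interactnodispatch(Elephant(), Deer())
@code_typed interactdispatch(Elephant(), Deer())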
I think the key to this problem is the fact that you're calling the function with a randomized type. When the function testdispatch is called, the compiler does not actually know which species types interactdispatch will be called with (since it could be any of them). This means the compiler cannot resolve the dispatch at compile time and therefore cannot do the usual performance optimisations.
I think the runtime is maybe even worse because in this case "species" is an abstract type, so the compiler perhaps doesn't even know that each value has to be one of the species in your list, as the explicit if/else version spells out (not 100 percent sure about this).
The overall problem can be seen when calling @code_warntype on the test function (sketched below): the species values are marked red, i.e. the compiler does not know their concrete types at compile time.
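Something along these lines (my sketch of the kind of call meant here; the exact snippet isn't shown above):

species = [Elephant(), Deer(), Wolf(), Bush(), GroundPlant()]

# Inspect the *calling* function: inside testdispatch the result of
# rand(species) is only known to be some Species, and @code_warntype prints
# those non-concrete types in red.
@code_warntype testdispatch(species, 10)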
Off the top of my head I cannot think of a good solution that is faithful to the original problem (i.e. random types at runtime). Maybe there is a smart way to experiment further with function barriers, as in the sketch below.
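For reference, the function-barrier pattern usually looks something like this generic sketch (not tied to the species example; the names are made up):

# The outer function works with data whose type the compiler cannot infer;
# it hands the data to an inner kernel that gets compiled specifically for
# whatever concrete type actually arrives at runtime.
function sum_kernel(v)                  # specialised on the concrete type of v
    s = zero(eltype(v))
    for x in v
        s += x
    end
    s
end

function sum_with_barrier(data::Dict)
    v = data["values"]                  # element type unknown here (Dict values are Any)
    sum_kernel(v)                       # one dynamic dispatch, then fast specialised code
end

sum_with_barrier(Dict{String,Any}("values" => rand(1000)))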
Edit: ah, seems like @DNF beat me to it
As you can see, both versions are fast and give the same performance. When you randomize the type so the compiler cannot predict it, you lose that performance, and the dispatch version is even worse.
OK, thanks both of you. On the basis of your comments I had just got it to the stage where the execution times are equal (by specifying types). However that still leaves me with the issue of how to demonstrate performance benefits of multiple dispatch. I really need a case where multiple dispatch comes out on top over intra-code conditionals. Anyone have any ideas?
In cases where the types are known at compile time, there aren't necessarily any performance benefits. It is rather a question of improved expressivity.
You could probably get the same performance just by writing conditionals, but over time that approach becomes untenable, while multiple dispatch with external methods allows you to easily extend and specialize your code for new types.
One could perhaps say that multiple dispatch makes your code clean and extensible, while type specialization makes it fast. Multiple dispatch makes it easy to reap the performance benefits provided by type specialization.
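To make that concrete, here is a small sketch (the Lion type is invented for the example): with multiple dispatch, supporting a new species only means adding methods, whereas the if/else version has to be edited in place.

struct Lion <: Species end              # hypothetical new species

# Two new methods are enough; every other pairing automatically falls back to
# the generic interactdispatch(::Species, ::Species) = "No interaction!" method.
interactdispatch(a::Lion, b::Deer) = "Eating!"
interactdispatch(a::Deer, b::Lion) = "Eating!"

interactdispatch(Lion(), Deer())        # "Eating!"
interactdispatch(Lion(), Wolf())        # "No interaction!"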
Well, the tuple idea worked well: the types are now concrete, and so I get similar runtimes with and without dispatch over 1e7 iterations. However, I'm intrigued by the fact that the conditional version is now consistently around 1% faster than the dispatch version. I'm wondering whether this is @jling's point, that conditionals are optimised away.
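The exact tuple version isn't shown above, but a minimal sketch of the idea (my assumption of roughly what it looks like) is: store the species in a tuple, so every element's concrete type is part of the container's type and each call can be resolved at compile time.

# Sketch, not the code actually benchmarked: the tuple's type is
# Tuple{Elephant, Deer, Wolf, Bush, GroundPlant}, so the compiler knows the
# concrete type of every element.
species = (Elephant(), Deer(), Wolf(), Bush(), GroundPlant())

# All pairwise interactions; each interactdispatch call has concretely typed
# arguments and can be dispatched statically.
map(a -> map(b -> interactdispatch(a, b), species), species)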
But all of this means my wonderful idea of making first-year students wake up and take a look at multiple dispatch isn't working at all. For first-time programmers, faster execution is sexy, but code readability (unfortunately) isn't. I had hoped to hook them on expressiveness by coupling it to performance.
# Small assembly program that calculates & prints the Fibonacci series
maxTerms equ 15000 ; number of terms of the series to calculate
digits equ (maxTerms*209+1673)/1000
cntDigits equ 6 ; number of digits for counter
org 100h ; this is a DOS com file
main:
mov ax,'00' ; initialize to all ASCII zeroes
mov di,counter ; including the counter
mov cx,digits+cntDigits/2 ; two bytes at a time
cld ; initialize from low to high memory
rep stosw ; write the data
inc ax ; make sure ASCII zero is in al
mov [num1 + digits - 1],al ; last digit is one
mov [num2 + digits - 1],al ;
mov [counter + cntDigits - 1],al
jmp .bottom ; done with initialization, so begin
.top
; add num1 to num2
mov di,num1+digits-1
mov si,num2+digits-1
mov cx,digits ;
call AddNumbers ; num2 += num1
mov bp,num2 ;
call PrintLine ;
dec dword [term] ; decrement loop counter
jz .done ;
; add num2 to num1
mov di,num2+digits-1
mov si,num1+digits-1
mov cx,digits ;
call AddNumbers ; num1 += num2
.bottom
mov bp,num1 ;
call PrintLine ;
dec dword [term] ; decrement loop counter
jnz .top ;
.done
call CRLF ; finish off with CRLF
mov ax,4c00h ; terminate
int 21h ;
PrintLine:
mov dx,eol ; print combined CRLF and msg1
mov cx,msg1len+eollen ;
call PrintString ;
mov di,counter ; print counter
mov cx,cntDigits ;
call PrintNumericString
call IncrementCount ; also increment the counter
mov dx,msg2 ; print msg2
mov cx,msg2len ;
call PrintString ;
mov di,bp ; recall address of number
mov cx,digits ;
; deliberately fall through to PrintNumericString
PrintNumericString:
; first scan for the first non-zero byte
mov al,'0' ; look for ASCII zero
cld ; scan from MSD to LSD
repe scasb ;
mov dx,di ; points to one byte after
dec dx ; back up one character
inc cx ;
; deliberately fall through to PrintString
PrintString:
mov bx, 1 ; write to stdout
mov ah, 040h ; write to file handle
int 21h ; ignore return value
ret ;
AddNumbers:
std ; go from LSB to MSB
clc ;
pushf ; save carry flag
.top
mov ax,0f0fh ; convert from ASCII BCD to BCD
and al,[si] ; get next digit of number2 in al
and ah,[di] ; get next digit of number1 in ah
popf ; recall carry flag
adc al,ah ; add these digits
aaa ; convert to BCD
pushf ;
add al,'0' ; convert back to ASCII BCD digit
stosb ; save it and increment both counters
dec si ;
loop .top ; keep going until we've got them all
popf ; recall carry flag
ret ;
IncrementCount:
mov cx,cntDigits ;
mov di,counter+cntDigits-1
std ; go from LSB to MSB
stc ; this is our increment
pushf ; save carry flag
.top
mov ax,000fh ; convert from ASCII BCD to BCD
and al,[di] ; get next digit of counter in al
popf ; recall carry flag
adc al,ah ; add these digits
aaa ; convert to BCD
pushf ;
add al,'0' ; convert back to ASCII BCD digit
stosb ; save and increment counter
loop .top ;
popf ; recall carry flag
ret ;
CRLF: mov dx,eol ;
mov cx,eollen ;
jmp PrintString ;
eol db 13,10 ; DOS-style end of line
eollen equ $ - eol
msg1 db 'Fibonacci(' ;
msg1len equ $ - msg1
msg2 db '): ' ;
msg2len equ $ - msg2
counter: ;
num1 equ counter+cntDigits ;
num2 equ num1+digits ;
# Disclaimer: no clue if this program actually runs...
I blocked constant propagation in various different ways, and interactdispatch() runs consistently just slightly slower than interactnodispatch(). The difference isn't much, a bit under 1%, but it is so consistent that I believe it is real. Interesting, but not helpful for my current issue, which is to sensitise complete novice programmers to multiple dispatch. I think, rather than focusing on performance, I will instead try to create a simple example that illustrates code extensibility.
Thanks very much for all the information; I've learned a lot.
Performance was never really the selling point of multiple dispatch, if you ask me. I don't know where you got that from, but I have seen some weird performance comparisons a few times. As you have learned from this thread, the compiler is a beast, so it's not straightforward to understand what's happening, especially for newcomers. So I'd rather not start with this feature; they'll be more confused than fascinated.
Now that was a really useful tip, thanks. He distinguishes nicely between reuse of functions across types and reuse of types across functions. I think I can illustrate that simply with code that will be accessible to beginners. I'll go for it.
Regarding the performance benefits of multiple dispatch, I think I just kind of assumed that the dispatcher would manage it more efficiently than my clumsy conditionals, but clearly I was wrong. Ah well, I've learned something new!
One final point: I'm still unclear about why using tuples makes things quicker. map really does seem to work phenomenally faster on tuples than on vectors. Why? Have I understood @aplavin correctly that it forces compile-time dispatching? So why is that not the case for vectors?
I think the most important point is the reuse across libraries. I often show newcomers how you can, for example, easily extend the rand() function with your own type (cf. Random Numbers · The Julia Language).
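For example, something along these lines (a sketch following the recipe in that chapter of the manual; Random.SamplerType is the documented hook for generating values of a custom type):

using Random

# Teach rand() to produce a random Species.
function Random.rand(rng::Random.AbstractRNG, ::Random.SamplerType{Species})
    rand(rng, (Elephant(), Deer(), Wolf(), Bush(), GroundPlant()))
end

rand(Species)        # e.g. Wolf()
rand(Species, 3)     # a vector of three random species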
I'll try to explain it in a slightly simplified way: the element types of a tuple are fixed upon creation. Although eltype() returns Any for a tuple whose mixed element types have no other common supertype, the tuple itself carries full information about the types it contains for its whole lifetime.
This is in contrast to an Array{Any}, which is what you get when you mix types whose only common supertype is Any: Julia cannot know what you might throw into that container later on.
So in the case of a tuple, the compiler knows each element's concrete type for sure until the object gets garbage collected. In the case of an Array{Any}, the types of even the existing elements might change at any time. If you have an Array{SomeAbstractType}, the situation is similar: Julia only knows that the objects in that array inherit from SomeAbstractType, but the actual concrete (leaf) type is only known at runtime.
julia> t = (23, "hello")
(23, "hello")
julia> typeof(t)
Tuple{Int64, String}
julia> v = [23, "foo"]
2-element Vector{Any}:
23
"foo"
julia> typeof(v)
Vector{Any} (alias for Array{Any, 1})
julia> eltype(t) # gives the narrowest common supertype of all element types
Any
julia> eltype(v)
Any
julia> ht = (1, 2, 3) # "homogeneous tuple"
(1, 2, 3)
julia> typeof(ht)
Tuple{Int64, Int64, Int64}
julia> eltype(ht)
Int64
julia> aht = (1.2, 1) # choosing some "Real" numbers
(1.2, 1)
julia> typeof(aht)
Tuple{Float64, Int64}
julia> eltype(aht) # gives "Real" as a supertype for all elements
Real
julia> av = [1, 1.0] # notice that for arrays, Julia might promote your literals automagically ;)
2-element Vector{Float64}:
1.0
1.0
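To connect this back to the map question (my own quick illustration, not from the posts above): mapping over the tuple produces a result whose per-element types are known to the compiler, while mapping over the Vector{Any} has to inspect each element at runtime.

julia> map(x -> x isa Number, t)   # result is a Tuple{Bool, Bool}, fully inferable
(true, false)

julia> map(x -> x isa Number, v)   # each element needs a runtime type check
2-element Vector{Bool}:
 1
 0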