A few thoughts regarding these calculations.

First, annotated union splitting is redundant: once we have branched on the type of the variable, the compiler already knows the type and can figure everything out on its own. Any timing differences you see are spurious, since the generated code is the same. This is easy to check with the @code_warntype macro (I only show the relevant parts here).
julia> @code_warntype paint2(p)
...
2 ── %7 = @_3::Tuple{LineAbstract, Int64}::Tuple{LineAbstract, Int64}
│    (l = Core.getfield(%7, 1))
│    %9 = Core.getfield(%7, 2)::Int64
│    %10 = (l isa Main.Line1)::Bool
└─── goto #4 if not %10
3 ── %12 = s::Float64
│    %13 = Main.paint(l::Line1)::Float64
│    %14 = Main.f(%13)::Float64
│    (s = %12 + %14)
└─── goto #12
...
julia> @code_warntype paint2_annotated(p)
...
2 ── %7 = @_3::Tuple{LineAbstract, Int64}::Tuple{LineAbstract, Int64}
│    (l = Core.getfield(%7, 1))
│    %9 = Core.getfield(%7, 2)::Int64
│    %10 = (l isa Main.Line1)::Bool
└─── goto #4 if not %10
3 ── %12 = s::Float64
│    %13 = Core.typeassert(l::Line1, Main.Line1)::Line1
│    %14 = Main.paint(%13)::Float64
│    %15 = Main.f(%14)::Float64
│    (s = %12 + %15)
└─── goto #12
...
As one can see, the only difference is %13 = Core.typeassert(l::Line1, Main.Line1)::Line1, which is removed at later stages of compilation.
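For reference, here is a minimal sketch of the two variants being compared. The exact paint2 / paint2_annotated definitions come from earlier in the thread; this reconstruction is only an assumption, kept short to make the point readable on its own:

# Assumed reconstruction, not the exact code from the thread: the two versions
# differ only in whether the element is re-annotated with its concrete type
# inside the branch that already established that type.
function paint2(p)
    s = 0.0
    for l in p.lines
        if l isa Line1
            s += f(paint(l))          # the compiler already knows l::Line1 here
        elseif l isa Line2
            s += f(paint(l))
        end
        # ... one branch per subtype
    end
    s
end

function paint2_annotated(p)
    s = 0.0
    for l in p.lines
        if l isa Line1
            s += f(paint(l::Line1))   # explicit annotation only adds a typeassert
        elseif l isa Line2
            s += f(paint(l::Line2))
        end
        # ... one branch per subtype
    end
    s
end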
Secondly, the reason why functors do not work here is more subtle. On their own, the functors in this example act as function barriers and should have made the calculation type stable. The problem is that when the number of kernel functions gets too big, the compiler gives up. One can see this by comparing the @code_warntype output for 2 and 5 subtypes (a sketch of the functor setup being discussed is shown right below).
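This is an assumed sketch of the functor approach, not the thread's exact code: each subtype is made callable, so the kernel is selected by calling the element itself.

# Hypothetical functor definitions: each Line subtype gets a call method that
# returns whatever paint would have returned for it.
(l::Line1)() = paint(l)
(l::Line2)() = paint(l)
# ... and so on for the remaining subtypes

function paint3(p)
    s = 0.0
    for l in p.lines
        s += f(l())      # the call target depends on the concrete type of l
    end
    s
end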
With 2 subtypes:
julia> @code_warntype paint3(p)
Variables
  #self#::Core.Const(paint3)
  p::Picture{LineAbstract}
  @_3::Union{Nothing, Tuple{LineAbstract, Int64}}
  s::Float64
  l::LineAbstract
Body::Float64
1 ─ (s = 0.0)
│   %2 = Base.getproperty(p, :lines)::Vector{LineAbstract}
│   (@_3 = Base.iterate(%2))
│   %4 = (@_3 === nothing)::Bool
│   %5 = Base.not_int(%4)::Bool
└── goto #4 if not %5
2 ─ %7 = @_3::Tuple{LineAbstract, Int64}::Tuple{LineAbstract, Int64}
│   (l = Core.getfield(%7, 1))
│   %9 = Core.getfield(%7, 2)::Int64
│   %10 = s::Float64
│   %11 = (l)()::Float64
│   %12 = Main.f(%11)::Float64
│   (s = %10 + %12)
│   (@_3 = Base.iterate(%2, %9))
│   %15 = (@_3 === nothing)::Bool
│   %16 = Base.not_int(%15)::Bool
└── goto #4 if not %16
3 ─ goto #2
4 ─ return s
With 5 subtypes:
julia> @code_warntype paint3(p)
Variables
  #self#::Core.Const(paint3)
  p::Picture{LineAbstract}
  @_3::Union{Nothing, Tuple{LineAbstract, Int64}}
  s::Float64
  l::LineAbstract
Body::Float64
1 ─ (s = 0.0)
│   %2 = Base.getproperty(p, :lines)::Vector{LineAbstract}
│   (@_3 = Base.iterate(%2))
│   %4 = (@_3 === nothing)::Bool
│   %5 = Base.not_int(%4)::Bool
└── goto #4 if not %5
2 ─ %7 = @_3::Tuple{LineAbstract, Int64}::Tuple{LineAbstract, Int64}
│   (l = Core.getfield(%7, 1))
│   %9 = Core.getfield(%7, 2)::Int64
│   %10 = s::Float64
│   %11 = (l)()::Any
│   %12 = Main.f(%11)::Float64
│   (s = %10 + %12)
│   (@_3 = Base.iterate(%2, %9))
│   %15 = (@_3 === nothing)::Bool
│   %16 = Base.not_int(%15)::Bool
└── goto #4 if not %16
3 ─ goto #2
4 ─ return s
As one can see, the two outputs are almost identical, except that in the case of 5 subtypes the compiler infers the return type of the functor call as Any. This means that the compiler gives up on determining which method to call and falls back to dynamic dispatch. I may be wrong, but for a small number of subtypes the compiler seems to use an approach similar to union splitting (or perhaps exactly union splitting), which is why there was almost no difference between the union splitting and functor approaches. It is not always applied, since in the general case it would lead to exponentially growing code.
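A small standalone way to observe this effect (the Shape / area names below are assumptions for illustration, not part of the original benchmark):

# Hypothetical reproduction: an abstract type with several subtypes and one
# kernel method per subtype, called through an abstractly typed container.
abstract type Shape end
struct S1 <: Shape end
struct S2 <: Shape end
struct S3 <: Shape end
struct S4 <: Shape end
struct S5 <: Shape end

area(::S1) = 1.0
area(::S2) = 2.0

total(xs) = sum(area, xs)

# With only two matching methods, the call site should still infer as Float64:
Base.return_types(total, (Vector{Shape},))

area(::S3) = 3.0
area(::S4) = 4.0
area(::S5) = 5.0

# Once too many methods match the abstract signature, inference is expected to
# widen the result to Any, which is exactly the (l)()::Any seen above.
Base.return_types(total, (Vector{Shape},))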
As a last point, I'll repeat one more time that this problem is not Julia-specific. Better data organization can provide more benefit than any fancy language technique. Articles that deserve to be read are this awesome post on BioJulia and the data locality chapter in the game programming book. The latter is especially interesting because it is written in C++ and heavily uses an OOP approach, yet in the end the author reaches the same conclusion: memory layout matters, because the CPU is fast and memory is slow. Nothing good can come of beating your cache to a pulp.
Consider the following code:
struct Picture2
    objects::Vector    # one homogeneous vector per concrete Line type
end

p2 = let p = p
    # group the lines of the original picture by their concrete type
    p2 = [Line1[], Line2[], Line3[], Line4[], Line5[]]
    for l in p.lines
        if l isa Line1
            push!(p2[1], l)
        elseif l isa Line2
            push!(p2[2], l)
        elseif l isa Line3
            push!(p2[3], l)
        elseif l isa Line4
            push!(p2[4], l)
        elseif l isa Line5
            push!(p2[5], l)
        end
    end
    Picture2(p2)
end

function paint(p::Picture2)
    s = 0.
    for l in p.objects
        # each l is a concrete Vector{LineN}, so the inner sum is type stable
        s += sum(x -> f(paint(x)), l)
    end
    s
end
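The key property of this layout is that the inner call acts as a function barrier: dynamic dispatch happens once per homogeneous group rather than once per line. A rough hand-written equivalent of what the sum call gives us for free (paint_group / paint_barrier are hypothetical names; f and paint are the ones from the benchmark):

# paint_group is compiled separately for each concrete Vector{LineN},
# so the loop hidden inside sum is fully type stable.
paint_group(lines::Vector{T}) where {T} = sum(x -> f(paint(x)), lines)

function paint_barrier(p::Picture2)
    s = 0.0
    for l in p.objects        # l is only known abstractly here...
        s += paint_group(l)   # ...but one dynamic dispatch per group fixes that
    end
    s
end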
And the timings:
julia> print(" with runtime dispatch: "); @btime paint($p)
with runtime dispatch: 53.530 ms (1000000 allocations: 15.26 MiB)
julia> print(" with splitting: "); @btime paint2($p)
with splitting: 16.914 ms (0 allocations: 0 bytes)
julia> print(" simple sum of an array of Float64 of same size: "); @btime naivesum($x)
simple sum of an array of Float64 of same size: 7.131 ms (0 allocations: 0 bytes)
julia> print(" with proper data structure: "); @btime paint($p2)
with proper data structure: 7.240 ms (10 allocations: 160 bytes)
Despite some small allocations, we are twice as fast as union splitting and almost as fast as the direct naive calculation.

This is the most important point of the story. When you put irregular data into a structure and then try to traverse it sequentially, you are working against the computer and, as a result, slowing down your calculations. One should avoid this if possible, or accept the consequences.