Union splitting vs C++

Someone asked much farther above in this thread whether a jump-table is possible in Julia. I discovered by poring over machine-code that Julia (an old version possibly pre-1.0 when I was experimenting) indeed will produce a jump-table for the following style of code:

if ltype == 1
   <do the first thing>
elseif ltype == 2
   <do the second thing>
elseif ltype == 3
   <do the third thing>

etc, with the same variable on the LHS of each equality test and the RHS’s being consecutive integers.

Hello! I created an experimental package SingleDispatchArrarys.jl that generates dispatch table for non-homogeneous arrays, which achieves c++ like performance for non-homogeneous arrays and is much more flexible to manage than manual switch-case.

However, c++ like performance is not necessary a good thing. With long if-elseif-elseif chain, LLVM generates jump table for this. On x86 machines, loading a jump table is nearly as fast as branching, but on Raspberry pi 4b, jump table is actually twice as slow as branching(20ms vs 10ms), looks like LLVM has made a bad choice.

7 Likes

As can be seen on Elrod’s LLVM output, the optimizer has figured out that all these branches are really the same thing, so no branch is used at all!
We could disable it by inserting a few “trouble makers”, then the result is as usual:

struct Distraction1 <: LineAbstract length::Float64 end
struct Distraction2 <: LineAbstract length::Float64 end
paint(l :: Distraction1) = sin(l.length)
paint(l :: Distraction2) = exp(l.length)
const LineUnion = Union{Line1,Line2,Line3,Line4,Line5,
                        Line6,Line7,Line8,Line9,Line10,
                        Line11,Line12,Line13,Line14,Line15,
                        Line16,Line17,Line18,Line19,Line20,
                        Distraction1, Distraction2}

Other code are untouched, particularly, line_types in function test only uses Line1 ~ Line20. However, the compiler will not know this, thus code for Distraction1 and Distraction2 must be included in paint3. The results are more reasonable:

julia> Lines.test(1000000)
result = 500137.3913335729
 with runtime dispatch:   210.302 ms (2000000 allocations: 30.52 MiB)
 with splitting:   22.213 ms (0 allocations: 0 bytes)
 with compiler splitting:   511.819 ms (6998980 allocations: 122.05 MiB)
 with runtime dispatch 3:   207.581 ms (2000000 allocations: 30.52 MiB)
 with compiler splitting 3:   16.938 ms (0 allocations: 0 bytes)
500137.3913335729
full code
module Lines

export test
export Line1, Line2, Line3, Line4, Line5, Line6, Line7, Line8, Line9, Line10,
       Line11, Line12, Line13, Line14, Line15, Line16, Line17, Line18, Line19, Line20
export LineUnion, Picture, Picture2

using BenchmarkTools
using Test

abstract type LineAbstract end

struct Line1 <: LineAbstract length::Float64 end
struct Line2 <: LineAbstract length::Float64 end
struct Line3 <: LineAbstract length::Float64 end
struct Line4 <: LineAbstract length::Float64 end
struct Line5 <: LineAbstract length::Float64 end
struct Line6 <: LineAbstract length::Float64 end
struct Line7 <: LineAbstract length::Float64 end
struct Line8 <: LineAbstract length::Float64 end
struct Line9 <: LineAbstract length::Float64 end
struct Line10 <: LineAbstract length::Float64 end
struct Line11 <: LineAbstract length::Float64 end
struct Line12 <: LineAbstract length::Float64 end
struct Line13 <: LineAbstract length::Float64 end
struct Line14 <: LineAbstract length::Float64 end
struct Line15 <: LineAbstract length::Float64 end
struct Line16 <: LineAbstract length::Float64 end
struct Line17 <: LineAbstract length::Float64 end
struct Line18 <: LineAbstract length::Float64 end
struct Line19 <: LineAbstract length::Float64 end
struct Line20 <: LineAbstract length::Float64 end
struct Distraction1 <: LineAbstract length::Float64 end
struct Distraction2 <: LineAbstract length::Float64 end

paint(l :: Line1) = l.length
paint(l :: Line2) = l.length
paint(l :: Line3) = l.length
paint(l :: Line4) = l.length
paint(l :: Line5) = l.length
paint(l :: Line6) = l.length
paint(l :: Line7) = l.length
paint(l :: Line8) = l.length
paint(l :: Line9) = l.length
paint(l :: Line10) = l.length
paint(l :: Line11) = l.length
paint(l :: Line12) = l.length
paint(l :: Line13) = l.length
paint(l :: Line14) = l.length
paint(l :: Line15) = l.length
paint(l :: Line16) = l.length
paint(l :: Line17) = l.length
paint(l :: Line18) = l.length
paint(l :: Line19) = l.length
paint(l :: Line20) = l.length
paint(l :: Distraction1) = sin(l.length)
paint(l :: Distraction2) = exp(l.length)

struct Picture{T<:LineAbstract}
       lines::Vector{T}
end 

const LineUnion = Union{Line1,Line2,Line3,Line4,Line5,
                        Line6,Line7,Line8,Line9,Line10,
                        Line11,Line12,Line13,Line14,Line15,
                        Line16,Line17,Line18,Line19,Line20,
                        Distraction1, Distraction2}
struct Picture2{T<:LineUnion}
       lines::Vector{T}
end 

# Dynamical dispatch at runtime

function paint1(p)
  s = 0.
  for l in p.lines
    s += paint(l)
  end
  s
end

function paint3(p)
  s = 0.
  @inbounds for i in eachindex(p.lines)
    s += paint(p.lines[i])
  end
  s
end

# Union splitting

function paint2(p)
  s = 0.
  for l in p.lines
    if l isa Line1 s += paint(l)
    elseif l isa Line2 s += paint(l)
    elseif l isa Line3 s += paint(l)
    elseif l isa Line4 s += paint(l)
    elseif l isa Line5 s += paint(l)
    elseif l isa Line6 s += paint(l)
    elseif l isa Line7 s += paint(l)
    elseif l isa Line8 s += paint(l)
    elseif l isa Line9 s += paint(l)
    elseif l isa Line10 s += paint(l)
    elseif l isa Line11 s += paint(l)
    elseif l isa Line12 s += paint(l)
    elseif l isa Line13 s += paint(l)
    elseif l isa Line14 s += paint(l)
    elseif l isa Line15 s += paint(l)
    elseif l isa Line16 s += paint(l)
    elseif l isa Line17 s += paint(l)
    elseif l isa Line18 s += paint(l)
    elseif l isa Line19 s += paint(l)
    elseif l isa Line20 s += paint(l)
    end
  end
  s
end

function test(n)

    line_types = [ Line1, Line2, Line3, Line4, Line5, 
                    Line6, Line7, Line8, Line9, Line10, 
                    Line11, Line12, Line13, Line14, Line15, 
                    Line16, Line17, Line18, Line19, Line20 ]
    p = Picture([line_types[rand(1:20)](rand()) for i in 1:n])
    p2 = Picture2(convert(Vector{LineUnion},p.lines))

    global pic1 = p
    global pic2 = p2
    @test paint1(p) ≈ paint2(p)
    println("result = ", paint1(p))

    print(" with runtime dispatch: "); @btime paint1($p) 
    print(" with splitting: "); @btime paint2($p) 
    print(" with compiler splitting: "); @btime paint1($p2) 
    print(" with runtime dispatch 3: "); @btime paint3($p) 
    print(" with compiler splitting 3: "); @btime paint3($p2) 

end

end

using .Lines
test(1_000_000)
2 Likes