How to avoid runtime-dispatch in loop

If i have a vector of different objects it is inherently an abstract type. How can I still avoid runtime-dispatch when looping over it?

abstract type AbstractFoo end
mutable struct Foo1 <: AbstractFoo
  k::Float64
end
mutable struct Foo2 <: AbstractFoo 
  k::Float64
end

function run!(f::Foo1, k)
  f.k = k * 1
  return f
end
function run!(f::Foo2, k)
  f.k = k * 2
  return f
end

struct Container
  items::Vector{<:AbstractFoo}
end

get_order(cnt::Container) = 1:length(cnt.items)

cnt = Container([Foo1(0), Foo2(0)])

function run!(cnt::Container, n::Int=1e6)
  idx = get_order(cnt)
  for k in 1:n
    for i in idx
      run!(cnt.items[i], k) # runtime-dispatch 
    end
  end
end

run!(cnt)

With @profview I see runtime-dispatch for run. Not sure if It happens in this example. Can I avoid it? Should I?

Well dynamic dispatch happens when the compiler cannot infer which method to call because some type is not known. I think it reasonable to see why this happens here - essentially you constructes the MWE to demonstrate that.
So either you change your approach fundamentally to avoid the abstractly typed container or you perform the dispatch yourself in code. The latter is done for you by WrappedUnions.jl.

2 Likes

I think you can do one of these three things

  • Use Vector{Union{Foo1, Foo2}} instead of Vector{<:AbstractFoo} for Container. This will work well enough if you don’t have many types (less than 5 I think).
  • Wrap your Union of types and create branches which forces type stability (this is done automatically by WrappedUnions.jl as @abraemer said).
  • New option which I’m fond of: use the ECS pattern instead of a container of different types. The question is actually though if your problem is well-suited for this kind of systems. If so, this is the most performant and flexible option for programs with heterogeneous types in my opinion, but it requires some point of view changes. See GitHub - ark-ecs/Ark.jl: Archetype-based Entity Component System (ECS) for Julia. for that.
1 Like

Thx for your answers!

  • If this would be a package a user could not simply add another type if I use the union approach, correct?
  • The ECS approach looks interesting. But sounds like a rewrite. :see_no_evil_monkey:
  • Would it help to just construct a tuple after I have determined the ordering?

Indeed, it’s impossible to alter one Union type. If you could, you’d invalidate a lot of code every time, and at some point (4 iirc) you lose the small Union optimizations Tortar is talking about. It’s really for small and restrainable Unions like Union{<:Number, Nothing}. Otherwise you need to take a chance on the big branches of WrappedUnions.

Maybe. If the average run! call is otherwise quick enough that the runtime dispatch overhead is significant by comparison, then it’s worth trying. If the runtime dispatch is not, then you’d be refactoring your code for little performance gains, possibly unmeasurable within noise. Another thing to consider is that runtime dispatches usually add small allocations by boxing unboxed values. There are at least a couple typos in your example so we can’t confirm if it does: 1e6::Float64 so it errors for n::Int, and c[i] probably should be cnt[i] and is unimplemented.

Yeah, sry, typed it on my phone.

Another option is to parametrise Container:

using BenchmarkTools

struct Container2{T <: AbstractFoo}
    items::Vector{T}
end

# (...): Container2 equivalents for Container1 methods

cnt2 = Container2([Foo1(0), Foo2(0)])  # Container2{AbstractFoo}

# With Base.getindex(c::Container, i) = c.items[i] and corrected run!: n::Int=10^6 and cnt[i]
@btime run!($cnt);  # 44.689 ms (16810 allocations: 856.29 KiB)
@btime run!($cnt2);  # 2.860 ms (0 allocations: 0 bytes)

cnt = Container([Foo1(0), Foo1(0)])
cnt2 = Container2([Foo1(0), Foo1(0)])  # Container2{Foo1}
@btime run!($cnt);  # 48.464 ms (25209 allocations: 1.25 MiB)
@btime run!($cnt2);  # 1.190 ms (0 allocations: 0 bytes)

One can then add another mutable struct Foo3 <: AbstractFoo, which requires no modifications to Container2 (nor to Container)

cnt = Container([Foo1(0), Foo2(0), Foo3(0)])
cnt2 = Container2([Foo1(0), Foo2(0), Foo3(0)])  # Container2{AbstractFoo}
@btime run!($cnt);  # 73.614 ms (26796 allocations: 1.33 MiB)
@btime run!($cnt2);  # 4.155 ms (0 allocations: 0 bytes)

This is not a silver bullet though, as it seem to rely on the same mechanism as union splitting. If you add a Foo4 <: AbstractFoo, performance degrades significantly:

cnt = Container([Foo1(0), Foo2(0), Foo3(0), Foo4(0)])
cnt2 = Container2([Foo1(0), Foo2(0), Foo3(0), Foo4(0)])
@btime run!($cnt);  # 217.629 ms (3997956 allocations: 61.00 MiB)
@btime run!($cnt2); # 124.464 ms (3997956 allocations: 61.00 MiB)

cnt = Container([Foo1(0), Foo2(0)])
cnt2 = Container2([Foo1(0), Foo2(0)])
@btime run!($cnt);  # 104.608 ms (1998978 allocations: 30.50 MiB)
@btime run!($cnt2);  # 55.975 ms (1998978 allocations: 30.50 MiB)

cnt = Container([Foo1(0), Foo1(0)])
cnt2 = Container2([Foo1(0), Foo1(0)])
@btime run!($cnt);  # 122.933 ms (1998978 allocations: 30.50 MiB)
@btime run!($cnt2);  #  1.190 ms (0 allocations: 0 bytes)

Just in case a concrete example of manual, branch based dispatch is helpful:

function run!(cnt::Container, n::Int=1e6)
  idx = get_order(cnt)
  for k in 1:n
    for i in idx
      item = cnt.items[i]
      if item isa Foo1
        # compiler knows item::Foo1 within this branch
        run!(item, k) # static dispatch to run!(::Foo1, ::Int)
      elseif item isa Foo2
        # compiler knows item::Foo2 within this branch
        run!(item, k) # static dispatch to run!(::Foo2, ::Int)
      else
        run!(item, k) # runtime-dispatch fallback
      end
    end
  end
end

There are certainly disadvantages to this approach (redundant code, it doesn’t scale well, isn’t user-extensible, etc), but if you know that certain subtypes are more common or more important (to avoid runtime dispatch), it is effective. (And will help with runtime-dispatch associated allocations, which is why I have used this in the past.)