Avoiding dynamic dispatch in loops with non-concrete vector types

Let’s say I have some parametric struct struct Foo{T} ... end, and I prepare a vector of Foos.

I now have a loop over a given vector, which doesn’t mutate during the loop. Each iteration picks a random index and executes an algorithm that varies based on the type T.

I want to avoid dynamic dispatch in this case, since the structure of the vector is known beforehand. I.e. if I pick index n randomly, it is already known at the beginning of the loop which algorithm should run. Just doing this naively will result in unspecialized code, which is slow.
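For concreteness, a minimal sketch of the naive version (the field of Foo and the fooalgo bodies are made up for illustration): because the vector’s element type is the abstract Foo, the fooalgo call is resolved at runtime on every iteration.

```julia
# Illustrative setup; the concrete fooalgo methods are hypothetical.
struct Foo{T}
    x::T
end

fooalgo(f::Foo{Int64})   = f.x + 1   # algorithm for Foo{Int64}
fooalgo(f::Foo{Float64}) = 2 * f.x   # algorithm for Foo{Float64}

# Vector{Foo} has an abstract element type, so the compiler cannot
# statically resolve which fooalgo method to call.
function naive_loop(foos::Vector{Foo}, iters)
    s = 0.0
    for _ in 1:iters
        f = foos[rand(1:length(foos))]
        s += fooalgo(f)   # dynamic dispatch on every iteration
    end
    return s
end
```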

One (hacky?) way of solving this is with generated functions and a function barrier.

I can create a tuple type from the vector of Foos and essentially a switch statement, hardcoding the specific algorithm to be used for every index.

# Algo for Foo{Int64}
function fooalgo(f::Foo, ::Type{Int64})
    # ...
end

# Algo for Foo{...}
function fooalgo(f::Foo, ::Type{...})
    # ...
end

@generated function generated_loop(foos::Vector{Foo}, somerange, tt::Type{<:Tuple})
  # using the type tt, params = tt.parameters,
  # generate the following code:
  # for i in somerange
  #   idx = rand(1:length(params))
  #   if idx == 1
  #     fooalgo(foos[1], params[1])
  #   elseif ...
  #   ...
  #   elseif idx == length(params)
  #     fooalgo(foos[end], params[end])
  #   end
  # end
end

function loop(foos, somerange)
  tt = Tuple{[typeof(foo).parameters[1] for foo in foos]...}
  generated_loop(foos, somerange, tt)
end

This is the only way I could think of. Is this a reasonable way of doing things, or is there a better way? Also, would this be faster in practice? I would think the if/else chain would be faster to evaluate than dynamic dispatch.

This is a general problem, and the solution depends on what your needs are.

If T is just used for algorithmic dispatch, and the algorithm you dispatch to takes any appreciable length of time, then the cost of runtime dispatch may be negligible.

If each iteration is independent, you might be able to get by with taking the non-concrete Vector{Foo}, parceling it out into a set of Vector{Foo{T}}s for the various Ts, and then calling the generated_loop on each homogeneous vector. Then runtime dispatch happens only once per element type rather than once per element, though you do pay an upfront cost for iterating over foos to bin them into separate vectors.
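A sketch of that binning approach, using an illustrative Foo{T} and two hypothetical fooalgo methods. Note this changes the sampling slightly (random indices are drawn within each bin rather than across the whole vector), which is fine precisely when iterations are independent:

```julia
struct Foo{T}
    x::T
end

fooalgo(f::Foo{Int64})   = f.x + 1   # hypothetical algorithm for Foo{Int64}
fooalgo(f::Foo{Float64}) = 2 * f.x   # hypothetical algorithm for Foo{Float64}

# Function barrier: inside here T is concrete, so fooalgo calls are static.
function homogeneous_loop(foos::Vector{Foo{T}}, iters) where {T}
    s = 0.0
    for _ in 1:iters
        s += fooalgo(foos[rand(1:length(foos))])
    end
    return s
end

# Bin the non-concrete vector by element type, then pay the dynamic
# dispatch cost once per bin instead of once per iteration.
function binned_loop(foos::Vector{Foo}, iters)
    bins = Dict{DataType, Vector}()
    for f in foos
        push!(get!(() -> typeof(f)[], bins, typeof(f)), f)
    end
    s = 0.0
    for v in values(bins)
        s += homogeneous_loop(v, iters)   # one dynamic dispatch per bin
    end
    return s
end
```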

You may be able to get by with parameterizing foos as Vector{Union{Foo{Int}, Foo{Float64}, Foo{...}}} if you have a handful of T that you need to support, in which case union splitting may be able to help reduce runtime dispatch overhead.
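A sketch of the Union-typed variant (again with an illustrative Foo{T} and hypothetical fooalgo methods). With a small Union element type, the compiler can replace runtime dispatch on the fooalgo call with explicit branches over the member types:

```julia
struct Foo{T}
    x::T
end

fooalgo(f::Foo{Int64})   = f.x + 1   # hypothetical algorithm for Foo{Int64}
fooalgo(f::Foo{Float64}) = 2 * f.x   # hypothetical algorithm for Foo{Float64}

# A small Union of concrete types is eligible for union splitting.
const FooU = Union{Foo{Int64}, Foo{Float64}}

function union_loop(foos::Vector{FooU}, iters)
    s = 0.0
    for _ in 1:iters
        f = foos[rand(1:length(foos))]
        s += fooalgo(f)   # compiler can union-split this call
    end
    return s
end
```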

As far as package solutions go, I can think of two potentially useful options. First, I recently found TypeSortedCollections.jl, which looks like it might help if you can express the loop as a map (TypeSortedCollections.jl (link)). Second, I wrote ConcreteTupleDicts.jl a while back for a case where I wanted a Dict that had multiple key types and val types, but I wanted type stable retrieval. It may not be directly helpful as it’s for Dicts, but it essentially takes the approach of “split the container into multiple type stable pieces” (ConcreteTupleDicts.jl (link)).

Regarding your tuple typeof solution: each entry in that tuple would be a DataType to the compiler, so I’m not sure that you would be able to specialize on the specific ordering of types from your vector. In other words, I don’t think you’d get any constant propagation out of this.

If your vector is short enough to be converted to a Tuple, I believe you would be better off just putting every element of your vector into a tuple and passing that tuple in instead. The main issue with tuples of increasing length is the burden they place on the compiler, since the number of types grows with the length of the tuple. If you’re able to construct a tuple of the vector’s length holding the typeof of each entry, you should equally be able to construct a tuple holding the elements themselves.
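A sketch of that tuple-of-elements approach, with an illustrative Foo{T} and hypothetical fooalgo methods. Here the per-element work is expressed as a map over the tuple (which the compiler unrolls with full type information) rather than random indexing:

```julia
struct Foo{T}
    x::T
end

fooalgo(f::Foo{Int64})   = f.x + 1   # hypothetical algorithm for Foo{Int64}
fooalgo(f::Foo{Float64}) = 2 * f.x   # hypothetical algorithm for Foo{Float64}

# The tuple's type records every element's concrete type, so map over it
# is unrolled and each fooalgo call is statically resolved.
process_all(foos::Tuple) = sum(map(fooalgo, foos))

# Convert the short vector once, then work entirely on the tuple.
process_vector(foos::Vector{Foo}) = process_all(Tuple(foos))
```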