Avoiding dynamic dispatch


I am trying to write a fairly complicated optimization tool and am running into a use case where the function being called depends on user input, hence the need for dynamic dispatch. Alternatively, I could implement an if/else branch and call statically known functions (static dispatch).

  1. As a general practice, which is the better approach? I am mostly concerned about performance.
  2. Is there any Julia trick to avoid dynamic dispatch in cases like these?



An MWE for the use case is below:

We have two user defined types:

abstract type pow{T} end

struct sep_pow{T<:AbstractFloat} <: pow{T}
  power1AU :: T
  powMin :: T
  powMax :: T
  powerModelCofs :: Vector{T}
  decayRate :: T
end

struct nuclear_pow{T<:AbstractFloat} <: pow{T}
  BOLPow :: T
  decayRate :: T
end

and there are two update functions, one for each type:

get_power(x::sep_pow{T}, y::T, z::T) where {T} = nothing      # do something
get_power(x::nuclear_pow{T}, y::T, z::T) where {T} = nothing  # do something

Now, based on the user input, I create an array consisting of a user-defined number of nuclear_pow and sep_pow instances:

stuff happens
Arr = [sep_pow{Float64}(...), sep_pow{Float64}(...), nuclear_pow{Float64}(...), sep_pow{Float64}(...), ...]

The length of the array depends on the user input, and the order (and number) in which the two types appear is also user dependent. Finally, this array is used at the end of each optimization step to compute the cost.

Now, the compiler doesn’t know at compile time the length of these arrays or the order of the types, so calling the get_power function on the whole array results in dynamic dispatch. The only way I thought of mitigating this was to create two arrays, track the order of the two types via another Integer array, and use an if branch to call the associated get_power function. This seems like a very brute-force and complicated approach, hence this question to seek out a better one.



When does the user input you speak of occur? Is it during optimization, or just once before starting the procedure?

If it happens at the beginning, you can design your functions so that the dynamic dispatch happens only once (after the user input), and from that point forward everything is inferable and type-stable.

see https://docs.julialang.org/en/stable/manual/performance-tips/#separate-kernel-functions-aka-function-barriers
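For illustration, a minimal sketch of that pattern (the names here are made up, not from your code): the outer function does the type-unstable work driven by user input, then passes the result through a "barrier" call, so the inner kernel gets compiled once per concrete type it actually receives.

```julia
# Outer function: type-unstable, because the type of `data` depends on user input.
function run_once(user_wants_floats::Bool)
    data = user_wants_floats ? [1.0, 2.0, 3.0] : [1, 2, 3]
    return kernel(data)    # the single dynamic dispatch happens right here
end

# Inner kernel: specialized (and type-stable) separately for Vector{Float64}
# and Vector{Int}, the first time each type is passed in.
kernel(data) = sum(x -> x * x, data)
```

After the one dispatch at the barrier, the loop inside `kernel` runs on fully concrete types.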


I think you misunderstand the notion of an MWE: it should be a piece of self-contained, runnable code. Providing that would help you get better answers.

Missing that, I can only speculate: is get_power not type-stable? If the problem is variable-length vectors of heterogeneous types, have you considered using Tuples?
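To illustrate the Tuple idea with toy stand-in types (not your actual structs): a tuple records each element's concrete type in its own type, so dispatch on each element can be resolved at compile time, whereas a `Vector{Any}` forces a runtime lookup per element.

```julia
# Toy stand-ins for the two power types.
struct A; x::Int; end
struct B; x::Int; end
get_power(a::A, y) = a.x + y
get_power(b::B, y) = b.x - y

vec = Any[A(1), B(2), A(3)]   # eltype Any: dynamic dispatch on every call
tup = (A(1), B(2), A(3))      # type Tuple{A, B, A}: fully inferable

sum(v -> get_power(v, 5), vec)        # 11, with per-element dispatch
sum(map(t -> get_power(t, 5), tup))   # 11, resolved at compile time
```

The trade-off: each distinct tuple type triggers its own compilation, so this pays off when the collection is built once and reused many times.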


If you only have one function you want to call on each of your objects, you might consider storing a vector of FunctionWrappers (from https://github.com/yuyichao/FunctionWrappers.jl), which can be concretely typed and should make calling get_power on each element faster than with a heterogeneous array:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0-rc2.0 (2017-05-18 02:31 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-apple-darwin13.4.0

julia> using BenchmarkTools

julia> using FunctionWrappers: FunctionWrapper

julia> struct A{T}
           x::T
       end

julia> struct B{T}
           x::T
       end

julia> get_power(a::A{T}, y::T) where {T} = a.x + y
get_power (generic function with 1 method)

julia> get_power(b::B{T}, y::T) where {T} = b.x - y
get_power (generic function with 2 methods)

julia> # Create a heterogeneous array of `A`s and `B`s
       mixed_array = [A(1), B(2), A(3), B(4), A(5), B(6), A(7), B(8)];

julia> # Calling get_power on each element involves some dynamic dispatch
       @benchmark sum(x -> get_power(x, 5), $mixed_array)
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     693.636 ns (0.00% GC)
  median time:      717.650 ns (0.00% GC)
  mean time:        762.996 ns (0.00% GC)
  maximum time:     2.673 μs (0.00% GC)
  samples:          10000
  evals/sample:     140

julia> # But if instead, we store a concretely typed function wrapper around
       # a closure, where each closure contains a single A or B instance,
       # then we get a concretely typed array, and summing is much faster
       closure_array = [FunctionWrapper{Int, Tuple{Int}}(y -> get_power(x, y)) for x in mixed_array];

julia> @benchmark sum(f -> f(5), $closure_array)
  memory estimate:  0 bytes
  allocs estimate:  0
  minimum time:     115.853 ns (0.00% GC)
  median time:      118.053 ns (0.00% GC)
  mean time:        127.774 ns (0.00% GC)
  maximum time:     458.416 ns (0.00% GC)
  samples:          10000
  evals/sample:     914


You can use a function barrier to build a tuple instead of an array, and then, after a single dynamic dispatch, the size and elements are all inferred. Or even better, this looks like a good use case for FunctionWrappers, as @rdeits suggests.
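A rough sketch of that combination (the names and types below are made up for illustration): convert the vector to a tuple once, behind a barrier, so everything downstream of the barrier is inferable.

```julia
# Toy stand-ins for the heterogeneous power types.
struct A; x::Int; end
struct B; x::Int; end
get_power(a::A, y) = a.x + y
get_power(b::B, y) = b.x - y

# One dynamic dispatch here: `(models...,)` builds a Tuple whose type
# records the concrete type of every element.
cost(models::Vector, y) = cost_kernel((models...,), y)

# Fully inferable once the tuple type is known.
cost_kernel(models::Tuple, y) = sum(map(m -> get_power(m, y), models))

cost(Any[A(1), B(2)], 5)   # (1 + 5) + (2 - 5) == 3
```

Note that each distinct ordering of types produces a new tuple type and hence a fresh specialization of `cost_kernel`, which is why this suits a collection built once per run.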


Thank you @ChrisRackauckas and @Evizero. I wasn’t aware of the function barrier approach, and it seems to work well, especially if I have multiple functions for the same object. I am still not sure how Julia does this under the hood. Also, does this mean that every time the user changes the order of the functions in the array, the JIT compiler needs another pass to compile the code?

Thanks @rdeits, this is great. I have already implemented the closure_array approach on the objects which only have one function, and it seems to be working great. Again, I would like to know how Julia does this under the hood. Any pointers? I don’t see much documentation on the FunctionWrappers.jl GitHub site.

Thanks a lot, guys, for helping out. I realized my original MWE was not really an MWE. Will keep that in mind.


It directly builds a call to the function pointer and applies the right type conversions for safety and inference. In Julia, every function is its own type, but wrapping it like this gets around that.

Yes. The inner part will have another JIT pass when it specializes.

I discuss function auto-specialization on input types in one of these:

The function wrappers approach avoids specialization on functions, meaning it will also avoid the extra JIT pass. This is nice when functions don’t need to be inlined.


Thanks for the great response @ChrisRackauckas . I will follow-up with some questions after I ingest all of this.


Related to this thread, will FunctionWrappers.jl help me avoid dynamic dispatch in the following situation:

I have a function in a program with signature:

do_cool_stuff(in::IO, out::IO) = println(out, readline(in))

(as a very very toy example)

It takes input, does stuff with input, and writes to output.

However, the types of in and out are the abstract type IO: until my script has finished parsing command-line arguments with ArgParse, it is not known whether the program is going to read from STDIN (Base.TTY) or a file (IOStream), and it is not known whether the program is going to write to STDOUT or some other output IOStream.

I’m expecting that not knowing the concrete IO types until the program runs is going to result in a lot of boxed values and dynamic dispatch I’d like to avoid.


Use a function barrier (see the performance tips section in the manual).
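For the IO case, the barrier might look something like this (a hypothetical sketch; `main` and `process` are made-up names): parse the arguments in a type-unstable setup function, then call the real worker through a barrier so it specializes on whichever concrete IO types show up.

```julia
do_cool_stuff(in::IO, out::IO) = println(out, readline(in))

# Setup: type-unstable, since the concrete IO types depend on the arguments.
function main(use_stdin::Bool)
    input  = use_stdin ? stdin : IOBuffer("hello\n")
    output = IOBuffer()
    process(input, output)    # barrier: one dynamic dispatch here
    return String(take!(output))
end

# `process` is compiled separately for each concrete (input, output)
# combination, e.g. (Base.TTY, IOBuffer) or (IOBuffer, IOBuffer).
process(input, output) = do_cool_stuff(input, output)
```

Everything inside `process` and `do_cool_stuff` then runs on concrete types, regardless of how the IO objects were chosen at startup.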



I will try that approach. I’ve tried limiting the uncertainty in one example here:

function extract_op_flags(args::Dict{Symbol,Any}, flags::Vararg{Symbol})
    bitflags = 0x00
    @inbounds for i in eachindex(flags)
        flag = Bool(args[flags[i]])
        bitflags |= reinterpret(UInt8, flag) << (i - 1)
    end
    return bitflags
end

I stop the uncertainty arising from the fact that the Dict has Any as its value type by wrapping the dictionary indexing in an explicit conversion to Bool. I’m not sure if this or a type assertion like flag::Bool = args[flags[i]] is preferred.
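For what it’s worth, the two options differ in more than style (a small illustration with made-up helper names): `Bool(...)` converts, while `::Bool` asserts, and they reject bad values differently.

```julia
args = Dict{Symbol,Any}(:verbose => true, :count => 1)

# Conversion: accepts anything convertible to Bool (Bool(1) == true,
# Bool(2) throws an InexactError).
via_convert(d, k) = Bool(d[k])

# Assertion: throws a TypeError unless the value is already exactly a Bool.
via_assert(d, k) = d[k]::Bool

via_convert(args, :verbose)   # true
via_convert(args, :count)     # also true: Bool(1) converts
# via_assert(args, :count)    # would throw: 1 isa Bool is false
```

Both give the compiler a concrete `Bool` to work with downstream; the assertion is stricter, which can be preferable if a non-Bool value should be treated as a bug.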