How to access shared "inherited" fields without paying multiple dispatch performance price?

Hi, I’m evaluating Julia to replace some old C++ code. The code uses inheritance where there is a single base class and lots (100’s) of derived classes. There are a large number of instances, and the code needs to be able to quickly get a base class field from a derived instance.

I know that Julia does not have field inheritance and uses multiple dispatch to access data on abstract types instead. However, I’ve noticed that multiple dispatch methods slow down quite a bit for my use case when many implementations of the method are added. I’ve created some basic code to illustrate my point:

abstract type Car end

struct HondaAccord <: Car
    length::Int64
end

struct ToyotaCamry1 <: Car
    length::Int64
end

struct ToyotaCamry2 <: Car
    length::Int64
end

function getLength(a::T)::Int64 where T <: Car
    a.length
end

function speedTest()
    cars = Car[]
    while length(cars) < 10000
        push!(cars, HondaAccord(100))
        push!(cars, ToyotaCamry1(100))
        push!(cars, ToyotaCamry2(100))
    end

    sum = 0
    for j = 1:10000
        for i = 1:length(cars)
            sum = sum + getLength(cars[i])
        end
    end

    println(sum)
end

As you add additional Toyota Camrys (ToyotaCamry3, 4, 5, etc), it starts slowing down more and more. I’m assuming this is because of the way multiple dispatch is implemented.

My question is this: if you know that the layout of every derived class starts the same way (i.e., has the same "inherited" fields), is there some way to access those fields without going through multiple dispatch (even if it is unsafe)? Thanks!

Julia is quite good at doing what you ask, and it does it very fast. If there is any incremental slowdown, it would not be a result of how multiple dispatch is implemented. Within Julia, multiple dispatch – the use of different signatures with the same function name to route a computation to a more specific, better-performing, or more accurate implementation – is fully resolved (with respect to which implementation a call will use) before any of the code that you have written is run, whenever the argument types can be inferred.

Dispatch works by transforming a generic call (e.g. length) into a direct call to the unique, most specific matching method. Given unambiguous method signatures (so that no two implementations are equally specific for the same argument types), this resolution is guaranteed to be unique.

Your example is not doing any benchmark-like timing. To benchmark performance, it is recommended that you use BenchmarkTools.jl and escape any variables that are arguments of the function to be timed (by prefixing each one with a $, see below). I will provide an example that better covers your question with my next response.
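For example, a minimal BenchmarkTools.jl sketch (assuming the package is installed; the data here is a stand-in):

```julia
# Sketch of benchmarking with BenchmarkTools.jl. The `$` interpolates the
# global variable into the benchmark expression, so the timing measures the
# work itself rather than the cost of accessing a non-constant global.
using BenchmarkTools

cars = [100, 100, 100]   # stand-in data; any global works the same way
@btime sum($cars)        # escaped: times just the sum
```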

Thanks for the reply. I saw a major performance difference using @time in Juno (I ran it a couple times until the GC allocations were stable).

As shown:

julia> @time speedTest()
10002000000
  2.528261 seconds (25 allocations: 256.984 KiB)

With 4 more implementations added (ToyotaCamry3–6, each also push!()ed into the array):

julia> @time speedTest()
10003000000
  7.402068 seconds (25 allocations: 256.984 KiB)

Note the array length stays about the same (it might vary by 1 or 2 elements between the two versions, it stops adding at ~10000 elements), but this is a 3x performance gap. I thought this kind of thing was just a standard downside to multimethods – when the call can’t be resolved to something direct (and it cannot in this case, I think), you have to look through each possible option.

This isn’t really a multiple dispatch issue per se but more a matter of avoiding containers with abstract type parameters.

There are a lot of different ways to solve this problem, but choosing the correct one depends a lot on what you actually want to accomplish. Presumably this example is representative of something you actually want to do, right? Are you interested in an operation where you have a very small number of types and you know which types they are ahead of time? Or do you have a large number of types that you don’t know ahead of time? And how much work do you actually need to do per element of the vector?

4 Likes

Why not just recast your approach so that each variation of the method you call is unambiguous?
You are right to want Julia code that is well-posed with respect to selecting among different implementations of a method through dispatch on their signatures. Usually there is a way to reframe your "method"ology that lets things go apace. One approach your example suggests is a parameterized struct, Automobile, with the make and model as type parameters, used for each realization of the abstract type Car:

using Dates

abstract type Car end
struct Automobile{Make, Model} <: Car
    model_year::Year
    zero_to_sixty::Second
end

ToyotaCamry1 = Automobile{:Toyota, :Camry}(Year(2008), Second(15))
ToyotaCamry2 = Automobile{:Toyota, :Camry}(Year(2012), Second(12))

Amusingly, you came up with the precise example used in a different manual section cautioning legacy OOP users against what seem like natural uses of the type system. That was written for an older version of Julia and could use a bit of updating, but the fundamental fact is that Julia is fast when it can predict types OR when the number of types is small enough that it will use union splitting. When you invoke “run time method lookup” things slow down a lot.

In your case I would encode the make & model with values (e.g., :honda etc), not a type. But I recognize that sometimes the problem with toy examples is they don’t fully convey the complexity of what you’re trying to achieve, so don’t hesitate to clarify further.
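To see the union-splitting effect concretely, here is a minimal sketch (the type and function names are illustrative): when the element type is a small Union of concrete types, Julia can compile the loop into a branch over the possibilities, whereas a fully abstract element type may force a runtime method lookup per element.

```julia
# Union splitting: a small, known set of concrete element types lets the
# compiler branch over the possibilities instead of doing a generic lookup.
abstract type Car end
struct HondaAccord <: Car; len::Int; end
struct ToyotaCamry <: Car; len::Int; end

getlen(c::Car) = c.len

# Small Union element type: eligible for union splitting in hot loops.
cars_union = Union{HondaAccord,ToyotaCamry}[HondaAccord(100), ToyotaCamry(100)]

# Fully abstract element type: each call may need runtime dispatch.
cars_abstract = Car[HondaAccord(100), ToyotaCamry(100)]

sum(getlen, cars_union)      # same answer either way; the Union version
sum(getlen, cars_abstract)   # is the one the compiler can optimize
```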

5 Likes

Oh wow, that’s a crazy coincidence! I guess we all think of cars when giving toy OOP examples, eh?

The actual problem I’m working on is a knowledge graph for something similar to an expert system. Each node in the graph has metadata associated with it, and it often doesn’t know the exact types of its children/parents (these are stored as pointers to a base class). It often uses the metadata stored in each object to search for objects; then I can use multimethods on the filtered set. (They can also be stored in heterogeneous lists, hence the example I gave.)

I know that a common solution to something like this would be to pull the shared data out of the class and store it next to the abstract reference, an array of “value” structs. However, the metadata can sometimes change, so this wouldn’t work, since there could be multiple references to the same object.

I think Julia is a great fit for a lot of things I’d like to do, I love the macro system and multimethods and runtime JIT with LLVM, but I do need to be able to somehow access these common fields without all that indirection, which would really hurt the performance. Is it somehow possible? Is there a dark “unsafe” side to Julia like Rust? :grinning:

There are still a number of options open to you. Let’s try simple first: can you “invert” your container organization, and use the base class as the outer class and store untyped extra info? Like this:

using Dates

struct Car
    # generic information common to every Car
    year::Year
    cost::Int
    has_airbags::Bool
    # specific information for each type
    specificdata
end

Then you only pay dispatch price when you need to call something on specificdata.
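A sketch of how that inverted layout might be used (the payload types and field values below are illustrative, not from the original post): hot loops touch only the concrete common fields, and dispatch happens only when the untyped payload is actually inspected.

```julia
using Dates

# Illustrative per-model payload types; in real code these would hold
# whatever model-specific data each kind of Car needs.
struct AccordData; trunk_liters::Int; end
struct CamryData;  hybrid::Bool;      end

struct Car
    # generic information common to every Car
    year::Year
    cost::Int
    has_airbags::Bool
    # specific information for each type
    specificdata
end

cars = [Car(Year(2008), 24000, true, AccordData(400)),
        Car(Year(2012), 26000, true, CamryData(true))]

# Hot loop over common fields: concrete types, no dispatch on the payload.
total_cost = sum(c.cost for c in cars)

# Dispatch is paid only when the payload is actually used.
describe(d::AccordData) = "trunk: $(d.trunk_liters) L"
describe(d::CamryData)  = d.hybrid ? "hybrid" : "gas"
descriptions = [describe(c.specificdata) for c in cars]
```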

If that’s not acceptable, there are other things you can do, like maintaining your own method table (but it starts getting messy), something like this:

const foo_method = IdDict(HondaAccord=>foo_honda, ToyotaCamry=>foo_camry, ...)

function bar(cars)
    for car in cars
        foo_method[typeof(car)](car)
    end
end

but I am not sure how much of an improvement (if any) that would be. (The intent is to leverage the fact that you don’t need runtime dispatch if there is only one method for a function.)
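Filled in with placeholder types and implementations (every name below is hypothetical), the hand-rolled method table becomes runnable:

```julia
abstract type Car end
struct HondaAccord <: Car; len::Int; end
struct ToyotaCamry <: Car; len::Int; end

# Placeholder per-type implementations. Each function has exactly one
# method, so calling through the table needs no method-table search.
foo_honda(c::HondaAccord) = c.len + 1
foo_camry(c::ToyotaCamry) = c.len + 2

# Hand-rolled "virtual method table", keyed by concrete type identity.
const foo_method = IdDict(HondaAccord => foo_honda,
                          ToyotaCamry => foo_camry)

function bar(cars)
    total = 0
    for car in cars
        total += foo_method[typeof(car)](car)
    end
    total
end

bar([HondaAccord(100), ToyotaCamry(100)])  # 101 + 102 = 203
```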

3 Likes

That first option looks pretty close. It’s an extra allocation per item, which I’d like to avoid since there are a huge number of them… But it would allow fast iteration on the common fields. I think that’s enough to get me started. Thank you!

Have you looked at LightGraphs.jl and its add-on packages? I think it provides functionality close to what you describe.

1 Like

Looks interesting, thanks!

Ok, I just couldn’t leave this one alone. It is possible to read the “shared” fields with no overhead, but it’s super unsafe. I used unsafe_load along with a modified version of pointer_from_objref that doesn’t check whether the type is mutable (that check is really slow). I also made all the structs mutable, because they obviously need to be heap-allocated for my use case (still learning! :grin:).

The performance is much better:

julia> @time speedTest()
10003000000
  0.082848 seconds (10.03 k allocations: 413.281 KiB)

That’s ~30x faster than the fastest method dispatch. Obviously this is totally a microbenchmark and does not represent real-world usage. Still, it does show the overhead was there, and it’s gone now. The higher count of allocations is from heap allocating the structs.

I benchmarked a C++ translation of this code, and it runs at the same speed as Julia. I’m honestly very impressed with Julia here – what I’m doing is totally against the design of the language, but Julia gives me the tools to do it anyway and compiles it down into a super-optimized result. And then I can write a macro to wrap it all up into something that looks nice. Awesome.

The code is below. This is just a quick proof-of-concept and very unsafe. getLength() reads the first 8-byte field of the struct (index 1 in unsafe_load); you could get the offsets of the other fields with fieldoffset(). Also, for struct references (pointers) you’d need to convert them back to objrefs.

abstract type Car end

mutable struct HondaAccord <: Car
    length::Int64
end

mutable struct ToyotaCamry1 <: Car
    length::Int64
end

mutable struct ToyotaCamry2 <: Car
    length::Int64
end

function unsafe_pointer_from_objref(@nospecialize(x))
    ccall(:jl_value_ptr, Ptr{Cvoid}, (Any,), x)
end

function getLength(a::Car)::Int64
    unsafe_load(Core.Intrinsics.bitcast(Ptr{Int64}, unsafe_pointer_from_objref(a)), 1)
end

function speedTest()
    cars = Car[]
    while length(cars) < 10000
        push!(cars, HondaAccord(100))
        push!(cars, ToyotaCamry1(100))
        push!(cars, ToyotaCamry2(100))
    end

    sum = 0
    for j = 1:10000
        for i = 1:length(cars)
            sum = sum + getLength(cars[i])
        end
    end

    println(sum)
end

5 Likes

Glad you recognize the challenges, and kudos for finding a way forward. And a belated welcome to the community!

3 Likes

You can also try storing a FunctionWrapper which essentially lets you create a virtual method table and should perform similarly to a virtual method in C++. But I should also say that this is a tool that felt very natural when I was coming from the OO world and has been less and less tempting the more I learn about the other kinds of approaches possible in Julia.

Also, the reason I was asking about how much work you want to do per item is that the cost you pay for accessing an item from an abstract container is essentially fixed at some small but significant number (say, 100ns). Your microbenchmark essentially measures only that cost, but if you are doing a few microseconds worth of work per item, then you may not notice the penalty from run-time dispatch at all, provided you use a “function barrier” to ensure you only pay the cost once per item.
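A minimal sketch of that function-barrier pattern (the per-item work here is a stand-in): the dispatch cost is paid once at the barrier call, and the inner function then runs with a fully concrete argument type.

```julia
abstract type Car end
struct HondaAccord <: Car; len::Int; end
struct ToyotaCamry <: Car; len::Int; end

# Inner kernel: by the time it runs, `car` has a concrete type, so the
# body compiles to direct, fully specialized code.
expensive_work(car) = car.len^2 + car.len   # stand-in for real per-item work

function process(cars::Vector{Car})
    total = 0
    for car in cars
        # This call is the barrier: the type lookup happens once per item
        # here, and the work inside expensive_work pays no further cost.
        total += expensive_work(car)
    end
    total
end

process(Car[HondaAccord(10), ToyotaCamry(20)])  # 110 + 420 = 530
```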

3 Likes

The FunctionWrapper library looks very useful, thank you. I didn’t realize there was overhead on standard function pointer calls.

You’re absolutely right about microbenchmarks. Any differences measured are usually dwarfed by the speed of memory access and algorithm choice, and most of the time the relevant functions aren’t called enough to matter. But occasionally you do need to run something a ridiculous amount of times per second, and it’s nice to know that the language/runtime can perform screaming fast.