Type stability and heterogeneous collections

I am working on a library where I cannot avoid heterogeneous collections, and I want type-stable code.
I’m trying to improve my understanding of how the compiler behaves in those cases (and in general).

There are already different discussions here related to this topic:

But they do not fully address my doubts.

My question is associated with the following MWE, divided into two parts.

# First part
abstract type BaseType{T} end

getvalue(::BaseType, t) = error("Not implemented") 
Base.eltype(::Type{<:BaseType{T}}) where T = T
veltype(::AbstractVector{<:BaseType{T}}) where T = T

struct A{T} <: BaseType{T}
    x::T
end
A(x::T) where T = A{T}(x) 
getvalue(s::A, t) = s.x + cos(t)

struct B{T} <: BaseType{T}
    x::T
end
B(x::T) where T = B{T}(x) 
getvalue(s::B, t) = s.x + sin(t)

function loop(v, t) 
    val = zero(veltype(v))
    for vi in v 
        val += getvalue(vi, t)
    end
    return val
end

N = 10
v = [i % 2 == 0 ? A(1.0) : B(1.0) for i in 1:N];
t = 1.0

loop(v, t);
@btime loop($v, $t);

# Second part
struct C{T, N} <: BaseType{T}
    x::T
end
C(x::T, n) where T = C{T, n}(x) 
getvalue(s::C{T, N}, t) where {T, N} = s.x + t

loop(v, 0.0);
@btime loop($v, $t);

I believe the first part is a fine implementation that exploits function barriers, although I’m not sure if can be improved further. My understanding is that within the loop function the getvalue return type is statically dispatched, which is also what I got from @code_warntype (e.g. in this case Body::Float64).

Now adding the second part messes up the type stability completely, the Body now is Any and the whole program allocates, although C is not used at all here, and veltype(v) returns the correct type (e.g. Float64).

Can someone help with this? :slight_smile:

It’s the vector v. It has type BaseType{Float64}. That is an abstract type, and I believe there is some special casing when there are very few subtypes. So the compiler can figure out that even if vi in the loop is only known to be the abstract BaseType{Float64}, it can look at the two possible concrete types and figure out that getvalue(vi, t) will be a Float64. I believe it gives up on this if there are too many subtypes. (Though I thought “too many” was larger than 3. Five or something). If you use v = Union{A{Float64}, B{Float64}}[...], it is still able to infer the type correctly.

I had a similar problem some years ago, and solved it by letting BaseType{T} be concrete, but with a field type::Symbol which I used for dispatch where needed. There could be better ways, I’m not sure.

1 Like

Actually, it even infers if v = Union{B{Float64}, A{Float64}, C{Float64}}[...].

Thank you @sgaure! Indeed the previous case happens to be a lucky case as just inserting more concrete types does present the same issue:

abstract type BaseType{T} end

getvalue(::BaseType, t) = error("Not implemented") 
Base.eltype(::Type{<:BaseType{T}}) where T = T
veltype(::AbstractVector{<:BaseType{T}}) where T = T

for s in [:A, :B, :C, :D, :E, :F, :G]
    @eval begin 
        struct $s{T} <: BaseType{T}
            x::T
        end
        $s(x::T) where T = $s{T}(x) 
        getvalue(s::$s, t) where T = s.x + t
    end
end

function loop(v, t) 
    val = zero(veltype(v))
    for vi in v 
        val += getvalue(vi, t)
    end
    return val
end

N = 10
v = [i % 2 == 0 ? A(1.0) : B(1.0) for i in 1:N];
t = 1.0

loop(v, t);
@btime loop($v, $t);

The approach you proposed is the one I already have used in FramesTransformation.jl. That was an acceptable solution for that case, but here the idea is to have concrete types that share an interface only, e.g. could be radically different inside.

Some time ago I also wrote down a relatively simple package to avoid that (MixedTypesContainers.jl), that is similar to Unrolled.jl but I was looking for alternative solutions and trying to understand how inference works here.

I created this package some time ago: MixedStructTypes.jl. It should allow you to keep type-stability when operating on multiple types, the only downside is that you need to use branching instead of dispatch for operating differently based on the type if you want to keep max performance. Also, if your types are radically different, with many fields and instances, I would use the @sum_structs in the package, it is a little slower but much better on the memory side, it is a sum type under the hood. I tried to make it possible to operate with the types defined by the package as similar as possible to normal structs, so It should be easy to work with them as well. Let me know if this helps you :slight_smile:

1 Like

I think the dispatch of getvalue(vi, t) looks at methods(getvalue, (eltype(v), typeof(t))), or rather the instantiations of the methods. The compiler creates fast dispatch code if there are three or less choices in the list, otherwise it dispatches at runtime. So, if you ditch the getvalue(::BaseType, t) your first example will actually work. But it quickly grows too large.

Compared to runtime dispatch of virtual methods in C++, julia is slow. I think I read a discussion somewhere. It’s not as easy as in C++ with a quick lookup in a vtable, because of multiple dispatch and dynamic things. Though, I’m uncertain whether it could be speeded up; possibly at a compile time cost with the invalidations when new subtypes are introduced.

1 Like