Avoiding Vectors of Abstract Types

If that is possible, this is by far the best approach, I think.

I don’t know if you have already followed this thread; a lot has been discussed there, and it links to other related topics (one of the things tried is that approach, which is the fastest of all for the toy examples there). This last thread introduces new perspectives as well:

2 Likes

This really depends on your requirements, and is hard to answer without an MWE that reflects the dimension and complexity of your problem.

Generally, type instability and dynamic dispatch can be perfectly fine (and even beneficial, when it comes to compilation times) if used sparingly. Use function barriers and similar techniques to contain type instability as necessary.
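
For example, a minimal sketch of the function-barrier pattern (hypothetical names, not from your code):

# The outer function iterates an abstractly typed container; the inner
# `kernel` is the barrier: Julia specializes it on the concrete type of
# each element, so the hot work runs type-stably.
function process(items::Vector{Any})
    total = 0.0
    for item in items
        total += kernel(item)   # dynamic dispatch happens only here
    end
    return total
end

kernel(x::Float64) = 2x
kernel(x::Int)     = Float64(3x)

process(Any[1, 2.5, 3])   # 3.0 + 5.0 + 9.0 = 17.0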

4 Likes

I spent the last three hours just to realize that Vector{AbstractType} is even faster than Vector{ConcreteParametric} (the parametric type used without specifying its parameter). Figured I could share that.

In this case I am wondering: would it hurt to just use a Union of all the subtypes? I mean, is there a point where the Union type gets so long that it starts hurting efficiency?

If not, what do you think of employing metaprogramming to define such vectors as:

@abstract2union ve = Vector{AbstractType}()

where @abstract2union uses subtypes(AbstractType) in order to recursively find all concrete structs and make a new Union type out of them.

so something like:

@macroexpand @abstract2union ve = Vector{AbstractType}()
ve = Vector{Union{filter(!isabstracttype, recursivesubtypes(AbstractType))...}}()

in julian pseudocode.
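
For what it’s worth, a rough sketch of such a macro (untested; concrete_subtypes is a hypothetical helper, not an existing function):

using InteractiveUtils: subtypes

# Recursively collect the concrete subtypes of an abstract type.
function concrete_subtypes(T::Type)
    isconcretetype(T) && return Type[T]
    reduce(vcat, concrete_subtypes.(subtypes(T)); init = Type[])
end

# Rewrite `v = Vector{SomeAbstractType}()` into `v = Vector{Union{...concrete subtypes...}}()`.
macro abstract2union(ex)
    ex.head === :(=) || error("expected an assignment like `v = Vector{SomeAbstractType}()`")
    name = ex.args[1]
    abstract_t = ex.args[2].args[1].args[2]   # the `SomeAbstractType` in `Vector{SomeAbstractType}()`
    quote
        $(esc(name)) = Vector{Union{concrete_subtypes($(esc(abstract_t)))...}}()
    end
end

# usage: @abstract2union ve = Vector{AbstractType}()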

This is uncommon; do you have an MWE you can share?

Yes, sure.
I tested it for different fruits subtyping FruitAbstract:

using BenchmarkTools
import Parameters: @with_kw

abstract type FruitAbstract end


struct Fruit{T <: FruitAbstract}
    fr::T
end

@with_kw struct Apple <: FruitAbstract
    name::String = "apple"
    id::Int = 1
end

@with_kw struct Banana <: FruitAbstract
    name::String = "banana"
    id::Int = 2
end

@with_kw struct Peach <: FruitAbstract
    name::String = "peach"
    id::Int = 3
end

const FruitsUnion = Union{Apple, Banana, Peach}
const FruitsStrUnion = Union{Fruit{Apple}, Fruit{Banana}, Fruit{Peach}}

@inline (fru::Fruit)() = getfield(fru, :fr)
@inline Base.getproperty(fru::Fruit, sym::Symbol) = getfield(fru(), sym)
@inline Base.show(io::IO, ::MIME"text/plain", fru::Fruit) = print(io, fru())
#@inline Base.convert(::Type{Fruit{T}}, fru::T) where {T <: FruitAbstract} = Fruit(fru)
#@inline Base.convert(::Type{Fruit}, fru::T) where {T <: FruitAbstract} = Fruit(fru)
#@inline Base.convert(::Type{FruitAbstract}, fru::T) where {T <: FruitAbstract} = Fruit(fru)
#@inline Base.convert(::Type{T}, fru::Fruit{T}) where {T <: FruitAbstract} = fru()


name(fru::Fruit) = name(getfield(fru, :fr))
name(f::FruitAbstract) = f.name

basketAb = Vector{FruitAbstract}()
[push!(basketAb, Apple()) for _ in 1:300_000];
[push!(basketAb, Banana()) for _ in 1:300_000];
[push!(basketAb, Peach()) for _ in 1:300_000];
sum([f.id for f in basketAb])

basketFr = Vector{Fruit}()
[push!(basketFr, Fruit(Apple())) for _ in 1:300_000];
[push!(basketFr, Fruit(Banana())) for _ in 1:300_000];
[push!(basketFr, Fruit(Peach())) for _ in 1:300_000];
sum([f.id for f in basketFr])

basketUn = Vector{FruitsUnion}()
[push!(basketUn, Apple()) for _ in 1:300_000];
[push!(basketUn, Banana()) for _ in 1:300_000];
[push!(basketUn, Peach()) for _ in 1:300_000];
sum([f.id for f in basketUn])

basketFrUn = Vector{FruitsStrUnion}()
[push!(basketFrUn, Fruit(Apple())) for _ in 1:300_000];
[push!(basketFrUn, Fruit(Banana())) for _ in 1:300_000];
[push!(basketFrUn, Fruit(Peach())) for _ in 1:300_000];
sum([f.id for f in basketFrUn])

basketFrAp = Vector{Fruit{Apple}}()
[push!(basketFrAp, Fruit(Apple())) for _ in 1:900_000];
sum([f.id for f in basketFrAp])

basketAp = Vector{Apple}()
[push!(basketAp, Apple()) for _ in 1:900_000];
sum([f.id for f in basketAp])

@btime sum(f.id for f in basketAb)
#  36.923 ms (899490 allocations: 13.73 MiB)
#1800000

@btime sum(f.id for f in basketFr)
#  67.196 ms (1799490 allocations: 41.19 MiB)
#1800000

@btime sum(f.id for f in basketUn)
#  4.263 ms (2 allocations: 32 bytes)
#1800000

@btime sum(f.id for f in basketFrUn)
#  4.054 ms (2 allocations: 32 bytes)
#1800000

@btime sum(f.id for f in basketFrAp)
#  877.861 μs (2 allocations: 32 bytes)
#900000

@btime sum(f.id for f in basketAp)
#  882.688 μs (2 allocations: 32 bytes)
#900000

The fastest results are when the Vector has a fully concrete element type, Vector{Apple} or Vector{Fruit{Apple}}, in the range of 900 μs.

Immediately after comes the case where the Vector is defined with a Union, Vector{Union{Apple, Banana, Peach}} or Vector{Union{Fruit{Apple}, Fruit{Banana}, Fruit{Peach}}}, in the range of 4 ms (about 4.5x slower than before).

Right after comes the abstract element type, Vector{FruitAbstract}, at 37 ms (about 9x slower than before).

And the slowest is when I use the concrete parametric type without specifying the parameter, Vector{Fruit}, with a time of 67 ms (about 1.9x slower than before).

1 Like

If I remember correctly, the small-union optimization that makes Vector{Union{...}} efficient has a limit of typemax(UInt8) types in the union, but performance will degrade due to other compiler factors when going above 3-4 types. Unless this has changed since 2018, which it very well might have.

2 Likes

Oh, I did not understand that you meant a ConcreteParametric type without any of its type parameters specified. This is a little more expected (a ConcreteParametric type without its type parameters is not truly concrete), but it still seems like a large difference between plain abstract types and a UnionAll type. Only someone more familiar with the compiler internals than me would be able to explain why the difference is so large.

It would indeed be interesting to learn what the performance limitations of built-in union splitting are nowadays and what possible future improvements there could be. I just checked this pattern for a union of 20 types, and there it seems to work.

It works (same performance as manual dispatch) for:

  • vectors of instances of different types (as discussed in this thread), via Vector{Union{T1,T2}}
  • for vectors of concrete types themselves, one would use Vector{Union{Type{T1},Type{T2}}}
  • for vectors of abstract types, same as for concrete ones
  • for vectors of functions, also with multiple methods via Vector{Union{typeof(f1),typeof(f2)}}

I think these patterns reflect important use cases. If the limitations are indeed benign nowadays, this should go into the docs…
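
For instance, the last pattern (a vector of functions behind a concrete Union of their singleton types) looks roughly like this, with hypothetical functions f1 and f2:

f1(x) = 2x
f2(x) = x + 1

# The element type is a closed Union of the functions' types, so calls through
# the vector can be union-split instead of dynamically dispatched.
funcs = Union{typeof(f1), typeof(f2)}[f1, f2, f1]

apply_all(fs, x) = sum(f(x) for f in fs)
apply_all(funcs, 3)   # 2*3 + (3 + 1) + 2*3 = 16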

EDIT: #37378 removed the union size limit for “single dispatch sites”. AFAIU this is the case here. The manual says: "“Small” is defined by the MAX_UNION_SPLITTING constant, which is currently set to 4." It seems that this is at least partially obsolete. I just wasn’t able to find the place where it is set to 4; while searching for it I found that PR.

Will try a PR to the docs when I find time.

2 Likes

See some benchmarking here

2 Likes

Thanks for your notebook. I ran a sweep across 1 to 50 distinct types to see how this affects the latency.
This is the code used, consistent with your notebook:

begin
	bst = Vector{Float64}()
	bdd_sab = Vector{Float64}()
	bdd_any = Vector{Float64}()
	bmd_sab = Vector{Float64}()
	bmd_any = Vector{Float64}()
	bus = Vector{Float64}()
	b_dt = Vector{Float64}()
	bus_dt = Vector{Float64}()
	for k in 1:50
		sab_collection=Vector{Sab}([allS[i % k + 1]() for i=1:N])
		any_collection=Vector{Any}([allS[i % k + 1]() for i=1:N])
		single_collection=[allS[1]() for i=1:N]
		#reference - single type
		push!(bst, mean(@benchmark $sumup_f($single_collection)).time)
		#dynamic dispatch
		push!(bdd_sab, mean(@benchmark $sumup_f($sab_collection)).time)
		push!(bdd_any, mean(@benchmark $sumup_f($any_collection)).time)
		# manual dispatch
		push!(bmd_sab, mean(@benchmark $sumup_f_manual($sab_collection)).time)
		push!(bmd_any, mean(@benchmark $sumup_f_manual($any_collection)).time)
		#union splitting
		UnionS=Union{allS[1:k]...}
		unionS_collection=UnionS[o for o ∈ any_collection]
		push!(bus, mean(@benchmark $sumup_f($unionS_collection)).time)
		## collection of type
		datatype_collection=[allS[i % k + 1] for i=1:N]
		# dynamic dispatch
		push!(b_dt, mean(@benchmark $sumup_f($datatype_collection)).time)
		UnionTS=Union{[Type{s} for s in allS[1:k]]...}
		unionTS_collection=UnionTS[o for o ∈ datatype_collection]
		typeof(unionTS_collection)
		# union splitting
		push!(bus_dt, mean(@benchmark $sumup_f($unionTS_collection)).time)
	end
end

And these are the results:


and a closer look for the non-dynamic dispatch ones:

It appears that the DataType collection doesn’t scale well.
I didn’t try it with the function design because I am more interested in structs that could contain some information.

Apparently, manual dispatch latency grows up to about 20 distinct types and then remains steady up to the 50 distinct types that were tested.

Union splitting does excellently and stays very close to the single-type reference latency from beginning to end.

Probably it would be nice to generalize with metaprogramming and test up to thousands of distinct types.
Also I am not sure what happens when the structs start containing some information the functions act upon, because the structs used here are actually singletons with a single state.

5 Likes

Hi, I tried to set up something with metaprogramming here:


using BenchmarkTools
NT=50
N=10_000
const alltypes=Any[]

@info "Create $(NT) types and methods:"
@time for i=1:NT
    tname=Symbol("T"*string(i))
    ex=quote 
        struct $tname end
        f(::$tname,x)=$i*x
        push!(alltypes,$tname)
    end
    eval(ex)
end

@info "Create union of $NT types:"
@time const UnionT=Union{alltypes...}



function sumup_f(collection)
	s=0.0
	for i=1:length(collection)
		s+=f(collection[i],1)
	end
	s
end


@info "Create collection of Any of size $N:"
@time any_collection=[rand(alltypes)() for i=1:N]

@info "Create collection of UnionT of size $N:"
@time union_collection=UnionT[rand(alltypes)() for i=1:N]
    

@info "Sum up collection of Any:"
@btime sumup_f($any_collection)

@info "Sum up collection of UnionT:"
@btime sumup_f($union_collection)

I get as a result:

[ Info: Create 50 types and methods:
  0.040379 seconds (14.62 k allocations: 1004.555 KiB, 8.87% compilation time)
[ Info: Create union of 50 types:
  0.000060 seconds (58 allocations: 1.828 KiB)
[ Info: Create collection of Any of size 10000:
  0.071567 seconds (233.06 k allocations: 15.093 MiB, 96.25% compilation time)
[ Info: Create collection of UnionT of size 10000:
  2.545888 seconds (3.76 M allocations: 224.753 MiB, 1.12% gc time, 98.88% compilation time)
[ Info: Sum up collection of Any:
  1.051 ms (10000 allocations: 156.25 KiB)
[ Info: Sum up collection of UnionT:
  76.302 μs (0 allocations: 0 bytes)

Union splitting works well again, but the bottleneck in this code is “Create collection of UnionT of size 10000:”, which could possibly be improved by some trickery.

In our case, indeed, we have large collections of not that many types (or functions), so the pattern seems to work well. However, it seems to be undocumented, so some confirmation appears to be needed.

2 Likes

Did some archeology:

Julia 1.0.5:
Any:     556.432 μs (10000 allocations: 156.25 KiB)
UnionT:  699.802 μs (10000 allocations: 156.25 KiB)

Julia 1.3.1:
Any:    590.025 μs (10000 allocations: 156.25 KiB)
UnionT: 699.267 μs (10000 allocations: 156.25 KiB)

Julia 1.4.2:
Any:    567.394 μs (10000 allocations: 156.25 KiB)
UnionT: 708.185 μs (10000 allocations: 156.25 KiB)

Julia 1.5.2:
Any:    645.280 μs (10000 allocations: 156.25 KiB)
UnionT: 732.489 μs (10000 allocations: 156.25 KiB)

Julia 1.6.0:
Any:     1.145 ms (10000 allocations: 156.25 KiB)
UnionT: 73.300 μs (0 allocations: 0 bytes)

Julia 1.6.5:
Any:      1.066 ms (10000 allocations: 156.25 KiB)
UnionT:  72.818 μs (0 allocations: 0 bytes)

Julia 1.7.0:
Any:      1.124 ms (10000 allocations: 156.25 KiB)
UnionT:   8.710 μs (0 allocations: 0 bytes) (sic!)

So this feature seems to have been there since 1.6, correlating with the fact that #37378 went into 1.6.0-beta1.

Not sure what made the other factor 10 possible for 1.7, but this is a good thing…

4 Likes

And why did the time with Any roughly double instead? Is it some benchmarking issue? Or some trade-off to allow the improvement with the Union?

I don’t know. Surely not a benchmarking issue. I posted a link to this discussion in the Slack (julia/general); maybe we will get an answer there.

https://github.com/JuliaLang/julia/pull/44131

As mentioned, I think the results might be misleading. We actually measure only the cost of dispatching and not really the cost of acting on any “abstract” data.
I tried the following simple code, and it seems that the bottleneck is still there for the general case:

using BenchmarkTools
using Random
using GLMakie

abstract type MyAbstractType end

struct MyInt1 <: MyAbstractType
    x::Int
end
struct MyInt2 <: MyAbstractType
    x::Int
end
struct MyInt3 <: MyAbstractType
    x::Int
end
struct MyInt4 <: MyAbstractType
    x::Int
end
struct MyInt5 <: MyAbstractType
    x::Int
end
struct MyInt6 <: MyAbstractType
    x::Int
end
struct MyInt7 <: MyAbstractType
    x::Int
end
struct OtherInt4
    x::Int
end

function mysum(mv::Vector{T}) where T
    sum([v.x for v in mv])
end

bab = Vector{Float64}()
bany = Vector{Float64}()
bus = Vector{Float64}()

vsref = fill(MyInt1(1), 1000);
# bvs = mean(@benchmark(mysum(vsref))).time

vsab = Vector{MyAbstractType}(fill(MyInt1(1), 1000));
push!(bab , mean(@benchmark(mysum(vsab))).time)

vsany = Vector{Any}(fill(MyInt1(1), 1000));
push!(bany , mean(@benchmark(mysum(vsany))).time)

vsus = Vector{Union{MyInt1}}(fill(MyInt1(1), 1000));
push!(bus , mean(@benchmark(mysum(vsus))).time)

# 2 types
#
vs2ab = Vector{MyAbstractType}(vcat(fill(MyInt1(1), 500),fill(MyInt2(1), 500)));
push!(bab , mean(@benchmark(mysum(vs2ab))).time)

vs2any = Vector{Any}(vcat(fill(MyInt1(1), 500),fill(MyInt2(1), 500)));
push!(bany , mean(@benchmark(mysum(vs2any))).time)

vs2us = Vector{Union{MyInt1, MyInt2}}(vcat(fill(MyInt1(1), 500),fill(MyInt2(1), 500)));
push!(bus, mean(@benchmark(mysum(vs2us))).time)

# 3 types
#
vs3ab = Vector{MyAbstractType}(vcat(fill(MyInt1(1), 334),fill(MyInt2(1), 333), fill(MyInt3(1), 333)));
push!(bab , mean(@benchmark(mysum(vs3ab))).time)

vs3any = Vector{Any}(vcat(fill(MyInt1(1), 334),fill(MyInt2(1), 333), fill(MyInt3(1), 333)));
push!(bany , mean(@benchmark(mysum(vs3any))).time)

vs3us = Vector{Union{MyInt1, MyInt2, MyInt3}}(vcat(fill(MyInt1(1), 334),fill(MyInt2(1), 333), fill(MyInt3(1), 333)));
push!(bus , mean(@benchmark(mysum(vs3us))).time)

# 4 types
#
vs4ab = Vector{MyAbstractType}(vcat(fill(MyInt1(1), 250),fill(MyInt2(1), 250), fill(MyInt3(1), 250), fill(MyInt4(1), 250)));
push!(bab , mean(@benchmark(mysum(vs4ab))).time)

vs4any = Vector{Any}(vcat(fill(MyInt1(1), 250),fill(MyInt2(1), 250), fill(MyInt3(1), 250), fill(MyInt4(1), 250)));
push!(bany , mean(@benchmark(mysum(vs4any))).time)

vs4us = Vector{Union{MyInt1, MyInt2, MyInt3, MyInt4}}(vcat(fill(MyInt1(1), 250),fill(MyInt2(1), 250), fill(MyInt3(1), 250), fill(MyInt4(1), 250)));
push!(bus , mean(@benchmark(mysum(vs4us))).time)


# 5 types
#
vs5ab = Vector{MyAbstractType}(vcat(fill(MyInt1(1), 200),fill(MyInt2(1), 200), fill(MyInt3(1), 200), fill(MyInt4(1), 200), fill(MyInt5(1), 200)));
push!(bab , mean(@benchmark(mysum(vs5ab))).time)

vs5any = Vector{Any}(vcat(fill(MyInt1(1), 200),fill(MyInt2(1), 200), fill(MyInt3(1), 200), fill(MyInt4(1), 200), fill(MyInt5(1), 200)));
push!(bany , mean(@benchmark(mysum(vs5any))).time)

vs5us = Vector{Union{MyInt1, MyInt2, MyInt3, MyInt4, MyInt5}}(vcat(fill(MyInt1(1), 200),fill(MyInt2(1), 200), fill(MyInt3(1), 200), fill(MyInt5(1), 200), fill(MyInt4(1), 200)));
push!(bus , mean(@benchmark(mysum(vs5us))).time)

# 6 types
#
vs6ab = Vector{MyAbstractType}(vcat(fill(MyInt1(1), 170),fill(MyInt2(1), 166), fill(MyInt3(1), 166), fill(MyInt4(1), 166), fill(MyInt5(1), 166), fill(MyInt6(1), 166)));
push!(bab , mean(@benchmark(mysum(vs6ab))).time)

vs6any = Vector{Any}(vcat(fill(MyInt1(1), 170),fill(MyInt2(1), 166), fill(MyInt3(1), 166), fill(MyInt4(1), 166), fill(MyInt5(1), 166), fill(MyInt6(1), 166)));
push!(bany , mean(@benchmark(mysum(vs6any))).time)

vs6us = Vector{Union{MyInt1, MyInt2, MyInt3, MyInt4, MyInt5, MyInt6}}(vcat(fill(MyInt1(1), 170),fill(MyInt2(1), 166), fill(MyInt3(1), 166), fill(MyInt4(1), 166), fill(MyInt5(1), 166), fill(MyInt6(1), 166)));
push!(bus , mean(@benchmark(mysum(vs6us))).time)

# 7 types
#
vs7ab = Vector{MyAbstractType}(vcat(fill(MyInt1(1), 148),fill(MyInt2(1), 142), fill(MyInt3(1), 142), fill(MyInt4(1), 142), fill(MyInt5(1), 142), fill(MyInt6(1), 142), fill(MyInt7(1), 142)));
push!(bab , mean(@benchmark(mysum(vs7ab))).time)

vs7any = Vector{Any}(vcat(fill(MyInt1(1), 148),fill(MyInt2(1), 142), fill(MyInt3(1), 142), fill(MyInt4(1), 142), fill(MyInt5(1), 142), fill(MyInt6(1), 142), fill(MyInt7(1), 142)));
push!(bany , mean(@benchmark(mysum(vs7any))).time)

vs7us = Vector{Union{MyInt1, MyInt2, MyInt3, MyInt4, MyInt5, MyInt6, MyInt7}}(vcat(fill(MyInt1(1), 148),fill(MyInt2(1), 142), fill(MyInt3(1), 142), fill(MyInt4(1), 142), fill(MyInt5(1), 142), fill(MyInt6(1), 142), fill(MyInt7(1), 142)));
push!(bus , mean(@benchmark(mysum(vs7us))).time)

# Plot
colors=[:blue,:green,:yellow]
f,a,p = barplot(repeat(1:length(bab), outer=3), 
        vcat(bab, bany,bus), 
        dodge= repeat(1:3, inner=length(bab)), 
        color= repeat(1:3, inner=length(bab)),
        axis = (title="Summation of struct fields",
                xticks = (1:length(bab), repr.(1:length(bab))),
                ylabel="nanosecs",
                xlabel="types contained in vector"))
axislegend(a, [PolyElement(polycolor = colors[i]) for i in 1:3], 
           ["Vector{Abstract}","Vector{Any}", "Vector{Union{...}}"], 
    position = :lt, orientation = :vertical)

So basically just adding together the fields of different structs belonging to the same abstract type.

Getting the following results which speak for themselves:


The red line is the reference, i.e. the time for a vector with a single concrete element type.

1 Like

It’s useful to distinguish two related concepts:

  • union splitting: specializing call sites when you know the type can only be one of a specific, finite list
  • what I have been calling “world splitting” (specifically the section starting at 6:48, or more narrowly starting at 10:40): specializing call sites for the methods that have been defined at the time of compilation
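
Roughly, the two situations look like this (a sketch with hypothetical types A and B, not your benchmark code):

abstract type Abs end
struct A <: Abs; x::Int; end
struct B <: Abs; x::Int; end

val(a::A) = a.x
val(b::B) = b.x

sumvals(v) = sum(val(el) for el in v)

# Union splitting: the element type is a closed Union, so the compiler can emit
# an exhaustive branch over A and B at the call site.
sumvals(Union{A,B}[A(1), B(2)])

# World splitting: the element type is abstract, so the compiler can only
# speculate on the `val` methods that exist right now; defining another subtype
# and method later may invalidate and recompile this code.
sumvals(Abs[A(1), B(2)])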

Keep in mind that world splitting is open-ended, whereas union splitting is closed. That’s a huge difference. While “open-ended” gives you greater flexibility and may be necessary in some cases, it opens you up to much longer compile times and invalidation. Over time, Julia has gotten more restrictive: the world-splitting limit dropped from 4 types to 3 types, which is the reason your tall yellow bars above start once you have 4 methods. This is because the fewer methods you have to speculatively infer, the faster compilation becomes. (IIUC, if you raise the limit to 5, no one has had the patience to wait for Julia to finish compiling itself, and it’s possible it never does.)

While I understand that you might wish it were higher “for just this one function,” the reality is that these parameters have huge implications for the usability of Julia in the broader ecosystem. Several of us are quite excited about a change on master that will allow package authors to drop the limit even further (you can cut the TTFP for Plots in half by dropping the limit to 1).

The moral of the story: if you care about performance, you’re best off using real union splitting rather than relying on world splitting. In cases where you need to intervene, your best option is to add isa(x, T) blocks and do the dispatch manually. Your performance will be good here only if T is a concrete type, since abstract/partial types will invoke Julia’s subtyping machinery, whereas comparing to a concrete type is done by pointer comparison (i.e., very fast).
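
A minimal sketch of that manual-dispatch pattern (hypothetical types C1 and C2, not from the benchmarks above):

struct C1; x::Int; end
struct C2; x::Int; end

function manual_sum(v)
    s = 0
    for el in v
        if el isa C1          # checking against a concrete type is a fast pointer comparison
            s += el.x
        elseif el isa C2
            s += el.x
        else
            error("unexpected element of type $(typeof(el))")
        end
    end
    return s
end

manual_sum(Any[C1(1), C2(2), C1(3)])   # 6, even though the container is a Vector{Any}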

11 Likes

I am far from understanding where “world splitting” happens in your example, but changing

function mysum(mv::Vector{T}) where T
    sum([v.x for v in mv])
end

to

function mysum(mv::Vector{T}) where T
	s=0
	for i=1:length(mv)
		s+=mv[i].x
	end
    return s
end

results in much better timings:

  59.99385162601627  # 1
  62.53760935910483  # 2
 392.1048792270531   # 3 
 392.31189320388347  # 4 types in the union
 393.2404761904762   # 5
 391.7910280373831   # 7
 391.6691304347826   # 8

And I would also call the second version “acting on the data”.

1 Like

I was puzzled why union splitting doesn’t work here…

Following the Julia rule “if in doubt, write your own loop”, I rewrote mysum as

function mysum(mv)
    s=0
    for i=1:length(mv)
        s+=mv[i].x
    end
    s
end

and get (note the logscale of the y-axis):

Somewhere in your function, maybe in the array comprehension, Julia seems to fall back to “world splitting” instead of “union splitting”.
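
One way to see what inference does with each version (a sketch, assuming the MyInt*/MyAbstractType definitions and mysum from above):

using InteractiveUtils   # for @code_warntype

vunion = Union{MyInt1, MyInt2}[MyInt1(1), MyInt2(2)]
vabs   = MyAbstractType[MyInt1(1), MyInt2(2)]

# With the small-Union eltype, the element is inferred as Union{MyInt1, MyInt2}
# and the field access as Int; with the abstract eltype it degrades to Any.
@code_warntype mysum(vunion)
@code_warntype mysum(vabs)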

1 Like

But…

function mysum(mv)
    s=0
    for v ∈ mv
        s+=v.x
    end
    s
end

and

function mysum(mv)
    s=0
    foreach(mv) do v
        s+=v.x
    end
    s
end