Vectors with elements of same type but different parametric type

DanielVandH · October 3, 2022, 10:00pm

I need to work with a set of points that might each have a different type. Consider as a rough example:

abstract type AbstractLabel end
struct LabelA <: AbstractLabel end
struct LabelB <: AbstractLabel end
struct LabelC <: AbstractLabel end
struct PointWithLabel{L<:AbstractLabel} 
    xy::Vector{Float64}
    label::Type{L}
    PointWithLabel(x::Float64, y::Float64, label::Type{L}) where {L} = new{L}([x, y], label)
    PointWithLabel(x, y, label) = PointWithLabel(Float64(x), Float64(y), label)
    PointWithLabel{L}(x, y) where {L} = PointWithLabel(x, y, L)
end
const PointA = PointWithLabel{LabelA}
const PointB = PointWithLabel{LabelB}
const PointC = PointWithLabel{LabelC}

I could then define for example the vector

p1 = PointA(0.5, 0.3)
p2 = PointB(0.17, -0.3)
p3 = PointC(0.0, 0.0)
p4 = PointB(0.2, 0.5)
vec1 = [p1, p2, p3, p4]

4-element Vector{PointWithLabel}:
 PointA([0.5, 0.3], LabelA)
 PointB([0.17, -0.3], LabelB)
 PointC([0.0, 0.0], LabelC)
 PointB([0.2, 0.5], LabelB)

Alternatively, I could define the vector as

vec2 = Vector{PointWithLabel{L} where L}([p1, p2, p3, p4])

4-element Vector{PointWithLabel{L} where L}:
 PointA([0.5, 0.3], LabelA)
 PointB([0.17, -0.3], LabelB)
 PointC([0.0, 0.0], LabelC)
 PointB([0.2, 0.5], LabelB)

Which of these vectors would be better to work with for performance? I’ll be e.g. solving differential equations with many thousands of these types of points, and using the labels in the type of the point to exploit multiple dispatch for defining new addition formulas (e.g. +(::PointA, ::PointB) and +(::PointC, ::PointA)). I initially thought vec1 would be better since the element type is not concrete, !isconcretetype(PointWithLabel), although vec2 requires me to define methods specific for the type PointWithLabel{L} where L. Alternatively, I could work with a Vector{Union{PointA, PointB, PointC}}, but I remember from somewhere (I think) that unions are not always performant.

torrance · October 4, 2022, 6:22am

In practice, I believe vec1 and vec2 are the same: they’re essentially just vectors of pointers to your Point{A|B|C} off in the heap. On the other hand, a vector of type Union{PointA, PointB, PointC} can store your types inline, at least if it is an isbits type.

But let us benchmark using BenchmarkTools:

# Vector of concrete type
pointAs = [PointA(xy...) for xy in eachcol(rand(2, 1_000_000))];
typeof(pointAs) == Vector{PointA}
@btime foreach(p -> p.label, pointAs)
# 241.399 ns (0 allocations: 0 bytes)

# Vector of non-concrete type
pointABCs = [rand([PointA, PointB, PointC])(xy...) for xy in eachcol(rand(2, 1_000_000))];
typeof(pointABCs) == Vector{PointWithLabel}
@btime foreach(p -> p.label, pointABCs)
# 225.922 μs (0 allocations: 0 bytes)

# Vector of union of concrete types
unionPoints = Array{Union{PointA, PointB, PointC}}(undef, 1_000_000)
unionPoints .= pointABCs
@btime foreach(p -> p.label, unionPoints)
# 5.742 ms (0 allocations: 0 bytes)

These results are quite surprising to me! Since your type isn’t isbits, I would have expected these to perform all about the same, with some small overhead involved handling types with second and third examples. Clearly, that overhead is large, since the second benchark x1000 slower than the first. And I have no explanation for the further 10x decrease in performance observed for the third benchmark.

EDIT: Switch benchmarks from map to foreach, since map is dominated by allocations.

torrance · October 4, 2022, 6:37am

I see a large improvement if I convert your type to isbits. Note that you do not need to store the type as a field for your struct, since it’s already stored in the type itself. Additionally, if xy is guaranteed to be just 2 Float64 in size, then a vector is a poor choice, and I think an SVector from StaticArrays package would be better to avoid the tiny heap allocation In this example I just use a tuple:

struct PointWithLabel{L<:AbstractLabel}
    xy::Tuple{Float64, Float64}
end

function getlabel(::PointWithLabel{L}) where L
    return L
end

Then benchmarking as before:

@btime foreach(getlabel, pointAs);
# 10.590 ns (0 allocations: 0 bytes)

@btime foreach(getlabel, pointABCs);
# 28.406 ms (0 allocations: 0 bytes)

@btime foreach(getlabel, unionPoints)
# 14.096 μs (0 allocations: 0 bytes)

In the case of the first and third benchmarks, we are accessing values stored directly inline to the array. And so these are much faster than before. Clearly, however, since the 3rd example is still 1000x slower than the first, there’s a lot of overhead involved with resolution of types in unions. Meanwhile, the middle example is stored as an array of pointers, and that pointer indirection is clearly very expensive.

DanielVandH · October 4, 2022, 7:10am

Thanks for these benchmarks! Very interesting. For your replies, should “ns” be “μs”? I assume so since you say the results are very similar. I get similar results too those:

686.300 μs
410.500 μs
6.434 ms

(in order of your first reply). Very interesting.

For your last reply: I think I’m going to have to do a lot of mutation on the elements of xy, so I don’t think I could treat it as immutable even though indeed all vectors are two-vectors. e.g. I’d be doing a lot of xy .+= something else. Unless you had any suggestions there?

torrance · October 4, 2022, 7:15am

You won’t be able to mutate, but if the overall struct remains relatively small, you can trivially create a new struct with the updated internal values. For example, with StaticArrays, even though they are immutable they still achieve the same semantics:

a = SVector{Float64, 4}(1, 2, 3, 4)
b = SVector{Float64, 4}(1, 2, 3, 4)
a += b # => a = a + b

In the last line, however, a isn’t mutated it is reassigned.

DanielVandH · October 4, 2022, 7:19am

Thanks! I’ll look into that approach. I also just found out about MVectors which I’ll look into. Appreciate your help.

Topic		Replies	Views
Effect of typing in structures Performance type , parametric-types , struct	4	165	June 7, 2024
Is operating a vector that have abstract type really slow? General Usage	11	768	November 29, 2019
Avoiding Vectors of Abstract Types Performance question , type-stability	22	4433	February 17, 2022
Parametric Structs New to Julia performance , parametric-types	10	273	July 23, 2025
Can the overhead of `myT{<:T}` compared to `myT{T}`, where `T` is a concrete type, be avoided? Performance parametric-types , union	3	150	November 17, 2024

Vectors with elements of same type but different parametric type

Related topics