Is operating on a vector with an abstract element type really slow?

using BenchmarkTools
abstract type SuperType end
mutable struct TypeA <: SuperType
    a::Int64
    b::Int64
end
mutable struct TypeB <: SuperType
    a::Int64
    b::Float64
end

function Iter(objs::Vector{SuperType})
    for _ in 1:1000
        if rand() <= 0.5
            push!(objs,TypeA(1,2))
        else
            push!(objs,TypeB(2,2.0))
        end
    end
end
    
function Iter(objs::Vector{TypeA})
    for _ in 1:1000
        if rand() <= 0.5
            push!(objs,TypeA(1,2))
        else
            push!(objs,TypeA(2,2))
        end
    end
end

After defining an abstract type with two subtypes, I made two vectors and tested the push!() speed.

abs = Vector{SuperType}()
@benchmark Iter(abs)

output:

BenchmarkTools.Trial: 
  memory estimate:  31.25 KiB
  allocs estimate:  1000
  --------------
  minimum time:     19.018 μs (0.00% GC)
  median time:      20.635 μs (0.00% GC)
  mean time:        904.554 μs (89.30% GC)
  maximum time:     2.039 s (100.00% GC)
  --------------
  samples:          5639
  evals/sample:     1

and

as = Vector{TypeA}()
@benchmark Iter(as)

output:

BenchmarkTools.Trial: 
  memory estimate:  31.25 KiB
  allocs estimate:  1000
  --------------
  minimum time:     19.122 μs (0.00% GC)
  median time:      20.183 μs (0.00% GC)
  mean time:        1.152 ms (93.90% GC)
  maximum time:     2.415 s (100.00% GC)
  --------------
  samples:          4986
  evals/sample:     1

It's quite weird that operating on the abstract-type vector is not slower than operating on the concrete one. Am I doing anything wrong?

Is that because, whatever the element type of the vector is, what gets stored is always a pointer?

Do the timings change if the structs are not mutable?

My best guess is that this is inlining and Union splitting. The first vector is probably being stored as roughly the equivalent of a C union. That way, each item would just be 16 bytes plus a bit for which type it is.

Maybe that is true. Is there any way to check how the array stores its elements? As pointers or as the values themselves?

Here are the results after changing the struct types to immutable.

abs = Vector{SuperType}()
@benchmark Iter(abs)
BenchmarkTools.Trial: 
  memory estimate:  31.25 KiB
  allocs estimate:  1000
  --------------
  minimum time:     30.927 μs (0.00% GC)
  median time:      36.210 μs (0.00% GC)
  mean time:        1.405 ms (93.30% GC)
  maximum time:     3.320 s (100.00% GC)
  --------------
  samples:          5591
  evals/sample:     1
as = Vector{TypeA}()
@benchmark Iter(as)
BenchmarkTools.Trial: 
  memory estimate:  31.25 KiB
  allocs estimate:  1000
  --------------
  minimum time:     34.405 μs (0.00% GC)
  median time:      44.396 μs (0.00% GC)
  mean time:        1.034 ms (94.85% GC)
  maximum time:     2.630 s (100.00% GC)
  --------------
  samples:          5446
  evals/sample:     1

It doesn't seem to make much difference.

No, this has nothing to do with union splitting.

The main reason is that you aren't operating on the vector elements. You only store to the vector, so there's no slowdown from type instability.
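As a rough illustration of the point about type instability, the sketch below checks whether the declared element type of each vector is concrete. The names `abs_vec` and `as_vec` are made up for this example, and the struct definitions are restated only so the block runs on its own.

```julia
abstract type SuperType end
mutable struct TypeA <: SuperType
    a::Int64
    b::Int64
end

# Hypothetical names, chosen to avoid shadowing Base.abs
abs_vec = SuperType[TypeA(1, 2)]   # Vector{SuperType}
as_vec  = TypeA[TypeA(1, 2)]       # Vector{TypeA}

# Code that loops over a vector is compiled against the declared eltype only.
abs_concrete = isconcretetype(eltype(abs_vec))  # false: element type is known only at run time
as_concrete  = isconcretetype(eltype(as_vec))   # true: field access can be compiled directly

# The object that was stored is a TypeA in both cases; only the declared eltype differs.
same_runtime_type = typeof(abs_vec[1]) === typeof(as_vec[1])
```

With push! alone, nothing ever reads a field of an element, so the abstract eltype has little opportunity to cost anything. A loop that touches `obj.a` is where the difference would be expected to show up.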

When the struct is mutable, there's also no slowdown from allocation, since a pointer is stored in either case. When it's immutable, there's no slowdown from allocation either, since the stored value is constant and the allocation is done at compile time.

I believe you should use Iter($abs) or Iter($as). I don't think it matters much in this case though.
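A minimal sketch of the interpolation suggestion, assuming BenchmarkTools is installed. The vector name `abs_vec` and the simplified single-type `Iter` are stand-ins for this example, not the original code. The second benchmark uses the `setup` keyword, which is an addition to what was suggested above: since Iter pushes 1000 elements on every call, a vector shared across samples keeps growing, and a fresh vector per sample avoids that.

```julia
using BenchmarkTools

abstract type SuperType end
mutable struct TypeA <: SuperType
    a::Int64
    b::Int64
end

# Simplified stand-in for the Iter methods in the question
function Iter(objs::Vector{<:SuperType})
    for _ in 1:1000
        push!(objs, TypeA(1, 2))
    end
end

abs_vec = Vector{SuperType}()

# `$` interpolates the global, so the untyped global lookup is not part of the measurement.
# Note that abs_vec is shared by all samples and grows by 1000 elements per call.
b1 = @benchmark Iter($abs_vec) samples=100 evals=1

# `setup` builds a fresh, empty vector before each sample; evals=1 gives one call per sample.
b2 = @benchmark Iter(v) setup=(v = Vector{SuperType}()) samples=100 evals=1
```

Comparing `b1` and `b2` should give a sense of how much of the reported time comes from the ever-growing vector rather than from push! itself.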

No!

Arrays never store “variables”. They always store references. However, if the eltype is a bits type, the reference is stored as the value itself instead of as a pointer.
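One way to check this from the REPL is `isbitstype` on the element type, together with `sizeof` on an array of it. This is a sketch: `MutA` and `ImmA` are hypothetical names introduced here, and the per-element pointer size depends on the platform.

```julia
abstract type SuperType end
mutable struct MutA <: SuperType   # hypothetical mutable variant
    a::Int64
    b::Int64
end
struct ImmA <: SuperType           # hypothetical immutable variant
    a::Int64
    b::Int64
end

mut_bits = isbitstype(MutA)        # false: mutable structs are not bits types
imm_bits = isbitstype(ImmA)        # true: immutable with only bits-type fields
abs_bits = isbitstype(SuperType)   # false: abstract types are never bits types

# sizeof on an array reports the bytes used by its element storage.
inline_bytes  = sizeof(Vector{ImmA}(undef, 4))       # 4 elements of 16 bytes stored inline
pointer_bytes = sizeof(Vector{SuperType}(undef, 4))  # 4 pointer-sized slots
```

The check depends only on the declared eltype: a `Vector{SuperType}` that happens to contain only `ImmA` values still uses pointer-sized slots.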


Okay, thanks.
So, if arrays always store references, would operating on them be slower in Julia than in C++, where arrays store the values themselves?

Is it right to say that mutable structs are stored as pointers and immutable ones are stored as themselves?

It depends on what you do. Storing the value is not necessarily more efficient. FWIW, this is the whole reason you need to worry about copying in C++… Also, storing by reference does not mean storing the pointer.

No. Values whose declared eltype (or field type) is isbits are stored inline. This has nothing to do with the value itself, only with the declared field type.


Hmm… It seems I have a lot to learn. I thought a pointer was the same thing as a reference. What's the difference, BTW? And where can I learn about this? I'm not actually a computer science or programming major.

Following your suggestion, I tested a function that actually operates on the elements:

function Operate(objs::Vector{T}) where {T <: SuperType}
    for obj in objs
        obj.a += 1
    end
end

The difference is significant:

@benchmark Operate(abs)
BenchmarkTools.Trial: 
  memory estimate:  54.69 KiB
  allocs estimate:  3500
  --------------
  minimum time:     105.313 μs (0.00% GC)
  median time:      109.868 μs (0.00% GC)
  mean time:        262.076 μs (10.10% GC)
  maximum time:     162.335 ms (99.92% GC)
  --------------
  samples:          10000
  evals/sample:     1

and

@benchmark Operate(as)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.145 μs (0.00% GC)
  median time:      1.228 μs (0.00% GC)
  mean time:        2.595 μs (0.00% GC)
  maximum time:     1.403 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

Reference here just means that it refers to the object, not a reference in the C++ sense.

A pointer is a less ambiguous, lower-level concept. C++ references are implemented as pointers.
