Accessing a vector of vectors is slow

This topic: How is memory allocated for a vector of structs? - General Usage / Performance - Julia Programming Language (julialang.org)

made me wonder whether a vector of vectors would be faster than a vector of structs for this case, where the struct contains only Ints. In fact it was much slower, but is there a better way to write double! and double_array!?

using BenchmarkTools

mutable struct MutFoo
    a::Int
    b::Int
end

function double_b!(v::Vector{MutFoo})
    for w in v
        w.b *= 2
    end
    v
end

struct ImmutFoo
    a::Int
    b::Int
end

function double_b!(v::Vector{ImmutFoo})
    for i in eachindex(v)
        # @show i,v[i].a,v[i].b
        v[i] = ImmutFoo(v[i].a, v[i].b * 2)
    end
    return
end

function double!(v::Vector{Vector{Int64}},size::Int64)
    @inbounds for i in 1:size
        # @show i, v[i][1], v[i][2]
        v[i][2] *= 2
    end
    return
end
function double_array!(v::Array{Int64,2},size::Int64)
    @inbounds for i in 1:size
        # @show i, v[1,i], v[2,i]
        v[2,i] *= 2
    end
    # v[2,1:size] .= @views v[2,1:size]  * 2
    return
end

size = 10^6
println("Benchmarking vector of mutable objects")
const mut_vec = [MutFoo(1, 1) for i in 1:size]
# @show mut_vec
@btime double_b!(mut_vec)
# @show mut_vec

println("Benchmarking vector of immutable objects")
const immut_vec = [ImmutFoo(0, 0) for i in 1:size]
@btime double_b!(immut_vec)

println("Benchmarking vector of vectors")
vec = Vector{Vector{Int64}}(undef,size)
for i in eachindex(vec)
    vec[i] = Vector{Int64}(undef,2)
    vec[i][1:2] .= 1,1
end
@btime double!(vec,size)


println("Benchmarking array")
arr = Array{Int64,2}(undef,2,size)
for (i,v) in enumerate(eachcol(arr))
    arr[:,i] .= 1,1
end
@btime double_array!(arr,size)

with the following results:

Benchmarking vector of mutable objects
WARNING: redefinition of constant Main.mut_vec. This may fail, cause incorrect answers, or produce other errors.
  1.715 ms (0 allocations: 0 bytes)
Benchmarking vector of immutable objects
WARNING: redefinition of constant Main.immut_vec. This may fail, cause incorrect answers, or produce other errors.
  183.700 μs (0 allocations: 0 bytes)
Benchmarking vector of vectors
  5.547 ms (0 allocations: 0 bytes)
Benchmarking array
  179.200 μs (0 allocations: 0 bytes)

It seems to me that this could be a good application for StructArrays.jl

This gives for me:

Setup code
julia> using BenchmarkTools, StructArrays
julia> mutable struct MutFoo
           a::Int
           b::Int
       end

julia> function double_b!(v::Vector{MutFoo})
           for w in v
               w.b *= 2
           end
           v
       end
double_b! (generic function with 1 method)

julia> struct ImmutFoo
           a::Int
           b::Int
       end
## MutFoo
julia> @btime double_b!(mut_vec) setup=(mut_vec = [MutFoo(1, 1) for i in 1:10^6]);
 2.862 ms (0 allocations: 0 bytes)

## StructArray{ImmutFoo}
julia> function double_b!(v::StructArray{ImmutFoo})
           v.b .*= 2
       end
julia> @btime double_b!(sa_vec) setup=(sa_vec=StructArray([ImmutFoo(1, 1) for i in 1:10^6]));
  535.211 μs (0 allocations: 0 bytes)

## plain Array
julia> function double_b!(v::Array)
           @views v[:,2] .*= 2
       end
julia> @btime double_b!(arr) setup=(arr=ones(10^6,2));
  560.075 μs (0 allocations: 0 bytes)

Sorry for the confusion: the array and vector of structs were for the linked post. This post concerns only the vector of vectors in double!.

You cannot be faster than with a plain Array. A vector of vectors will almost certainly always be slower, since each element access requires an extra memory lookup/fetch that the CPU cannot predict.

My original application was implementing an AVL tree, and then the struct is the tree node which contains the key, data, height, and links to left and right children. The tree would contain many nodes, hence the need for a fast implementation.

I tried an implementation using arrays for each field of the node, but it was slower than the one using structs. I guess this is due to cache locality.

Sounds reasonable, but I wonder if it is true for larger arrays vs. structs? It does seem odd that a vector of vectors is so much slower, though. Would it be the same in C?

A vector of mutable structs is much like a vector of Vectors. The benefit of the struct is that the compiler knows exactly how big it is, so it doesn't need to bounds-check accesses to it, but this may or may not result in a significant performance change.

EDIT: I would need to look more carefully, but it's probable that a Vector{Vector{T}} actually has two additional levels of indirection relative to a Vector{MutFoo}, because a Vector is a mutable object that contains a reference to a block of memory holding the data. So it might be even more detrimental than I previously indicated.

A vector of (immutable) struct is more like a Vector of Int, in that the values are stored directly in the array (rather than storing a pointer to each mutable member). This saves a level of indirection on access, which results in fewer memory accesses and better cache locality. Further, it can move many operations that would be forced to happen in RAM (for a mutable container like mutable struct or Vector) into registers, avoiding system memory entirely for some operations.
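A minimal sketch of this inline-storage distinction (redefining the thread's two types so the snippet runs in a fresh session): isbitstype tells you whether elements are stored directly in the array, and sizeof on the arrays shows the resulting data-block sizes on a 64-bit machine.

```julia
struct ImmutFoo
    a::Int
    b::Int
end
mutable struct MutFoo
    a::Int
    b::Int
end

isbitstype(ImmutFoo)   # true: elements live inline, contiguously, in the array
isbitstype(MutFoo)     # false: the array stores pointers to heap-allocated objects

# On a 64-bit machine, the data block of a 4-element vector is:
sizeof(Vector{ImmutFoo}(undef, 4))  # 64 bytes (4 × 16: the structs themselves)
sizeof(Vector{MutFoo}(undef, 4))    # 32 bytes (4 × 8: just the pointers)
```

The 8-byte entries in the MutFoo case are exactly the extra indirection described above: each must be followed to reach the actual a and b values.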

In general, I find that maybe 90% of my structs are immutable. If you don’t need to change an object in one place and have it also changed in some other place, you probably don’t need a mutable. So containers are mostly mutable, but many “individual” objects get little value from mutability. Immutables usually have performance benefits, but (more importantly, to me) I find them to much easier to work with and reason about. Accessors.jl can ease the tedium of writing code to “modify” immutables.
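A hedged sketch of what Accessors.jl looks like in practice (assumes the package is installed; its @set macro builds an updated copy rather than mutating in place):

```julia
using Accessors

struct ImmutFoo
    a::Int
    b::Int
end

x = ImmutFoo(1, 1)
y = @set x.b = 2    # y == ImmutFoo(1, 2); x is left unchanged
```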

StructArrays.jl offers yet another layout. It is particularly valuable if you often access some small part of every object (e.g., taking the average of one field across all array elements) rather than several fields of an individual object together.
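A sketch of that layout (assumes StructArrays.jl is installed): each field is stored as its own contiguous vector, so a field-wise reduction never has to load the other fields at all.

```julia
using StructArrays

struct ImmutFoo
    a::Int
    b::Int
end

sa = StructArray([ImmutFoo(i, 10i) for i in 1:4])
sa.b        # the b "column" of every element, as a plain contiguous Vector{Int}
sum(sa.b)   # reduction over one field: only b's memory is touched
```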
