Preallocating Vector of Vectors

mrVeng · October 6, 2021, 1:22pm

Hi there,

I have a project where I need to preallocate a Vector of Vectors. I could just use a multidimensional array, but this complicates things and makes the code less readable as well. Is there a way that I can preallocate a Vector of Vectors as efficiently as a simple multidimensional array? MWE:

#Matrix{Float64}
function makearr1(T, Nrow, Ncol)
        return Matrix{T}(undef, Nrow, Ncol)
end
#Vectors of Vectors{Float64}
function makearr2(T, Nrow, Ncol)
        return [Vector{T}(undef, Nrow) for _ in 1:Ncol]
end
Nrow = 1000
Ncol = 500
makearr1(Float64, Nrow, Ncol)
makearr2(Float64, Nrow, Ncol)

using BenchmarkTools
@btime makearr1($Float64, $Nrow, $Ncol) #3.629 μs (2 allocations: 3.81 MiB)
@btime makearr2($Float64, $Nrow, $Ncol) #163.800 μs (1003 allocations: 3.89 MiB)

It has roughly the same memory consumption, but a lot more computation time and allocations as I need to initialize each individual array. Is there a straightforward solution to have the same number of allocations for makearr2 as for makearr1?

Best regards

kristoffer.carlsson · October 6, 2021, 1:28pm

Not really, because you want to do separate allocations. Also, if it takes the same time and the same memory, is the number of allocations that important?

mrVeng · October 6, 2021, 1:38pm

Sorry for the confusion - I just swapped the initialization from zeros to an undefined Vector of arbitrary type T. It is quite a big performance drop in this case unfortunately.

lmiq · October 6, 2021, 1:56pm

You could allocate the matrix and use views (of the columns) of the matrix in your computations.
(Or try to preallocate these vectors only once; I assume these allocations are occurring repeatedly for now.)
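
A minimal sketch of the view-based approach, assuming the column layout of the MWE (each inner vector has Nrow entries). The name makecolviews is hypothetical and not from this thread. Note that the elements are SubArrays rather than Vectors, and that they all share the single buffer of the matrix:

```julia
# Hypothetical helper: one contiguous Matrix, exposed as a vector of column views.
function makecolviews(::Type{T}, Nrow, Ncol) where {T}
    M = Matrix{T}(undef, Nrow, Ncol)        # single large allocation
    cols = [view(M, :, j) for j in 1:Ncol]  # lightweight wrappers, no data copied
    return M, cols
end

M, cols = makecolviews(Float64, 4, 3)
cols[2][1] = 1.5        # writes through to the parent matrix
M[1, 2] == 1.5          # true: the view and the matrix share memory
cols[1] isa SubArray    # true: the element is not a Vector{Float64}
```

Because the views alias the matrix, the columns cannot be grown individually with push!.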

mrVeng · October 6, 2021, 2:46pm

Thank you for the input!

I thought about the views alternative as well, but this would result in a different type (SubArray instead of standard Vectors) for the Vector of Vectors version. I wish there was some form of reshape!() for that kind of problem, where I initialize all placeholders only once and then fill them.

Usually, this will be preallocated only once, but in specific cases this might change, e.g. in a time series Ncol/Nrow might grow over time and I do not know the size a priori.

rafael.guerra · October 6, 2021, 3:18pm

Would the following work?

function makearr2b(T, Nrow, Ncol)
    v = Vector{T}(undef, Nrow)
    w = Vector{Vector{T}}(undef, Ncol)
    w .= Ref(v)
end

NB: no it does not, as per @mrVeng post below

jling · October 6, 2021, 3:31pm

If you know the total size ahead of time, you can use:
https://github.com/JuliaArrays/ArraysOfArrays.jl

Basically, the data is stored contiguously in memory, and getting or setting an index just creates a “view” into it. This should save a lot of memory when there are many inner vectors.

kristoffer.carlsson · October 6, 2021, 3:42pm

You want something more flexible than a multidimensional array provides. With individual vectors you can grow the vectors separately for example. And as (almost) always, you have to pay for flexibility with performance.

Why is that an issue?

mrVeng · October 6, 2021, 4:48pm

Thank you! Unfortunately, this causes a pointer issue:

c = makearr2b(Float64, Nrow, Ncol) #Vector{Vector{Float64}} with 1000 elements
c[1][1] #NaN
c[2][1] #NaN
c[1][1] = 123. #123.0
c[2][1] #123.0, not NaN
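
The aliasing can be checked directly with ===. The sketch below is a small illustration with arbitrarily chosen sizes; it contrasts broadcasting a single Ref(v) with a comprehension that allocates a fresh vector per slot:

```julia
v = zeros(3)
shared = Vector{Vector{Float64}}(undef, 2)
shared .= Ref(v)                     # every slot refers to the same vector v
shared[1] === shared[2]              # true: one object, two references

distinct = [zeros(3) for _ in 1:2]   # a new vector is allocated per slot
distinct[1] === distinct[2]          # false

shared[1][1] = 123.0
shared[2][1]                         # 123.0, because both slots alias v
distinct[1][1] = 123.0
distinct[2][1]                       # 0.0, unaffected
```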

Sukera · October 6, 2021, 4:53pm

What about

function makearr2(T, Nrow, Ncol)
    [ Vector{T}(undef, Ncol) for _ in 1:Nrow ]
end

? That doesn’t have the pitfall of using the exact same (===) vector for all indices.

I have to say, I’m also curious - why are views not ok?

mrVeng · October 6, 2021, 5:06pm

This is the same function as the second function in my post, right? Apologies if I am overlooking something.

mrVeng · October 6, 2021, 5:10pm

In the actual code, I define functions on the inner array - usually there are already @views involved, and I also want to be sure that the type of the inner array matches the original type for safety reasons.

I will have a look at ArraysOfArrays.jl, and otherwise I just take the additional time as flexibility costs.

Sukera · October 6, 2021, 5:14pm

It is. The reason it doesn’t get much more concise is because you want to express that you have a container of containers, with each container being

- distinct
- able to grow/shrink freely without impacting other vectors

A multidimensional array doesn’t have those requirements, so its layout/construction can be simpler. Down in the weeds this means that a matrix can just be a single big chunk of memory, because each row/column has the same size as any other, whereas a vector of vectors has disjoint blocks of memory for each vector, to allow each object to live independently.
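
To illustrate the distinction with a small sketch (sizes and values are arbitrary): the inner vectors of a vector of vectors can be resized independently, while a Matrix has a fixed rectangular shape and has no push! method:

```julia
vv = [zeros(2) for _ in 1:3]
push!(vv[1], 1.0)            # only the first inner vector grows
length.(vv)                  # [3, 2, 2]

M = zeros(2, 3)
# push!(M, 1.0) would throw a MethodError: a Matrix cannot grow in place
applicable(push!, M, 1.0)    # false
```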

mrVeng · October 6, 2021, 5:19pm

Thanks! I think I should copy-paste your comment to the relevant struct field so I never forget why I chose that specific Array format :D.

lmiq · October 6, 2021, 5:57pm

Just to add that, in these cases, use push! to grow the arrays (do not reallocate the whole thing). push! can be quite efficient, because vectors are stored with some spare capacity, so pushing new data only occasionally triggers a reallocation.

julia> x = [[1,2],[3,4]];

julia> @allocated push!(x[1],3)
48

julia> @allocated push!(x[1],3)
0

julia> @allocated push!(x[1],3)
80

julia> @allocated push!(x[1],3)
0

julia> @allocated push!(x[1],3)
0

julia> @allocated push!(x[1],3)
0

julia> @allocated push!(x[1],3)
144

julia> x
2-element Vector{Vector{Int64}}:
 [1, 2, 3, 3, 3, 3, 3, 3, 3]
 [3, 4]
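
If a rough upper bound on the final length is known, sizehint! from Base can reserve capacity up front so that later push! calls are less likely to reallocate. A minimal sketch, with the capacity of 1000 chosen arbitrarily; the exact growth behavior and allocation counts depend on the Julia version:

```julia
x = [Int[] for _ in 1:2]               # two empty, independent inner vectors
foreach(v -> sizehint!(v, 1000), x)    # reserve capacity; the lengths stay 0

for i in 1:1000
    push!(x[1], i)                     # fills the reserved space of x[1] only
end

length(x[1]), length(x[2])             # (1000, 0)
```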

rafael.guerra · October 6, 2021, 6:05pm

Thanks
If I'm not mistaken, writing the loop instead of the comprehension seems to be faster and to allocate less (Julia 1.7):

function makearr3(T, Nrow, Ncol)
    w = Vector{Vector{T}}(undef, Ncol)
    for i in 1:Ncol
        w[i] =  Vector{T}(undef, Nrow)
    end
    return w
end

lmiq · October 6, 2021, 6:48pm

I don’t see that:

julia> function makearr3(T, Nrow, Ncol)
           w = Vector{Vector{T}}(undef, Ncol)
           for i in 1:Ncol
               w[i] =  Vector{T}(undef, Nrow)
           end
           return w
       end
makearr3 (generic function with 1 method)

julia> @btime makearr3(Float64, 500, 1000);
  395.621 μs (1491 allocations: 3.98 MiB)

julia> function makearr2(T, Nrow, Ncol)
           [ Vector{T}(undef, Ncol) for _ in 1:Nrow ]
       end
makearr2 (generic function with 1 method)

julia> @btime makearr2(Float64, 500, 1000);
  191.369 μs (1000 allocations: 3.89 MiB)

What does improve the performance is to allow the function to specialize for the type of element being created:

julia> function makearr4(::Val{T}, Nrow, Ncol) where T
           [ Vector{T}(undef, Ncol) for _ in 1:Nrow ]
       end
makearr4 (generic function with 1 method)

julia> @btime makearr4(Val(Float64), 500, 1000);
  81.485 μs (501 allocations: 3.88 MiB)
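
An alternative to Val that also forces specialization is to annotate the argument as ::Type{T}, which keeps the original call syntax. This relies on the behavior described in the Julia performance tips ("Be aware of when Julia avoids specializing"). The name makearr5 is hypothetical, and no timings are claimed for it here:

```julia
# Hypothetical variant: the ::Type{T} annotation makes Julia specialize on T.
function makearr5(::Type{T}, Nrow, Ncol) where {T}
    return [Vector{T}(undef, Ncol) for _ in 1:Nrow]
end

a = makearr5(Float64, 500, 1000)   # called like makearr2, without Val
typeof(a)                          # Vector{Vector{Float64}}
```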

rafael.guerra · October 6, 2021, 6:58pm

Odd.
Here running Julia 1.7 on Win10 laptop:

using BenchmarkTools
@btime makearr1(Float64, $Nrow, $Ncol) #   3.9 μs (2 allocations: 3.81 MiB)
@btime makearr2(Float64, $Nrow, $Ncol) # 181.3 μs (1003 allocations: 3.89 MiB)
@btime makearr3(Float64, $Nrow, $Ncol) #  59.8 μs (501 allocations: 3.88 MiB)

Jeff_Emanuel · October 6, 2021, 7:03pm

makearr2 and makearr3 are not equivalent. One creates Nrow vectors of length Ncol. The other is switched.

There are two different makearr2’s in this thread.

lmiq · October 6, 2021, 7:03pm

Interesting. Here, with 1.7, the loop version is improved (considering the correction in makearr3 pointed out above):


julia> @btime makearr2(Float64, 500, 1000);
  195.416 μs (1000 allocations: 3.89 MiB)

julia> @btime makearr3(Float64, 500, 1000);
  83.967 μs (501 allocations: 3.88 MiB)

julia> @btime makearr4(Val(Float64), 500, 1000);
  85.357 μs (501 allocations: 3.88 MiB)

In Julia 1.6 I have:

julia> @btime makearr3(Float64, 500, 1000);
  189.542 μs (1001 allocations: 3.89 MiB)

Thus, the gain that requires explicit specialization in 1.6 comes for free in 1.7, for some reason.
