I have a project where I need to preallocate a Vector of Vectors. I could just use a multidimensional array, but this complicates things and makes the code less readable as well. Is there a way that I can preallocate a Vector of Vectors as efficiently as a simple multidimensional array? MWE:
#Matrix{Float64}
function makearr1(T, Nrow, Ncol)
return Matrix{T}(undef, Nrow, Ncol)
end
#Vectors of Vectors{Float64}
function makearr2(T, Nrow, Ncol)
return [Vector{T}(undef, Nrow) for _ in 1:Ncol]
end
Nrow = 1000
Ncol = 500
makearr1(Float64, Nrow, Ncol)
makearr2(Float64, Nrow, Ncol)
using BenchmarkTools
@btime makearr1($Float64, $Nrow, $Ncol) #3.629 ÎĽs (2 allocations: 3.81 MiB)
@btime makearr2($Float64, $Nrow, $Ncol) #163.800 ÎĽs (1003 allocations: 3.89 MiB)
It has roughly the same memory consumption, but a lot more computation time and allocations as I need to initialize each individual array. Is there a straightforward solution to have the same number of allocations for makearr2 as for makearr1?
Not really, because you want to do separate allocations. Also, if it takes the same time and the same memory, is the number of allocations that important?
Sorry for the confusion - I just swapped the intialization from zeros to an undefined Vector of arbitrary type T. It is quite a big performance drop in this case unfortunately.
You could allocate the matrix and use views (of the columns) of the matrix in your computations.
(or try to preallocate these vectors only once, I assume these allocations are ocurring repeatedly for now).
I thought about the views alternative as well, but this would result in a different type (Subarray instead of standard Vectors) for the Vector of Vector versions. I wish there was some form of reshape!() for that kind of problem where I only initialize all placeholder once and then fill them.
Usually, this will be only preallocated once, but in specific examples this might change, e.g. in a times series Ncol/Nrow might grow over time and I do not know the size a priori.
basically you store the data contiguously in-memory, and when you set/get index, you just “view”, this should save a lot of memory in the limit of having a lot of inner vectors
You want something more flexible than a multidimensional array provides. With individual vectors you can grow the vectors separately for example. And as (almost) always, you have to pay for flexibility with performance.
In the actual code, I define functions on the inner array - usually there are already @views involved, and I also want to be sure that the type of the inner array matches the original type for safey reasons.
I will have a look at ArraysOfArrays.jl, and otherwise I just take the additional time as flexibility costs.
It is. The reason it doesn’t get much more concise is because you want to express that you have a container of containers, with each container being
distinct
able to grow/shrink freely without impacting other vectors
A multidimensional matrix doesn’t have those restrictions, so it’s layout/construction can be simpler. Down in the weeds this means that a matrix can just be a single big chunk of memory, because each row/column has the same size as any other, whereas a vector of vectors has disjoint blocks of memory for each vector, to allow each object to live independently.
Just to add that, in these cases, use push! to increase the arrays (do not reallocate the whole stuff). push! can be quite efficient, because the vectors are saved with some extra space and pushing new data only needs to do reallocations eventually.
julia> function makearr3(T, Nrow, Ncol)
w = Vector{Vector{T}}(undef, Ncol)
for i in 1:Ncol
w[i] = Vector{T}(undef, Nrow)
end
return w
end
makearr3 (generic function with 1 method)
julia> @btime makearr3(Float64, 500, 1000);
395.621 ÎĽs (1491 allocations: 3.98 MiB)
julia> function makearr2(T, Nrow, Ncol)
[ Vector{T}(undef, Ncol) for _ in 1:Nrow ]
end
makearr2 (generic function with 1 method)
julia> @btime makearr2(Float64, 500, 1000);
191.369 ÎĽs (1000 allocations: 3.89 MiB)
What does improve the performance is to allow the function to specialize for the type of element being created:
julia> function makearr4(::Val{T}, Nrow, Ncol) where T
[ Vector{T}(undef, Ncol) for _ in 1:Nrow ]
end
makearr4 (generic function with 1 method)
julia> @btime makearr4(Val(Float64), 500, 1000);
81.485 ÎĽs (501 allocations: 3.88 MiB)