My understanding was, in vague terms, that pre-allocating an array by specifying its size and then populating its elements is more efficient than growing an array without doing so. To test this, I benchmarked examples of these two methods:
function no_preallocation()
    mat = []
    for i in 1:1000
        push!(mat, rand(1000))
    end
end
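and the pre-allocated version:

function preallocation()
    mat = Array{Float64}(undef, 1000, 1000)
    for i in 1:1000
        mat[i, :] = rand(1000)
    end
end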
Note also that you are comparing two totally different data structures (a 1d array of 1d arrays ≠ a 2d array). And by including rand(1000) calls in your benchmark, a big chunk of the timing is due to generating random numbers.
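If you want the benchmark to isolate the filling strategy, one option is to generate the random data once, outside the timed functions. A minimal sketch, assuming a 1000×1000 workload (the DATA constant and the function names are just illustrative):

const DATA = rand(1000, 1000)  # generated once, outside the timed code

function fill_vov()
    mat = Vector{Float64}[]
    for i in 1:1000
        push!(mat, DATA[i, :])          # still allocates a copy of each row
    end
    return mat
end

function fill_matrix()
    mat = Matrix{Float64}(undef, 1000, 1000)
    for i in 1:1000
        mat[i, :] .= @view DATA[i, :]   # copies in place, no temporary vector
    end
    return mat
end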
I think you mean Array{Vector{Float64}}(undef, 1000)
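For a like-for-like comparison, here is a sketch that preallocates the outer vector while keeping the same vector-of-vectors structure (the function name is just illustrative):

function preallocation_vov()
    mat = Vector{Vector{Float64}}(undef, 1000)  # outer vector preallocated
    for i in 1:1000
        mat[i] = rand(1000)                     # inner vectors still allocated
    end
    return mat
end

Alternatively, keeping mat = [] and calling sizehint!(mat, 1000) before the loop avoids the resizing reallocations in push! without changing the structure.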
function no_preallocation()
    mat = []                    # allocates Vector{Any}
    for i in 1:1000             # 1000 times
        # push! reallocates mat for some resizes
        # rand(1000) allocates 1000-element Vector{Float64}
        push!(mat, rand(1000))
    end
end
# at least 1001 allocations, your @benchmark shows 1006
function preallocation()
    mat = Array{Float64}(undef, 1000, 1000)  # self-explanatory
    for i in 1:1000                          # 1000 times
        mat[i, :] = rand(1000)               # allocates 1000-element Vector{Float64}
    end
end
# at least 1001 allocations, your @benchmark shows 1002
So you didn’t significantly save allocations overall; you only trimmed the few resizing allocations inside push!. As others have said, your data structures and element types also differ, so even putting the similar allocation counts aside, it is not a fair performance comparison either.
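For what it’s worth, if the goal is to actually eliminate the per-iteration allocations, the random values can be written into the preallocated matrix in place with Random.rand!. A minimal sketch, assuming a 1000×1000 Float64 matrix is what’s wanted:

using Random

function preallocation_inplace()
    mat = Matrix{Float64}(undef, 1000, 1000)
    for j in 1:1000
        rand!(@view mat[:, j])  # fills a column in place, no temporary vector
    end
    return mat
end

Filling columns rather than rows also matches Julia’s column-major memory layout, so it’s more cache-friendly.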
It’s also worth mentioning that rand(1000) went from 1 allocation in v1.10.5 to 3 allocations in v1.11.0, albeit with less memory overall, due to the changes to Array’s implementation.
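If you want to check the count on your own version, Julia 1.9 and later have a built-in macro for it:

julia> Base.@allocations rand(1000)  # reports the number of allocations for the call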
I see; I think the difference in data structures was causing the issue. Once I corrected for this, the preallocation benchmark beat the alternative. I thought they were the same structure. Thanks a lot!