I observe a roughly 5x time difference when writing over a vector preallocated with two different strategies:

- `x = zeros(Int64, n)`
- `x = Int64[]; resize!(x, n)`
The benchmark code is below:
```julia
using BenchmarkTools

function test_access!(x, n)
    @inbounds for i = 1:n
        x[i] = i
    end
end

function init(n)
    x = Int64[]
    resize!(x, n)
end

n = 10_000_000;
```

```
julia> @btime test_access!(x, $n) setup=(x = zeros(Int64, $n))
  5.025 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2, $n) setup=(x2 = init($n))
  26.381 ms (0 allocations: 0 bytes)
```
I was expecting these two versions to have similar performance. Any idea why there is such a big difference?
`zeros(Int64, n)` initializes the Vector with zeros, while `Int64[]; resize!(x, n)` only asks for the memory (if you print the vector before overwriting it, you will see garbage). For the setup alone, the `resize!` solution is probably faster.
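A minimal sketch of that difference (the garbage values after `resize!` are nondeterministic, so only the `zeros` case can be checked element by element):

```julia
x = zeros(Int64, 4)      # eagerly initialized
@assert all(==(0), x)    # guaranteed: every element is zero

y = Int64[]
resize!(y, 4)            # memory is reserved, but the contents are undefined
@assert length(y) == 4   # the length is set, yet y may hold arbitrary values
```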
The difference in access performance you are seeing probably comes from `zeros` having already materialized the Vector and brought it into the L1 cache. Accessing positions in the Vector expanded by `resize!`, on the other hand, triggers page faults (the OS lazily waits until you access a page before actually allocating the memory), followed by the cache misses of slowly bringing the Vector into the L1 cache.
```julia
using BenchmarkTools

function test_access!(x, n)
    @inbounds for i = 1:n
        x[i] = i
    end
end

function init1(n)
    x = Int64[]
    resize!(x, n)
    test_access!(x, n)
    return x
end

function init2(n)
    x = zeros(Int64, n)
    test_access!(x, n)
    return x
end

function init3(n)
    x = Vector{Int64}(undef, n)
    test_access!(x, n)
    return x
end

const n = 10_000_000;

@btime init1($n)
@btime init2($n)
@btime init3($n)
```
shows the expected:

```
  13.780 ms (2 allocations: 76.29 MiB)
  18.666 ms (2 allocations: 76.29 MiB)
  15.528 ms (2 allocations: 76.29 MiB)
```
I agree with your explanation. One strategy is lazy allocation, the other is eager. In the lazy case, the pages are not allocated by the OS until needed, hence the penalty when visiting them for the first time.
I believe `Vector{Int}(undef, n)` can only be the same speed or faster. However, looking at the number of allocations, I am not sure whether `x = Int[]; resize!(x, n)` ends up being equivalent; perhaps `x = Int[]` does not allocate heap space for the elements until you start putting things into it or call `resize!` (i.e., it relies on the amortized O(1) behaviour of `push!`).
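One way to probe this (a sketch; note that `Base.summarysize` measures the reachable bytes of the object, not the allocator's behaviour directly):

```julia
x = Int[]
small = Base.summarysize(x)   # just the array header plus a tiny buffer
resize!(x, 10_000)
big = Base.summarysize(x)     # now at least 10_000 * 8 bytes of element data

@assert big >= 10_000 * sizeof(Int)
@assert small < big
```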
If I remember correctly, the minimum size that is always allocated is 32 elements. `resize!` has to make an additional call into the C runtime after allocating the array, while the `undef` version makes only a single call, so you'll always have at least that as a difference.
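A minimal sketch to compare the two construction paths head to head (absolute timings will vary by machine; `@btime` returns the result of the benchmarked expression, so the lengths can be checked):

```julia
using BenchmarkTools

n = 10_000
a = @btime Vector{Int64}(undef, $n)       # single call into the runtime
b = @btime (x = Int64[]; resize!(x, $n))  # allocation plus an extra resize! call

@assert length(a) == n && length(b) == n
```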