Performance of resize! vs pre-allocating with zeros(...)

fcdimitr · April 29, 2022, 3:00pm

I observe a factor of 5x time difference in writing over a vector with two different preallocation strategies.

Using x = zeros(Int64, n)
Using x = Int64[]; resize!(x,n)

The benchmark code is bellow

using BenchmarkTools

function test_access!(x,n)
  @inbounds for i = 1 : n
    x[i] = i
  end
end

function init(n)
  x = Int64[]
  resize!( x , n )
end

n = 10_000_000;

@btime test_access!(x,$n) setup=(x = zeros(Int64,$n))
  5.025 ms (0 allocations: 0 bytes)

@btime test_access!(x2,$n) setup=(x2 = init($n))
  26.381 ms (0 allocations: 0 bytes)

I was expecting these codes to have similar performance. Any ideas on why there is such a big difference?

Oscar_Smith · April 29, 2022, 3:05pm

I can confirm this. It is very odd.

Henrique_Becker · April 29, 2022, 4:46pm

zeros(Int64, n) does initialize the Vector with zeros, Int64[]; resize!(x,n) only ask for the memory (if you print it before overwriting you will see garbage). Probably just for the setup the resize! solution is faster.

The difference in access performance you are seeing probably comes from zeros having already materialized and brought the Vector to the L1 cache, while accessing the positions in the Vector expanded by resize will cause a page fault (i.e., the OS was lazily waiting for you to access any position to only then really allocate the memory), and then some cache misses of bringing the Vector slowly into the L1 cache.

goerch · April 29, 2022, 5:29pm

Looks like @Henrique_Becker is right, the MWE

using BenchmarkTools

function test_access!(x,n)
  @inbounds for i = 1 : n
    x[i] = i
  end
end

function init1(n)
  x = Int64[]
  resize!( x , n )
  test_access!( x, n )
  return x
end

function init2(n)
    x = zeros(Int64, n)
    test_access!( x, n )
    return x
end

function init3(n)
    x = Vector{Int64}(undef, n)
    test_access!( x, n )
    return x
end

const n = 10_000_000;

@btime init1($n)
@btime init2($n)
@btime init3($n)

shows the expected

  13.780 ms (2 allocations: 76.29 MiB)
  18.666 ms (2 allocations: 76.29 MiB)
  15.528 ms (2 allocations: 76.29 MiB)

fcdimitr · April 29, 2022, 5:47pm

I agree with your explanation. One strategy is lazy allocation, the other is eager. In the lazy case, the pages are not allocated by the OS until needed, hence the penalty when visiting them for the first time.

Thanks!

Sukera · April 29, 2022, 6:04pm

The fastest should be Vector{Int}(undef, n), because it doesn’t have to allocate anew and doesn’t have to call into the runtime again.

Sukera · April 29, 2022, 6:05pm

Please refer to the following thread for an in-depth discussion about the (dis-)advantages of using calloc, as well as links to further discussions.

Henrique_Becker · April 29, 2022, 6:50pm

I believe Vector{Int}(undef, n) can only be the same or faster. However, if we look at the number of allocations I am not sure if x = Int[]; resize!(x, n) does not end up being the same; because, maybe, x = Int[] does not allocate a memory space in the heap until you start putting things inside it or call resize! (i.e., it strongly relies on the asymptotic O(1) behaviour of push!).

mkitti · April 29, 2022, 7:52pm

I started the registration process for GitHub - mkitti/ArrayAllocators.jl: Allocate arrays with malloc, calloc, or on NUMA nodes yesterday. If you wanted to explicitly initialize the array, you can use fill!. This uses the C call memset under the hood.

using ArrayAllocators

julia> @btime test_access!(x,$n) setup=(x = zeros(Int64,$n))
  8.457 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2,$n) setup=(x2 = init($n))
  17.474 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2,$n) setup=(x2 = init($n); fill!(x2,0))
  8.026 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2,$n) setup=(x2 = Array{Int64}(undef, $n))
  19.933 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2,$n) setup=(x2 = Array{Int64}(undef, $n); fill!(x2, 0))
  8.492 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2,$n) setup=(x2 = Array{Int64}(calloc, $n))
  16.426 ms (0 allocations: 0 bytes)

julia> @btime test_access!(x2,$n) setup=(x2 = Array{Int64}(calloc, $n); fill!(x2, 0))
  8.472 ms (0 allocations: 0 bytes)

Sukera · April 29, 2022, 8:28pm

If I remember correctly, the minimum size that is always allocated is 32. The resize! has to do an additional call into the C runtime, after allocating the array. The undef version only does a single one, so you’ll always have at least that as a difference.

Topic		Replies	Views
Why does a vector with 10 times more elements takes 2x-5x less time to pre-allocate? Performance question	14	370	October 30, 2024
Fastest way of getting a long zero vector? General Usage	29	854	March 15, 2024
Improving my mental model: Why is `sizehint!()` so much slower than `zeros()`? Performance	9	1554	March 31, 2022
Faster zeros with calloc Performance array	26	3900	March 23, 2022
@benchmark resize! : no memory allocation? General Usage benchmarktools	5	525	March 14, 2021

Performance of resize! vs pre-allocating with zeros(...)

Related topics