Why does a vector with 10 times more elements take 2x-5x less time to pre-allocate?

On 1.10, Vector{Int}(undef, 1000) is basically only allocating (calling jl_alloc_array_1d and thus malloc), and since it does not initialize the memory, the speed should be roughly independent of the size(?):

julia> @code_lowered Vector{Int}(undef, 1000)
CodeInfo(
1 ─ %1 = Core.cconvert(Core.Int, m)
│   %2 = Core.apply_type(Core.Array, $(Expr(:static_parameter, 1)), 1)
│   %3 = Core.unsafe_convert(Core.Int, %1)
│   %4 = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{T, 1}, svec(Any, Int64), 0, :(:ccall), :(%2), :(%3), :(%1)))
└──      return %4
)

I believe the benchmarking is inaccurate. Allocating isn’t directly responsible for the GC activity (i.e. if you’re not running out of memory, GC shouldn’t be triggered; it happens here because the benchmark allocates in a loop). The GC time is in fact shown as zero for the minimum, but you sometimes get it for the mean (and sometimes for the median, and sometimes no GC activity at all), so I think it reflects the cost of releasing the memory, i.e. GC activity.

The minimum stays almost the same when allocating 10x as much (276.351 ns for me), while some of the other statistics go up, and they do go up again with another 10x, i.e. when allocating 100x as much.
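As a minimal BenchmarkTools sketch of what I mean (hypothetical session; the exact numbers will of course differ per machine):

using BenchmarkTools, Statistics

b = @benchmark Vector{Int}(undef, 1_000)
minimum(b)   # GC time is typically 0% here
mean(b)      # but often nonzero here, from freeing everything the sample loop allocated

# The minimum time should grow much more slowly than the size,
# since only the raw allocation (no fill) is measured:
for n in (1_000, 10_000, 100_000)
    t = @belapsed Vector{Int}(undef, $n)
    println(n, " => ", round(t * 1e9, digits=1), " ns (minimum)")
end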

However, on 1.11 I see larger assembly with @code_native and different/larger lowered code with:

@code_lowered Vector{Int}(undef, 1000)
CodeInfo(
1 ─ %1 = Core.fieldtype
│   %2 = Core.fieldtype(self, :ref)
│   %3 = (%1)(%2, :mem)
│   %4 = Core.undef
│        mem = (%3)(%4, m)
│   %6 = mem
│   %7 = Core.memoryref(%6)
│   %8 = Core.tuple(m)
│   %9 = %new(self, %7, %8)
└──      return %9
)
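That matches the new Memory-backed Array design in 1.11: the Vector is now a small wrapper (a memory reference plus a size tuple) around a Memory buffer, as the lowered code above shows. A quick sketch to compare the two directly (assuming Julia ≥ 1.11):

using BenchmarkTools

# The underlying buffer can be allocated on its own:
@btime Memory{Int}(undef, 1_000);
# ...and the Vector constructor adds the wrapper on top of it:
@btime Vector{Int}(undef, 1_000);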

I think/thought that malloc in general does not initialize memory.

I still think that on Windows it wouldn’t initialize either, but one caveat is that fresh allocations have to come from the kernel first (pages that previously belonged to other processes), and on any OS the kernel must then zero them for security reasons.
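A small sketch of the consequence (the contents below are unpredictable, so this is only illustrative):

# The contents of an `undef` vector are whatever happened to be in the
# underlying memory: freshly mapped pages arrive zeroed from the kernel,
# while reused memory can contain arbitrary garbage.
v = Vector{Int}(undef, 4)
@show v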

I would thus trust the minimum numbers when only allocating, i.e. when using undef (this does not apply to e.g. zeros, which does fill the memory and is then of course linear in the size).
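For example (again just a sketch; numbers will vary):

using BenchmarkTools

# `undef` only allocates; `zeros` also fills, so it scales with the length:
@btime Vector{Int}(undef, 100_000);
@btime zeros(Int, 100_000);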
