Define empty array with sizehint in single step

Is there a stdlib function to generate an empty array of a given type with a specific sizehint at construction time?

Sometimes, push! based filling of arrays is just easier to write and read, even if you know exactly how many elements its going to have. It feels like this should be possible with two allocations, the best I could come up with is

using Chairmarks
function foo(T, N)
    a = Vector{T}(undef, N)
    empty!(a)
end
function bar(T, N)
    a = T[]
    sizehint!(a, N)
end
@b foo(Float64, 1000)
# 101.643 ns (3 allocs: 7.875 KiB)
@b bar(Float64, 1000)
# 109.647 ns (3 allocs: 7.875 KiB)

Well maybe my examples are not good, because those functions lead to 3 allocations even without emptying. The question about the oneliner still stands though.

The number of allocations can be a bit misleading, since the process is a bit more complex than immediately obvious (and I don’t know how to explain it either). Try with a smaller number of elements, and you will see fewer allocations:

julia> @btime bar(Float64, 100);
  34.646 ns (2 allocations: 928 bytes)

julia> @btime foo(Float64, 100);
  23.766 ns (2 allocations: 928 bytes)

julia> @btime Vector{Float64}(undef, 100);
  23.263 ns (2 allocations: 928 bytes)

As you can see, foo matches the speed of a simple vector allocation, so is probably your best bet. It doesn’t seem very ‘elegant’, though, it would be nice to have an idiomatic one-step procedure, but you can get what you want.

As for the number of allocations, observe:

julia> @btime Vector{Float64}(undef, 0);
  8.400 ns (1 allocation: 32 bytes)

julia> @btime Vector{Float64}(undef, 1);
  18.474 ns (2 allocations: 64 bytes)

julia> @btime Vector{Float64}(undef, 251);
  44.262 ns (2 allocations: 2.02 KiB)

julia> @btime Vector{Float64}(undef, 252);
  105.436 ns (3 allocations: 2.06 KiB)
1 Like

Ah, yes, thanks; the allocations depend on the size of the array.
It is kind of interesting how bar works at all with so few allocations. Naively, I’d expect that size hint! needs to allocate a larger portion of memory than what was available before.
Maybe some day we’ll get something like Vector{Float64}(undef, 0; sizehint=10).

I use that one.

1 Like

Note that both empty! and sizehint! return the collection they alter, so your one-liner could be

a = empty!(Vector{T}(undef, N))
2 Likes

I do feel a bit uneasy about empty!(Vector{T}(undef, N)), since the docstring simply says

Remove all elements from a collection.

It makes me think it might be GCed, or that the capacity is somehow not guaranteed. sizehint! seems a bit more deliberate.

I like the term ‘capacity’ for this, which is what C++ uses, and which can be directly accessed. I wonder why the vague term sizehint! was used, instead of capacity!, and why I can’t read it back from the array. Maybe something about leaving the final decision to the compiler?

Without direct memory management, sizehint! is more reasonable than capacity. There are performance relevant concerns like memory alignment etc.
The ‘best’ allocated capacity is not necessarily the requested one, but may be a little bit bigger.

Julia 1.11 introduced a lower level abstraction of memory than array: Julia 1.11 Highlights
As I read it, a more elaborate array struct was preferred first, but after years of usage, it became clear than users can and want do a bit of memory management themselfs (if needed).

1 Like