Define empty array with sizehint in single step

hexaeder · November 26, 2024, 9:06am

Is there a stdlib function to generate an empty array of a given type with a specific sizehint at construction time?

Sometimes, push! based filling of arrays is just easier to write and read, even if you know exactly how many elements its going to have. It feels like this should be possible with two allocations, the best I could come up with is

using Chairmarks
function foo(T, N)
    a = Vector{T}(undef, N)
    empty!(a)
end
function bar(T, N)
    a = T[]
    sizehint!(a, N)
end
@b foo(Float64, 1000)
# 101.643 ns (3 allocs: 7.875 KiB)
@b bar(Float64, 1000)
# 109.647 ns (3 allocs: 7.875 KiB)

Well maybe my examples are not good, because those functions lead to 3 allocations even without emptying. The question about the oneliner still stands though.

DNF · November 26, 2024, 9:19am

The number of allocations can be a bit misleading, since the process is a bit more complex than immediately obvious (and I don’t know how to explain it either). Try with a smaller number of elements, and you will see fewer allocations:

julia> @btime bar(Float64, 100);
  34.646 ns (2 allocations: 928 bytes)

julia> @btime foo(Float64, 100);
  23.766 ns (2 allocations: 928 bytes)

julia> @btime Vector{Float64}(undef, 100);
  23.263 ns (2 allocations: 928 bytes)

As you can see, foo matches the speed of a simple vector allocation, so is probably your best bet. It doesn’t seem very ‘elegant’, though, it would be nice to have an idiomatic one-step procedure, but you can get what you want.

As for the number of allocations, observe:

julia> @btime Vector{Float64}(undef, 0);
  8.400 ns (1 allocation: 32 bytes)

julia> @btime Vector{Float64}(undef, 1);
  18.474 ns (2 allocations: 64 bytes)

julia> @btime Vector{Float64}(undef, 251);
  44.262 ns (2 allocations: 2.02 KiB)

julia> @btime Vector{Float64}(undef, 252);
  105.436 ns (3 allocations: 2.06 KiB)

hexaeder · November 26, 2024, 10:29am

Ah, yes, thanks; the allocations depend on the size of the array.
It is kind of interesting how bar works at all with so few allocations. Naively, I’d expect that size hint! needs to allocate a larger portion of memory than what was available before.
Maybe some day we’ll get something like Vector{Float64}(undef, 0; sizehint=10).

nsajko · November 26, 2024, 10:36am

I use that one.

barucden · November 26, 2024, 10:47am

Note that both empty! and sizehint! return the collection they alter, so your one-liner could be

a = empty!(Vector{T}(undef, N))

DNF · November 26, 2024, 11:53am

I do feel a bit uneasy about empty!(Vector{T}(undef, N)), since the docstring simply says

Remove all elements from a collection.

It makes me think it might be GCed, or that the capacity is somehow not guaranteed. sizehint! seems a bit more deliberate.

I like the term ‘capacity’ for this, which is what C++ uses, and which can be directly accessed. I wonder why the vague term sizehint! was used, instead of capacity!, and why I can’t read it back from the array. Maybe something about leaving the final decision to the compiler?

HSt · November 26, 2024, 3:12pm

Without direct memory management, sizehint! is more reasonable than capacity. There are performance relevant concerns like memory alignment etc.
The ‘best’ allocated capacity is not necessarily the requested one, but may be a little bit bigger.

Julia 1.11 introduced a lower level abstraction of memory than array: Julia 1.11 Highlights
As I read it, a more elaborate array struct was preferred first, but after years of usage, it became clear than users can and want do a bit of memory management themselfs (if needed).

Topic		Replies	Views
Resize! and empty! immediately, why? General Usage	7	697	January 4, 2023
Documenting performance model of `empty!`, `sizehint!`, `push!` & friends Internals & Design proposal	14	3716	May 2, 2019
Improving my mental model: Why is `sizehint!()` so much slower than `zeros()`? Performance	9	1549	March 31, 2022
I want to write a better performance array, don't know if there is a ready-made one? Performance	3	491	January 14, 2021
Reusing buffer arrays by providing them as arguments to functions Performance	5	1680	March 17, 2019

Define empty array with sizehint in single step

Related topics