Suggestion for enhanced `sizehint!`

sgaure · August 11, 2024, 11:51am

I have a couple of times had the following problem. An algorithm which uses a vector v and does some push! and some pushfirst!. I know in advance the maximum number of pushfirst!, and the maximum number of push! that might possibly be needed. Moreover, this is a critical part of the code, and I don’t want reallocations. I can of course make a large vector, and manipulate two indices into it to handle this more manually.

In 1.11 there is a first argument to sizehint! which makes sure there is some room before the start of the vector to accomodate pushfirst! without reallocation. However, I need both push! and pushfirst!. In 1.11 with Memory-backed vectors, it’s quite easy to manipulate v.ref to move the start of the vector to a specific position in the backing Memory. This is what’s done with first=true. However this is undocumented.

It isn’t very hard to support arbitrary room before and after the vector inside sizehint!, but before I make a PR, does anyone have opinions on the API? My idea is to use the existing first argument, but with an integer where to start. I.e. something like sizehint!(v, 50, first=20). This is a bit unfortunate since true is and Integer equal to 1, but first=1 and first=true would be different.

Alternatively, one could implement a new function for direct placement of the start of the vector inside the Memory.

abraemer · August 11, 2024, 4:14pm

Sounds like a good idea.
I propose as signature

sizehint!(v, total, spaceatfront=0)

Where total gives the total number of elements that should fit inside v (current behavior) and spaceatfront (or perhaps some shorter name) gives how many elements should fit in front of the current elements within v.
IIRC, then the current first=true places all new space in front? That would be equal to sizehint!(v, total, total-length(v)) and is perhaps a bit wordy but I don’t mind being explicit with this I think.

Thought process:

There are 2 bits of information necessary: How much space there should be in front and how much space there should be in the back
The current design however only considers the total amount of space
So imo the most natural extension of the current (documented) behavior is the one I proposed.

nsajko · August 11, 2024, 4:46pm

The sizehint! function isn’t specific to vectors, though. Other collections, e.g., Dict, also support it. IMO it would be messy to have many keyword argument that only make sense for some classes of collections, but not for others.

Furthermore, I think it should be possible to implement the desired feature without expanding Julia’s interfaces. Something like this:

julia> v = Vector{Float64}(undef, 4)  # allocate
4-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0

julia> v = empty!(v)
Float64[]

julia> v = sizehint!(v, 1, first = true, shrink = false)  # now one element should be reserved in the front
Float64[]

sgaure · August 11, 2024, 8:16pm

I hadn’t thought of that. A good point. On the other hand, the docstring of sizehint! talks about “space before the start of the collection”, not limiting it to vectors. And it already has different keyword arguments for different collections. It’s not a great leap to allow a first::Union{Bool,Integer} to specify how many before the start of the collection.

While it could be possible to somehow shoehorn it in with the existing syntax, I fear this will make the keywords’ meanings dependent on the values of the other arguments, it could easily be confusing.

sgaure · August 12, 2024, 7:23am

I like this 3 argument version. With a default value for the third argument it defaults to the current variant. However, I think it’s too late to change the 1.11 sizehint! which has got a first keyword argument (it’s already at rc2). So the first keyword argument must be present. And it will be confusing together with the spaceatfront.

However, if we drop the default value for spaceatfront, we can drop the first keyword for this method. That is, for a Vector there are two sizehint! methods, one with two arguments, and one with three arguments:

sizehint!(v::Vector, sz; first, shrink) = ...
sizehint!(v::Vector, sz, spaceatfront; shrink) = ...

tecosaur · August 12, 2024, 9:09am

Overloading first to be a Union{Int, Bool} seems reasonable to me, I feel like the intuitive guess for what’s implied matches the semantics Simen talks about fairly well: it’s either all/none in front, or n in front.

Topic		Replies	Views
Do we have a "sizeforce!"? Performance	2	243	February 12, 2023
Improving my mental model: Why is `sizehint!()` so much slower than `zeros()`? Performance	9	1527	March 31, 2022
`sizehint!` style approach for matrices General Usage performance	2	312	November 8, 2022
Documenting performance model of `empty!`, `sizehint!`, `push!` & friends Internals & Design proposal	14	3701	May 2, 2019
Resize! and empty! immediately, why? General Usage	7	686	January 4, 2023

Suggestion for enhanced `sizehint!`

Related topics