Sizehint!, explanation of the naming?

I am new to Julia.

I am wondering why the function “sizehint!” is named so.

Many functions are named in a self-evident way. But I cannot make sense of it for “sizehint!”.

Could anyone give a hint?

2 Likes

Let me introduce you to the excellent help functionality of the REPL (that can be accessed typing a ? at the beginning of a line):

help?> sizehint!
search: sizehint!

  sizehint!(s, n) -> s


  Suggest that collection s reserve capacity for at least n elements. That is,
  if you expect that you're going to have to push a lot of values onto s, you
  can avoid the cost of incremental reallocation by doing it once up front;
  this can improve performance.

  See also resize!.

  Notes on the performance model
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  For types that support sizehint!,

    1. push! and append! methods generally may (but are not required to)
       preallocate extra storage. For types implemented in Base, they
       typically do, using a heuristic optimized for a general use case.

    2. sizehint! may control this preallocation. Again, it typically does
       this for types in Base.

    3. empty! is nearly costless (and O(1)) for types that support this
       kind of preallocation.

I think this explanation reads rather well and is clear to me. If there are things left unclear for please ask :slight_smile:
To make things more explicit maybe: sizehint! provides a hint to the container about the maximal size needed. This hint may be ignored but can yield performance improvements if the container type can make use of it.

11 Likes

Note also that the convention in Julia (inherited from Scheme and Lisp) is that you put an exclamation mark ! at the end of the name of a function if it modifies (“mutates”) its arguments. See also the Punctuation section of the Julia manual.

Here, sizehint! ends with an exclamation mark because it modifies its array argument (to change the amount of storage allocated).

6 Likes

This is a bit of an edge-case. You do that usually if you’re actually mutating the data in some way, but you are not with sizehint! in any user-visible way (except with threads?)? For the underlying storage, its size, not the observed length, there is mutation going on though.

What it does is call (I see no proof the “hint” can be ignored):

function sizehint!(a::Vector, sz::Integer)  # Note, it/Vector defined for dense 1-D "array" but NOT for a matrix or n-D)

and thus indirectly ccall(:jl_array_sizehint and I wanted to confirm what the C code does but I can’t find it! I’m pretty sure though that it will make sure the underlying storage memory is large enough for the n (actually named sz there in the code) of what you want. Most likely it will reallocate and copy your data round (unless you’re lucky and it’s not needed). Since all threads see the old data then it’s likely a possible issue, I assume the data is copied “left-to-right”, and you may have a problem if someone is mutation at the same time, at least in it’s done in the same direction. But you would have the same problem without sizehint!, i.e. with push!, it’s just a know problem with using threads.

I.e. Julia has no choice in the way to interpret the hint, it would not give it much “thought”, just enlarge to that size. [In one go rather than incrementally over time when push!ing to the array; though that would neither actually enlarge e.g. 1-by-1 (since Julia is clever).]

[push! actually mutates, because it not only enlarges the Vector, but also actually puts new values in; I see it does actually _growend!(a, 1) always, seemingly by 1 then, but immediately after, @_safeindex a[length(a)] = itemT I asume then placing 1 or more values at the end, so is it not using the fastest way when pushing more than 1 entry at a time?]

About the text “sizehint! may control this preallocation” I think that means for sure for Julia-defined Vector, but not any other <: AbstractVector. I imagine there’s a no-op still for that, because people can define many strange vector/array types, e.g. non-contentious.

I tried and you can NOT use it on regular dense Matrix/Array (so don’t try, thaugh plausibly it could be defined for them too, e.g. N x M matrix, maybe wouldn’t be too helpful?), but there’s actually one loophole:

julia> B = sparse(A)
1×2 SparseMatrixCSC{Int64, Int64} with 2 stored entries:
 1  2

julia> @edit sizehint!(B, 2)  # now I can do IT!

julia> @edit push!(B, 1)  # while not, nor for dense, since it doesn't make sense:
ERROR: no unique matching method found for the specified argument types

function Base.sizehint!(S::SparseMatrixCSC, n::Integer)
    nhint = min(n, widelength(S))
    sizehint!(getrowval(S), nhint)
    sizehint!(nonzeros(S),  nhint)
    return S
end

While a sparse array is implemented behind the scenes as two vectors, it’s done. That means while the array looks to you mostly empty, those vectors are not (and dense). This also means that I guess all user-defined sparse matrix types should define their own sizehint!? Or possibly rely on a no-op. But such dense arrays need not.

For generic code I would want to be able to use sizehint! not knowing are caring if the array is sparse or dense. The latter intentionally errors, while maybe should have been defined?

julia> sizehint!(A, 2)
ERROR: MethodError: no method matching sizehint!(::Matrix{Float64}, ::Int64)

You might have used a 2d row vector where a 1d column vector was required.
Note the difference between 1d column vector [1,2,3] and 2d row vector [1 2 3].
You can convert to a column vector with the vec() function.