Gc and resize!

I have a stream of values, from which I collect blocks of values depending on some criterion which is calculated online, so I don’t know the size of a block until the last element. Then I process each block.

Since I know the type of values, I would like to pre-allocate a buffer while collecting to minimize allocations. Apparently resize!(buffer, 0) and push!(buffer, elt) result in very few allocations for the MWE below, the question is

  1. whether this is the right idiom, or if there is a better one,
  2. can I rely on the memory for buffer not being gcd between two blocks?

MWE (heavily simplified, actual problem is more complex):

mutable struct Source
    i::Int
end

Source() = Source(0)

get_element(s::Source) = s.i += 1

Base.eltype(::Type{Source}) = Int

function get_block!(buffer, source)
    resize!(buffer, 0)
    for _ in 1:rand(5:10)
        push!(buffer, get_element(source))
    end
end

process_block(buffer) = sum(abs, buffer)

function process_blocks(source, n)
    buffer = Vector{eltype(source)}()
    sum((get_block!(buffer, source); process_block(buffer)) for _ in 1:n)
end

then

julia> @time process_blocks(Source(), 100000);
  0.014218 seconds (12 allocations: 608 bytes)

I don’t see any memory being freed in the source called for resize!, so this seems like a good way to do it if you don’t mind relying on undocumented behavior. (If you depend on this, perhaps add proper documentation and a unit test that ensures that memory is not re-allocated, so that you’ll be notified if this behavior ever changes.)

Alternatively, at least in the example here, it seems like it’d be fairly easy to keep track of the buffer size yourself, and overwrite existing entries if there’s room, and grow it otherwise. You could create a simple wrapper type around a Vector to support this.

I am not sure how one could test for this without exposing internals.

Also, I am totally OK with the GC reallocating buffer occasionally, especially if it grew to big for a single occasion, as long as it does not happen 99% of the time. Which seems to be the case.

resize!(buffer, 0) does not free the buffer. If you want to free it, do resize!(buffer, 0); sizehint!(buffer, 0).

So your construction is exactly right.

If your buffer contains bitstypes and the collecting turns out to be a bottleneck, then consider not using push!. This is because push! is currently not inlining the fast path (enough capacity in the buffer underlying the Vector), and hence has an overhead of a handful cycles (call into the runtime). In that case, you can use e.g.

function get_block2!(buffer, source)
    cnt = 0
    if length(buffer)<10
        resize!(buffer, 10)
    end
    for elem in source
        cnt += 1
        if length(buffer)< cnt
             resize!(buffer, 2*length(buffer))
        end
        @inbounds buffer[cnt] = elem
    end
    return cnt
end

assuming that your source obeys the iteration protocol and does not know how many elements it will collect. If your don’t use bitstypes, then you could use

function get_block3!(buffer, source, capa)
    cnt = 0
    resize!(buffer, max(capa, 10))
    for elem in source
        cnt += 1
        if length(buffer)< cnt
             resize!(buffer, 2*capa)
             capa = 2*capa
        end
        @inbounds buffer[cnt] = elem
    end
    resize!(buffer, cnt)
    return capa
end

The initial resize is within capacity and only zeros the memory.

PS. Just for comparison:

julia> using BenchmarkTools
julia> buf=Float64[]; src=rand(10_000);
julia> function get_block!(buffer, source)
           resize!(buffer, 0)
           for elem in source
               push!(buffer, elem)
           end
           end
julia> @btime get_block!($buf, $src);
  69.774 μs (0 allocations: 0 bytes)

julia> @btime get_block2!($buf, $src);
  15.424 μs (0 allocations: 0 bytes)
1 Like

There is also the “PushVector” that does this but with a perhaps nicer interface:

https://github.com/JuliaLang/julia/issues/24909#issuecomment-419731925

1 Like

Neat. I wonder if it would make sense to factor out PushVector to a mini-package. I don’t know what the timeline is for fixing #24909.

Special handling of push! in codegen might be possible to basically emit the code the PushVector does (but of course using the capacity field in the jl_array struct.

1 Like

Ok, I really should try to make a PR to inline the push! fastpath; yuyichao patiently explained to me how to do it (surprisingly non-terrible, but I’m new to codegen). This problem crops up all the time.

That will, however, still be slower than get_block2!: cnt can be kept in register (which means that buffer will be in an indefinite state if you need to recover from an exception, because nobody remembered cnt) as opposed to updating a single int (PushVector) or two ints (forcibly inlined ccall(:jl_array_grow_end, ...) updates both length(buf) and size(buf,1) because, for some reason, both are stored in vectors). Also, it may require a little fiddling to be friendly for hoisting the flag-checks (somebody could unfairly use a reshape to make our buffer unresizable).

1 Like

Mock objects, or simply a specific test that resize! + push! doesn’t allocate, if that’s your concern.

It currently happens 0 % of the time (even for big buffers, if I’m reading the source correctly). My concern, from a long term maintainable code perspective, would rather be if the implementation ever changes in the future.

From the C source we can tell that this is the case, but the documentation doesn’t promise it. So can we rely on it? (If so, we should update the doc.)

This is precisely what I had in mind with “a simple wrapper”, thanks for saving me the typing :slight_smile: Although, in lieu of finish! I was thinking:

unsafe_wrap(Array, pointer(buf), length)

(or is this bad practice?)

yes :slight_smile:

May I ask why, if you control the buffer? I’ve seen it used in several places in the source for Julia itself.

In my code the PushVector itself allocates the buffer so we do not control it. Even if we control it the compiler can optimize things away and unless we have a GC.@preserve it is hard to know the lifetime of buffer. Anyway, it is bad practice to use unsafe_ if you don’t need it, doesn’t seem like it is needed here.

1 Like

That would be great.