I have a stream of values, from which I collect blocks of values depending on some criterion which is calculated online, so I don’t know the size of a block until the last element. Then I process each block.
Since I know the type of values, I would like to pre-allocate a buffer while collecting to minimize allocations. Apparently resize!(buffer, 0) and push!(buffer, elt) result in very few allocations for the MWE below. The questions are:
- whether this is the right idiom, or if there is a better one,
- can I rely on the memory for buffer not being gc'd between two blocks?
MWE (heavily simplified, actual problem is more complex):
mutable struct Source
    i::Int
end
Source() = Source(0)
get_element(s::Source) = s.i += 1
Base.eltype(::Type{Source}) = Int
function get_block!(buffer, source)
    resize!(buffer, 0)
    for _ in 1:rand(5:10)
        push!(buffer, get_element(source))
    end
end
process_block(buffer) = sum(abs, buffer)
function process_blocks(source, n)
    buffer = Vector{eltype(source)}()
    sum((get_block!(buffer, source); process_block(buffer)) for _ in 1:n)
end
then
julia> @time process_blocks(Source(), 100000);
0.014218 seconds (12 allocations: 608 bytes)
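One way to probe the second question empirically is sketched below. It assumes an upper bound on the block length is known (maxlen, hypothetical here) so that capacity can be reserved with sizehint!, and it uses empty! in place of resize!(buffer, 0). The names fill_block! and check_reuse are made up for this illustration, and the block lengths are made deterministic instead of random. Comparing pointer(buffer) across blocks only shows what the current implementation does; it is not a documented guarantee.

```julia
mutable struct Source
    i::Int
end
Source() = Source(0)
get_element(s::Source) = (s.i += 1)
Base.eltype(::Type{Source}) = Int

# Hypothetical variant of get_block! that takes the block length explicitly,
# so that the run is deterministic.
function fill_block!(buffer, source, len)
    empty!(buffer)  # set the length to 0, like resize!(buffer, 0)
    for _ in 1:len
        push!(buffer, get_element(source))
    end
    return buffer
end

# Process n blocks and record whether the buffer's data pointer ever changes.
function check_reuse(n; maxlen = 10)
    source = Source()
    # Assumption: no block is longer than maxlen, so reserve that much up front.
    buffer = sizehint!(Vector{eltype(source)}(), maxlen)
    p = pointer(buffer)
    stable = true
    total = 0
    for k in 1:n
        fill_block!(buffer, source, 5 + k % 6)  # lengths cycle through 5:10
        stable &= pointer(buffer) == p          # same underlying memory as before?
        total += sum(abs, buffer)
    end
    return stable, total
end

stable, total = check_reuse(1000)
```

If stable comes back true, the vector kept the same storage for every block in that run. The buffer itself cannot be collected while process_blocks holds a reference to it; the open point is only whether shrinking the length releases the storage, which this check observes rather than proves.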