I have the following function makebatch that creates batches for my model from a text file. The file is processed line by line, and sometimes processing a line makes the output exceed the desired batch size. In that case I want to reuse the excess the next time I call makebatch. The output of the function is a vector of integers of length BATCH_SIZE.
Currently I have a working version that returns the finished batch and the part to be reused for the next batch as a tuple. I see that I could also declare a global next_batch that is then used in the function call. However, I think this would come with a performance penalty(?), and it is also kind of ugly.
I saw "In Julia, how to create a function that saves its own internal state?", but I am not sure how to apply it here because the state I need to save is somewhat more complex (like the output, it is a vector of integers, but shorter than BATCH_SIZE).
Very grateful for any tips on how to handle this!
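function makebatch(IO::IOStream, VOCAB::Vocabulary, BATCH_SIZE::Int, next_batch::Vector{Int} = Vector{Int}())
    # Start from the leftover indices of the previous call (empty on the first call).
    batch = next_batch
    while length(batch) < BATCH_SIZE
        line = readline(IO)
        idcs = words2idcs(line, VOCAB)
        isnothing(idcs) && continue   # skip lines that yield no indices
        append!(batch, idcs)
    end
    # Copy the overflow into a fresh vector, keeping its original order.
    # (Moving it over with pop!/push! would reverse it, and leaving next_batch
    # untouched when the sizes match exactly would return the same vector twice.)
    next_batch = batch[BATCH_SIZE+1:end]
    resize!(batch, BATCH_SIZE)        # batch now holds exactly BATCH_SIZE entries
    return (batch, next_batch)
end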
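
For reference, here is a minimal sketch of how the closure pattern from the linked question might carry a vector as state instead of a single number. Everything in it is an assumption made for illustration: make_batcher and toy_words2idcs are hypothetical names, toy_words2idcs simply maps each word to its length as a stand-in for the real words2idcs and Vocabulary, an IOBuffer replaces the file, and the eof check (which lets the last batch come out short) is my own addition.

```julia
# Hypothetical stand-in for words2idcs: one integer per word (its length),
# or nothing for a blank line.
toy_words2idcs(line::AbstractString) =
    isempty(strip(line)) ? nothing : length.(split(line))

# Hypothetical factory: returns a zero-argument function that yields one batch
# per call. The leftover vector lives in the closure, so no global is needed.
function make_batcher(io::IO, batch_size::Int)
    leftover = Int[]                  # state carried between calls
    return function ()
        batch = copy(leftover)        # start from the previous overflow
        empty!(leftover)              # mutate in place; never rebind the captured variable
        while length(batch) < batch_size && !eof(io)
            idcs = toy_words2idcs(readline(io))
            isnothing(idcs) && continue
            append!(batch, idcs)
        end
        n = min(length(batch), batch_size)
        append!(leftover, batch[n+1:end])   # overflow, in original order
        resize!(batch, n)
        return batch
    end
end

io = IOBuffer("aa bbb c\n\ndddd ee\nf\n")
next_batch! = make_batcher(io, 2)
b1 = next_batch!()   # [2, 3]
b2 = next_batch!()   # [1, 4]
b3 = next_batch!()   # [2, 1]
```

The captured vector is emptied and refilled in place rather than reassigned, since reassigning a variable captured by a closure is a known source of slowdowns in Julia. A mutable struct holding the stream and the leftover vector would presumably work just as well, if a named type is preferred over a closure.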