Save complex state in function and reuse on next function call

I have the following function makebatch that creates batches for my model from a text file. The file is processed line by line, and sometimes processing a line causes the output to exceed the desired batch size. In that case I want to reuse the overflow the next time I call makebatch. The output of the function is a vector of integers of length BATCH_SIZE.

Currently I have a working version which returns the finished batch and the part to be reused for the next batch as a tuple. I see that I could also declare a global next_batch which would then be used at the next function call. However, I think this would come with a performance penalty(?), and it is also kind of ugly.

I saw In Julia, how to create a function that saves its own internal state? but I am not exactly sure how to apply it, because the state I save is somewhat more complex (again, a vector of integers like the output, just not as long as BATCH_SIZE).

Very grateful for any tips on how to handle this!

function makebatch(IO::IOStream, VOCAB::Vocabulary, BATCH_SIZE::Int, next_batch::Vector{Int} = Vector{Int}())
    batch = next_batch  # start from whatever was left over last time
    while length(batch) < BATCH_SIZE
        line = readline(IO)
        idcs = words2idcs(line, VOCAB)
        # skip lines that could not be converted; append! avoids splatting costs
        isnothing(idcs) ? continue : append!(batch, idcs)
    end

    # if the last line pushed us past BATCH_SIZE, move the overflow
    # into next_batch for the following call
    if length(batch) > BATCH_SIZE
        next_batch = Vector{Int}()
        while length(batch) > BATCH_SIZE
            push!(next_batch, pop!(batch))
        end
    end
    return (batch, next_batch)
end

Reading the post, I’m not sure what your goal is. Do you want faster code, or is your goal a more elegant implementation?
Your current design already looks quite straightforward.
(As you said, using a global here is probably worse.)

If you have more data to pass on to the next iteration, you could also consider using NamedTuple (and UnPack.jl).
Alternatively, maybe something like Python’s yield would be useful, e.g. Manual · Continuables.jl
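For the closure approach from the post you linked, a minimal sketch might look like this (make_batcher and its vector input are made up for illustration; your real version would read from the file instead):

```julia
# A closure created inside a function keeps `leftover` alive between calls.
function make_batcher(batch_size::Int)
    leftover = Int[]  # this is the state that survives across calls
    return function (items::Vector{Int})
        batch = vcat(leftover, items)
        if length(batch) > batch_size
            leftover = batch[batch_size+1:end]  # stash the overflow
            resize!(batch, batch_size)
        else
            leftover = Int[]
        end
        return batch
    end
end

batcher = make_batcher(4)
batcher([1, 2, 3, 4, 5, 6])  # → [1, 2, 3, 4], keeps [5, 6]
batcher([7, 8])              # → [5, 6, 7, 8]
```

One caveat for your performance goal: a captured variable that is reassigned (like leftover here) gets boxed by Julia’s closure lowering, which can itself cost performance, so explicitly passing the state around as you do now may well be faster.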

Sorry if I was a bit unclear. I want the former (faster code) and was hoping to get the latter (more elegant implementation) along the way.

I’m taking a look at both of your suggestions now and will make sure to report back :slight_smile:


Just a small thing. I think you can replace

    if length(batch) > BATCH_SIZE
        next_batch = Vector{Int}()
        while length(batch) > BATCH_SIZE
            push!(next_batch, pop!(batch))
        end
    end

with

next_batch = batch[BATCH_SIZE+1:end]
resize!(batch, BATCH_SIZE)

That’s at least shorter :slight_smile: maybe even faster.
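To make the behavior concrete, with a made-up batch and BATCH_SIZE = 3:

```julia
batch = [10, 20, 30, 40, 50]
BATCH_SIZE = 3

next_batch = batch[BATCH_SIZE+1:end]  # copies out the overflow: [40, 50]
resize!(batch, BATCH_SIZE)            # truncates in place:      [10, 20, 30]
```

This version also hands the leftover elements over in their original order, whereas the pop!/push! loop reverses them.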


Maybe as a follow-up for future reference:

I have stuck with the implementation that returns next_batch, stores it in a global, and passes it back in as a function argument on the next call. The performance seems good enough for my use case, and I don’t lose type stability, because the state stored in the global is handed back to the function as an argument rather than accessed as a global inside it.

(Btw, resize! was just as fast as the original approach for the small batch sizes that I use and tested with, but it is a little more concise, which I liked.)
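For future readers, here is a self-contained sketch of that call pattern. The makebatch here is a stub that consumes plain vectors of indices instead of a file and vocabulary, so everything except the threading of next_batch is hypothetical:

```julia
# Stub version of makebatch: takes a chunk of already-converted indices
# instead of reading from a file, but threads next_batch the same way.
function makebatch_stub(chunk::Vector{Int}, batch_size::Int, next_batch::Vector{Int})
    batch = vcat(next_batch, chunk)       # prepend last call's overflow
    next_batch = batch[batch_size+1:end]  # copy out the new overflow
    resize!(batch, min(length(batch), batch_size))
    return batch, next_batch
end

function collect_batches(chunks, batch_size)
    next_batch = Int[]  # the state lives in the caller, not in a global
    batches = Vector{Vector{Int}}()
    for chunk in chunks
        batch, next_batch = makebatch_stub(chunk, batch_size, next_batch)
        push!(batches, batch)
    end
    return batches
end

collect_batches([[1, 2, 3], [4, 5, 6, 7], [8]], 3)  # → [[1, 2, 3], [4, 5, 6], [7, 8]]
```

(Unlike the real makebatch, the stub does not keep reading until a batch is full, so the last batch may come out short; it only illustrates how the overflow is carried from call to call.)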