There have recently been many threads discussing buffered data loaders for machine learning workflows, e.g. https://discourse.julialang.org/t/pytorch-dataloader-equivalent-for-training-large-models-with-flux/.
I have written a few different solutions for this over the last 6 months, and have found that I prefer a rather low-level approach. I made the package LengthChannels.jl, which is a modest extension of
Base.Channel that endows the channel with a length for iteration. This allows the channel to be used effectively as a dataset by Flux etc.
The README contains an example of a buffered batch iterator for Flux CNNs.
The package requires Julia v1.3 and allows the channel to be populated on a separate thread. This way, data loading and any pre-processing can be done by the CPU while the GPU is training your model. The data cannot be put on the GPU by the channel, as one is not allowed to touch the GPU from a separate thread.
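To illustrate the idea, here is a minimal sketch using only Base.Channel and Threads.@spawn. It is not the implementation or API of LengthChannels.jl: the names SizedChannel, buffered_batches and make_batch are hypothetical, and make_batch stands in for whatever loading and pre-processing you do on the CPU.

```julia
# Hypothetical illustration; NOT the LengthChannels.jl implementation or API.
# A thin wrapper that pairs a Channel with a known number of items.
struct SizedChannel{T}
    ch::Channel{T}
    len::Int
end

Base.length(s::SizedChannel) = s.len
Base.eltype(::Type{SizedChannel{T}}) where {T} = T

# Iterate exactly `len` items, taking each one from the underlying channel.
function Base.iterate(s::SizedChannel, i = 1)
    i > s.len && return nothing
    return take!(s.ch), i + 1
end

# Fill a buffered channel from a separate task (requires Julia >= 1.3).
# `make_batch` is a placeholder for loading/pre-processing one batch.
function buffered_batches(make_batch, nbatches; buffer = 4)
    ch = Channel{Vector{Float32}}(buffer)
    task = Threads.@spawn begin
        for i in 1:nbatches
            put!(ch, make_batch(i))  # blocks while the buffer is full
        end
    end
    bind(ch, task)  # close the channel (and surface errors) when the task ends
    return SizedChannel(ch, nbatches)
end

# Consume the batches; a real training loop would move each batch to the GPU here.
batches = buffered_batches(i -> fill(Float32(i), 3), 5)
sums = [sum(b) for b in batches]
```

Note that this sketch can only be iterated once, since each item is removed from the channel as it is taken. Any transfer to the GPU would happen in the consuming loop, not in the task that fills the channel.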
I also made the package DiskDataProviders.jl, which is a bit more complicated and may be useful when reading the data from disk takes longer than a single training step. It was created with Flux in mind and is quite a lot less flexible than a