[ANN] LengthChannels - Buffered iterators for machine learning

baggepinnen · November 27, 2019, 9:28am

There have recently been many threads about buffered data loaders for machine-learning workflows, e.g. https://discourse.julialang.org/t/pytorch-dataloader-equivalent-for-training-large-models-with-flux/.
Over the last six months I have written a few different solutions to this, and have found that I prefer a rather low-level approach. I made the package LengthChannels.jl, a modest extension of Base.Channel that gives the channel a length for iteration. This allows the channel to effectively serve as a dataset for Flux etc.
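LengthChannels.jl is a Julia package, so the following is only a rough Python analogue of the idea and not its API. It is a bounded FIFO buffer filled by a background producer, plus a declared length so a consumer can call `len` and knows when one pass ends. The class name `LengthChannel`, the `produce` callable, and restarting the producer on every pass are assumptions of this sketch. It also assumes the producer yields at least `length` items; otherwise iteration blocks.

```python
import itertools
import queue
import threading


class LengthChannel:
    """Hypothetical sketch: a bounded buffer that also reports a length.

    `produce` is a zero-argument callable returning an iterable of items.
    It runs on a background thread each time the channel is iterated.
    """

    def __init__(self, length, buffersize, produce):
        self.length = length
        self.buffersize = buffersize
        self.produce = produce

    def __len__(self):
        # The declared length lets a consumer (e.g. a training loop)
        # know how many items one pass will yield.
        return self.length

    def __iter__(self):
        buf = queue.Queue(maxsize=self.buffersize)

        def worker():
            # Put exactly `length` items; put() blocks while the buffer is full.
            for item in itertools.islice(self.produce(), self.length):
                buf.put(item)

        threading.Thread(target=worker, daemon=True).start()
        for _ in range(self.length):
            yield buf.get()  # blocks until the producer has put an item
```

For example, `LengthChannel(5, 2, lambda: (i * i for i in range(10)))` reports a length of 5 and yields `[0, 1, 4, 9, 16]` on every pass, with at most two items buffered ahead of the consumer.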

The README contains an example of a buffered batch iterator for Flux CNNs.

The package requires Julia v1.3 and allows the channel to be populated on a separate thread. This way, data loading and any pre-processing can be done on the CPU while the GPU trains your model. The channel cannot put the data on the GPU, since GPU operations are not allowed from a separate thread.
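The CPU/GPU split described above is a producer/consumer pattern. Below is a minimal Python sketch, assuming a hypothetical `prefetch` helper and a stand-in `preprocess` function (neither is part of LengthChannels.jl). A background thread prepares batches into a bounded queue, while the consuming thread, where device transfer and the training step would go, pulls them out. The `maxsize` bound is what makes it a buffer: the producer runs at most `buffersize` batches ahead. Error propagation from the producer thread is omitted for brevity.

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the data


def prefetch(batches, buffersize, preprocess):
    """Hypothetical helper: apply `preprocess` to each raw batch on a
    background thread, keeping at most `buffersize` prepared batches queued."""
    buf = queue.Queue(maxsize=buffersize)

    def producer():
        for raw in batches:
            buf.put(preprocess(raw))  # blocks while the buffer is full
        buf.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()  # blocks until a prepared batch is available
        if item is _DONE:
            return
        yield item


def preprocess(raw):
    """Stand-in for disk I/O and augmentation done on the CPU."""
    time.sleep(0.01)
    return threading.get_ident(), [x * 2 for x in raw]


main_thread = threading.get_ident()
worker_threads = []
results = []
for worker, batch in prefetch([[1, 2], [3, 4], [5, 6]], 2, preprocess):
    # Device transfer and the training step would run here, on the
    # consuming thread rather than on the loader thread.
    worker_threads.append(worker)
    results.append(batch)
```

Keeping device work in the consuming loop mirrors the restriction mentioned above: the loader thread only ever touches CPU data.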

I also made the package DiskDataProviders.jl, which is a bit more complicated and may be useful when reading the data from disk takes longer than a single training step. It was created with Flux in mind and is considerably less flexible than a LengthChannel.

zgornel · November 28, 2019, 4:13pm

This looks awesome, cheers.

DoktorMike · November 28, 2019, 4:42pm

Nice work!
