Parallel data loading to GPU arrays

kolia · June 1, 2018, 3:49am

Does anyone have pointers on loading data into CuArrays with background tasks while the main thread or process is busy training a Flux model?

Flux seems to offer as much flexibility as PyTorch, if not more, so they occupy the same niche, and I would like to try Flux for my next project.

However, in my experience, for the kinds of loosely structured data that benefit most from such flexibility, loading and packaging the data for the model to consume quickly becomes the bottleneck. PyTorch provides multiprocess DataLoaders for this. How would I do the same in the Julia ecosystem?

ChrisRackauckas · June 1, 2018, 4:11am

Does @async not do it?
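
For illustration, one way that suggestion could look is a background task feeding a buffered Channel while the main task consumes batches. This is only a sketch under assumptions: `load_batch`, `prefetch`, and `consume` are hypothetical names, the loader is a stand-in for real disk reads and preprocessing, the GPU transfer is left as a comment, and the `Channel{T}(size) do ... end` form assumes Julia 1.3 or later.

```julia
# Hypothetical loader: stands in for reading and preprocessing one batch.
load_batch(i) = (sleep(0.01); fill(Float32(i), 4, 2))

# Start a background task that fills a buffered channel.
# The producer can run ahead of the consumer by up to `buffer` batches.
function prefetch(nbatches; buffer = 2)
    Channel{Matrix{Float32}}(buffer) do ch
        for i in 1:nbatches
            put!(ch, load_batch(i))   # blocks (yields) when the buffer is full
        end
    end                               # channel closes when the task finishes
end

# Consume batches as they become available.
function consume(nbatches)
    sums = Float32[]
    for batch in prefetch(nbatches)
        # In real training, move `batch` to the GPU here and take a
        # training step; this sketch only records a summary value.
        push!(sums, sum(batch))
    end
    return sums
end

sums = consume(3)
```

Tasks started this way share a thread with the consumer, so the overlap only helps while the loader is waiting on I/O. For CPU-heavy preprocessing, passing `spawn = true` to the `Channel` constructor runs the producer on another thread when Julia is started with more than one.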

stratospark · January 30, 2019, 1:52am

Wondering if there are any good beginner tutorials about this?

I’m coming from the TensorFlow/Keras world, where we can easily handle larger-than-memory datasets with Dataset iterators.

Super excited about the possibilities of Julia/Flux, but having trouble making the jump from MNIST-sized toy datasets to large ones.

kolia · January 30, 2019, 3:07am

There is this QueuePool referenced from Flux PR #450.

I haven’t tried it, don’t know whether it’s in a working state, and don’t have pointers to example usage. Maybe @staticfloat does?
