Training FLUX models with larger datasets

lgmendes · April 5, 2022, 3:29pm

Hi,

I’m trying to train a Flux model (ResNet50) on a medium-size dataset (~30K images, 512×512×3) on a computer with 128 GB of RAM and three NVIDIA GeForce GTX 1080 Ti GPUs.

I’m loading the data with DataLoader using a batch size of 32. When I try to process the whole dataset, it runs out of memory. Even with a subset of the data, it easily runs out of memory, or the data does not fit in GPU memory.
Any advice on how to handle the problem?

Thanks in advance!

The training loop is something like:

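using Flux, ProgressMeter

# ps is assumed to be Flux.params(model); opt is the optimiser
for epoch in 1:args.epochs
    @showprogress for (x, y) in train_loader
        # Move only the current batch to the GPU
        x, y = x |> device, y |> device
        gs = Flux.gradient(ps) do
            ŷ = model(x)
            loss(ŷ, y)
        end
        Flux.Optimise.update!(opt, ps, gs)
    end
end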

The inputs of the DataLoader are

octDfTraningImagesResize = map(x -> imresize(RGB.(x), (imageResizeDim, imageResizeDim)) |> channelview .|> Float32, octDfTraningImages)
octDfTraningImagesResizeVect = map(x -> permutedims(x, (3, 2, 1)), octDfTraningImagesResize)
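
A rough back-of-the-envelope estimate shows why this eager preprocessing is likely to exhaust RAM. The sketch below assumes imageResizeDim = 512 (the value is not stated above) and counts only the raw Float32 arrays, ignoring the original images and any allocator overhead.

```julia
# Assumed values: 30K images, and imageResizeDim = 512 (a guess, not stated in the post)
n_images = 30_000
height, width, channels = 512, 512, 3

# Each pixel channel is stored as a 4-byte Float32
bytes_per_image = height * width * channels * sizeof(Float32)
total_bytes = n_images * bytes_per_image
total_gib = total_bytes / 2^30

println("one copy of the dataset: ", round(total_gib; digits=1), " GiB")
println("resized + permuted copies: ", round(2 * total_gib; digits=1), " GiB")
```

Under these assumptions a single Float32 copy is about 88 GiB, and keeping both the resized and the permuted arrays alive needs about 176 GiB, which is more than the 128 GB available.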

lgmendes · April 5, 2022, 10:33pm

I found a DataLoader implementation package (https://github.com/lorenzoh/DataLoaders.jl) that works like PyTorch’s DataLoader, which seems to solve the main problem!

Rasmus_Hoier · April 6, 2022, 7:07am

MLUtils.jl might be interesting for you as well. I think the DataLoader in MLUtils is derived from the one in DataLoaders.jl.
MLUtils documentation

lgmendes · April 6, 2022, 12:07pm

Thanks!

From the DataLoaders documentation (https://github.com/lorenzoh/DataLoaders.jl/blob/master/docs/datacontainers.md), it is easy to create a data loader object from the file links plus the methods that lazily read the images and the labels. However, how can I shuffle the data? The shuffleobs function does not work with “customized” data types: DataLoaders.DataLoader(shuffleobs(data), 16) does not work.

shuffleobs(data) = shuffleobs(Random.GLOBAL_RNG, data)
function shuffleobs(rng::AbstractRNG, data)
    obsview(data, randperm(rng, numobs(data)))
end

Any suggestion?

A possible practical solution may be, for each epoch, to create/update the train_loader with a shuffled version of the links to the images. Since we are working only with links, this operation should be fast. A more elegant solution may be possible and/or already implemented.

The example code is:

using DataLoaders, Images
import DataLoaders.LearnBase: getobs, nobs

struct ImageDataset
    files::Vector{String}
end
# join=true keeps the folder in each path, so load works from any working directory
ImageDataset(folder::String) = ImageDataset(readdir(folder; join=true))

nobs(data::ImageDataset) = length(data.files)
# Images are read lazily, one file at a time
getobs(data::ImageDataset, i::Int) = Images.load(data.files[i])

data = ImageDataset("path/to/my/images")
for images in DataLoader(data, 16)
    # Do something
end
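
The per-epoch reshuffling idea described above could be sketched roughly as follows, using only the Random standard library. LabeledPaths and reshuffle are hypothetical names introduced for illustration, and the file names and labels are placeholders; the point is that only paths and labels are permuted, so no image is read during shuffling.

```julia
using Random

# Hypothetical container: file paths plus one integer label per file
struct LabeledPaths
    files::Vector{String}
    labels::Vector{Int}
end

# Apply one random permutation to both vectors so each path keeps its label
function reshuffle(rng::AbstractRNG, data::LabeledPaths)
    perm = randperm(rng, length(data.files))
    return LabeledPaths(data.files[perm], data.labels[perm])
end

rng = MersenneTwister(0)
data = LabeledPaths(["img_$i.png" for i in 1:8], collect(1:8))  # placeholder data

for epoch in 1:3
    epoch_data = reshuffle(rng, data)
    # Rebuild the loader from epoch_data here, e.g. DataLoader(epoch_data, 16),
    # once getobs/nobs are defined for the container type
end
```

Rebuilding the loader each epoch costs one permutation of two small vectors, which should be negligible next to reading the images themselves.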

ToucheSir · April 7, 2022, 1:45am

In cases like this, the custom dataset type is absolutely the way to go. If you’re looking for additional inspiration there, check out what FastAI.jl does for its image datasets.
