Data augmentation question (CIFAR10)

Hi,
I am training a CNN on CIFAR10 (via Flux.DataLoader) and am trying to apply some stochastic data transformations to each batch of data at runtime.
My goal is to apply the following operations to each batch I get from the dataloader.

  1. Pad with zeros (32x32 → 40x40 images).
  2. Random crop (40x40 → 32x32 images).
  3. Flip images horizontally with 50% probability.
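
For reference, the three steps can be sketched in plain Julia without any augmentation package. This is only a rough illustration: `pad_crop_flip` is a made-up name, the batch is assumed to be a 4-d array whose last dimension indexes the images, and `flipdim=1` assumes the first dimension is the image width (as in the WHCN layout MLDatasets uses, as far as I can tell).

```julia
using Random

# Hypothetical helper (not part of Flux or Augmentor): zero-pad, random-crop
# and randomly flip every image of a batch.
function pad_crop_flip(x::AbstractArray{T,4}; pad::Int=4, flipdim::Int=1,
                       rng=Random.default_rng()) where {T}
    d1, d2, c, n = size(x)
    out = similar(x)
    # Scratch buffer; its border is never written to, so it stays zero.
    padded = zeros(T, d1 + 2 * pad, d2 + 2 * pad, c)
    for i in 1:n
        # 1. Zero padding: copy the image into the centre of the buffer.
        padded[pad+1:pad+d1, pad+1:pad+d2, :] .= @view x[:, :, :, i]
        # 2. Random crop back to the original size.
        o1 = rand(rng, 0:2*pad)
        o2 = rand(rng, 0:2*pad)
        crop = @view padded[o1+1:o1+d1, o2+1:o2+d2, :]
        # 3. Flip along `flipdim` with probability 0.5.
        if rand(rng, Bool)
            out[:, :, :, i] .= reverse(crop; dims=flipdim)
        else
            out[:, :, :, i] .= crop
        end
    end
    return out
end

x = rand(Float32, 32, 32, 3, 8)   # stand-in for one CIFAR10 batch
xaug = pad_crop_flip(x)           # same size as x
```

Each image gets its own crop offset and its own flip decision, which is what I would expect from a per-sample augmentation.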

Is there a best practice for achieving this in Julia?
Augmentor.jl seems like it might be the way to go, but it requires me to convert between data formats and does not seem to have an option for padding.
I found an example in the dev version of Augmentor.jl’s docs (link), which relies on MappedArrays instead of Flux.DataLoader. This seems less readable, and as far as I can tell the data is not shuffled at every epoch.

Using Julia and Flux has generally been a breeze so far, with custom layers and learning rules being very simple to implement. So I was quite surprised that implementing standard data augmentation seems to take much more effort.

I have tried to implement a minimal working example, shown below. The interesting parts are probably the functions MWE and getdata. I did not find a good way to implement padding (which I need before I can random-crop back to 32x32), so the only transformation applied at the moment is FlipX(0.5).

I guess my questions are:

  • Am I on the right track with using Augmentor.jl or are there better options?
  • If Augmentor.jl is the way to go, then how could I implement the padding and random cropping?
  • Do you have general ideas on how to make things cleaner/faster? For larger networks my current approach slows things down a bit.

using Augmentor, MLDatasets
using Flux, Flux.Optimise
using Flux: onehotbatch, onecold
using Flux.Losses: logitcrossentropy

function getdata(batchsize)
    xtrain, ytrain = MLDatasets.CIFAR10.traindata(Float32)
    xtest, ytest = MLDatasets.CIFAR10.testdata(Float32)

    # Per-channel (R, G, B) mean and standard deviation used for normalization
    m = reshape([0.4914009f0 0.4822f0 0.4465309f0], (1,1,3,1))
    s = reshape([0.20230277f0 0.19941312f0 0.2009607f0], (1,1,3,1))
    xtrain = (xtrain .- m) ./ s
    xtest = (xtest .- m) ./ s

    # Convert training data to RGB to work with augmentbatch!()
    xtrain = MLDatasets.CIFAR10._colorview(RGB, permutedims(xtrain, (3, 1, 2, 4)))
    ytrain, ytest = Flux.onehotbatch(ytrain, 0:9), Flux.onehotbatch(ytest, 0:9)

    trainloader = Flux.DataLoader((xtrain, ytrain), batchsize=batchsize, shuffle=true, partial=false)
    testloader = Flux.DataLoader((xtest, ytest), batchsize=batchsize, partial=false)

    return (trainloader, testloader)
end

function LeNet5(; imgsize=(28,28,1), nclasses=10) 
    out_conv_size = (imgsize[1]÷4 - 3, imgsize[2]÷4 - 3, 16)
    return Chain(
            Conv((5, 5), imgsize[end]=>6, relu),
            MaxPool((2, 2)),
            Conv((5, 5), 6=>16, relu),
            MaxPool((2, 2)),
            flatten,
            Dense(prod(out_conv_size), 120, relu), 
            Dense(120, 84, relu), 
            Dense(84, nclasses)
          )
end

function loss_and_accuracy(data_loader, net, device)
    acc = 0.0f0; ls = 0.0f0; num = 0
    for (x, y) in data_loader
        x, y = x |> device, y |> device
        pred = net(x)
        # Sum over the batch so that ls / num below is a per-sample mean
        ls += logitcrossentropy(pred, y; agg=sum)
        acc += sum(onecold(cpu(pred)) .== onecold(cpu(y)))
        num +=  size(y, 2)
    end
    return ls / num, acc / num
end

function MWE()
    pl = FlipX(0.5) |> SplitChannels() |> PermuteDims((2, 3, 1))
    device = gpu
    batchsize = 128
    trainloader, testloader = getdata(batchsize)
    opt = ADAM(0.0001)
    net = LeNet5(imgsize=(32, 32, 3), nclasses=10) 
    net = net |> device
    ps = Flux.params(net)
    for epoch=1:5
        for (x, y) in trainloader
            xaug = zeros(Float32, 32, 32, 3, batchsize)
            augmentbatch!(xaug, x, pl)
            xaug, y = xaug |> device, y |> device
            gs = gradient(ps) do
                l = logitcrossentropy(net(xaug), y)
            end
            update!(opt, ps, gs)
        end
        test_loss, test_acc = loss_and_accuracy(testloader, net, device)
        @info """Epoch: $epoch:
        Test:     Acc(θ): $(round(test_acc*100f0, digits=2))%    Loss: $(round(test_loss, digits=6))
        """
    end
end

MWE()

For cropping, I’d use Augmentor.Crop as well. You’d only have to implement the randomization of the indices.
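
A rough sketch of what randomizing the indices could look like, in plain Julia. `random_crop_ranges` is a made-up helper; as far as I know `Crop` accepts one index range per dimension, so the returned ranges could be passed on as `Crop(rows, cols)`, with a fresh draw for every image or batch.

```julia
using Random

# Hypothetical helper: draw a random crop window of size `cropsize` inside an
# image of size `imgsize`, returned as one index range per dimension.
function random_crop_ranges(imgsize::NTuple{2,Int}, cropsize::NTuple{2,Int};
                            rng=Random.default_rng())
    all(cropsize .<= imgsize) || throw(ArgumentError("crop is larger than the image"))
    offsets = ntuple(d -> rand(rng, 0:(imgsize[d] - cropsize[d])), 2)
    return ntuple(d -> (offsets[d] + 1):(offsets[d] + cropsize[d]), 2)
end

rows, cols = random_crop_ranges((40, 40), (32, 32))
img = rand(Float32, 40, 40)   # stand-in for one padded image channel
patch = img[rows, cols]       # 32×32 patch at a random position
```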

I did not know padding was considered an image augmentation. It’s usually done by the Conv layer using the keyword argument pad = 4.

There’s a difference between the two approaches: yours would yield image patches with padded corners, while using the Conv padding creates a “square” of zeros around your cropped patches. I think the latter is the standard approach, but I might be wrong.

You should try DataAugmentation.jl (lorenzoh/DataAugmentation.jl on GitHub), a flexible data augmentation library for machine and deep learning.

Thanks for the suggestions :)

@HenriDeh
The motivation for padding at the augmentation stage is that the randomly cropped images will then have size 32x32 (like the original data), but the network will see each image slightly displaced, and differently so at each epoch.
This is a common and effective augmentation technique, which is described, for example, at the top of page 8 in this paper.
In PyTorch this is achieved by applying RandomCrop with the keyword padding.
Augmentor does have a function RCropSize, but no option for padding. I guess a slightly clumsy solution could be to add padding to the training data when creating the dataloaders.
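
A minimal sketch of that pre-padding idea, assuming the raw data is still a 4-d Float32 array with the two spatial dimensions first (`zeropad_spatial` is a made-up name):

```julia
# Hypothetical helper: zero-pad the two leading (spatial) dimensions of a
# 4-d data array by `pad` pixels on every side.
function zeropad_spatial(x::AbstractArray{T,4}, pad::Int) where {T}
    d1, d2, c, n = size(x)
    out = zeros(T, d1 + 2 * pad, d2 + 2 * pad, c, n)
    out[pad+1:pad+d1, pad+1:pad+d2, :, :] .= x
    return out
end

xtrain = rand(Float32, 32, 32, 3, 16)   # stand-in for the CIFAR10 training images
xpadded = zeropad_spatial(xtrain, 4)    # size (40, 40, 3, 16)
```

The downside is that the padded copy takes (40/32)^2 ≈ 1.56 times the memory of the original array. Also, if the padding is added after normalization, the zeros correspond to the mean colour rather than to black, which may or may not be what one wants.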

@CarloLucibello
Thanks for the suggestion. I initially discounted this package as it looked a bit less stable than Augmentor.jl, and I couldn’t find out how to use it with batched data (where Augmentor has Augmentor.augmentbatch!()).
Could you elaborate on why you would recommend this package over Augmentor.jl?
Also all the examples I could find are for individual images. Is there a function for applying the same transform to a batch of images?

I’ve never used DataAugmentation.jl myself, so I can’t give much advice, but it’s part of the larger FastAI.jl project (FluxML/FastAI.jl on GitHub, a repository of best practices for deep learning in Julia, inspired by fastai), so it’s being actively developed (while Augmentor.jl is essentially in maintenance mode, AFAIK), and it is geared towards deep learning needs.

You could file an issue with DataAugmentation.jl reporting specific needs or missing documentation. I think it would be useful.

You’re welcome. In that case you could open an issue in Augmentor.jl to ask for the feature. Or better, if you feel up to it, you could implement it and make a pull request! It’s certainly a good way to get some practice in Julia programming.

Thanks :)
Reading issues in both repos has given me some ideas on how I could bake transformations into a custom dataloader, which should help make my code cleaner. I will look more into DataAugmentation.jl and try to figure out if it is better for my use case than Augmentor.jl.
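
To make the idea concrete, here is a rough sketch of such a wrapper. `TransformedLoader` is a made-up type, not something from Flux or either augmentation package; it only assumes that the wrapped iterator yields (x, y) tuples, as Flux.DataLoader does when given a tuple of arrays.

```julia
# Hypothetical wrapper: applies `f` to the features of every (x, y) batch
# produced by the wrapped iterator `base`.
struct TransformedLoader{F,B}
    f::F
    base::B
end

Base.length(t::TransformedLoader) = length(t.base)

# Covers both iterate(t) and iterate(t, state) by forwarding to `base`.
function Base.iterate(t::TransformedLoader, state...)
    next = iterate(t.base, state...)
    next === nothing && return nothing
    (x, y), newstate = next
    return (t.f(x), y), newstate
end

# Stand-in for a DataLoader: three batches of four images each.
batches = [(rand(Float32, 32, 32, 3, 4), rand(1:10, 4)) for _ in 1:3]
loader = TransformedLoader(x -> reverse(x; dims=1), batches)
collected = collect(loader)
```

Since the wrapper only forwards `iterate`, any shuffling the underlying loader does when a new pass starts should carry over unchanged.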

Great, let us know which solution you end up with. It would be nice to add augmentation to the model zoo script model-zoo/vgg_cifar10.jl (master branch of FluxML/model-zoo on GitHub).

FWIW, I have used PaddedViews in conjunction with Augmentor’s RCropSize for this type of augmentation.

Iterator for reference
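using Augmentor, ImageCore

# Wraps a batch iterator `base` and applies the Augmentor pipeline `aug` to every batch.
struct AugIter{A,B}
    aug::A
    base::B
end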

function Base.iterate(itr::AugIter)
    valstate = iterate(itr.base)
    valstate === nothing && return nothing
    val, state = valstate
    buffer = initbuffer(val, itr.aug)
    return featureaug(val, buffer, itr.aug), (buffer, state)
end

initbuffer((x,y)::Tuple, aug) = initbuffer(x, aug)
function initbuffer(val::AbstractArray, aug) 
    img1 = augment(val[:,:,1], aug)
    return similar(img1, size(img1)..., size(val)[end])
end

function Base.iterate(itr::AugIter, (buffer, state))
    valstate = iterate(itr.base, state)
    valstate === nothing && return nothing
    val, state = valstate
    return featureaug(val, buffer, itr.aug), (buffer, state)
end

featureaug((x,y)::Tuple, buffer, aug) = featureaug(x, buffer, aug), y
function featureaug(val, buffer, aug)
    nobs_val = size(val)[end]
    nobs_buf = size(buffer)[end]

    bview = if  nobs_val < nobs_buf 
        selectdim(buffer, ndims(buffer), 1:nobs_val)
    elseif nobs_val > nobs_buf
        similar(buffer, size(buffer)[1:end-1]..., size(val)[end])
    else
        buffer
    end
    return img2arr(augmentbatch!(CPUThreads(), bview, val, aug))
end

img2arr((x,y)::Tuple) = img2arr(x), y
function img2arr(img::AbstractArray)
    chwn = ImageCore.channelview(img)
    return PermutedDimsArray(chwn, (3,2,1,4))
end

Created like this:

 AugIter(FlipX(0.5) |> RCropSize(32, 32), paddedbatchiter)

Here paddedbatchiter is an iterator returning tuples of batches of a PaddedView of the training data and the corresponding labels (e.g. Flux’s DataLoader).