Uploading vector of vectors to GPU in Flux.jl

Hello all,

I am training a many-to-many RNN. My input and output data are both Vector{Vector{Vector{Float32}}}; the JLD file holding them is about 3 GB. I checked the model-zoo and did not find any example of uploading RNN sequences to the GPU. I tried to upload the whole thing with data_train = zip(us_train, ys_train) |> gpu, but that uses all my GPU memory. Should I convert the input to a tensor of some sort? What is the best way to upload this sequence data to the GPU?

The best way to do so is in minibatches, just like you would for any other data or framework. Unlike other layers, however, this will be a vector of arrays to represent the sequence dimension. In other words, each minibatch will look like [(features x batch size) x sequence length]. The Recurrence page of the Flux docs covers what kinds of data the built-in RNNs can work with.
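To make the shape concrete, here is a minimal sketch of what one RNN minibatch looks like in this layout (the sizes are made up for illustration):

```julia
# Hypothetical sizes, just for illustration.
features, batchsize, seqlen = 4, 5, 10

# One minibatch for a Flux RNN: a length-`seqlen` vector,
# each element a (features × batchsize) matrix for one time step.
batch = [rand(Float32, features, batchsize) for _ in 1:seqlen]

length(batch)   # 10 time steps
size(batch[1])  # (4, 5), i.e. features × batch size
```

The model is then called once per element of this vector, carrying its hidden state from one time step to the next.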


I have played around with DataLoader and it takes a [features x sequence length x batch size] tensor.
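For reference, one way to go from such a tensor batch to the vector-of-matrices form the RNN expects is to slice along the sequence dimension. This is a sketch with made-up sizes; recent Flux/MLUtils also ship an unstack helper that should do the same thing, if I recall correctly:

```julia
# Hypothetical batch: features × sequence length × batch size.
x = rand(Float32, 4, 10, 5)

# Slice out one (features × batchsize) matrix per time step.
# Recent Flux/MLUtils should also offer Flux.unstack(x; dims = 2) for this.
xs = [x[:, t, :] for t in axes(x, 2)]

length(xs)   # 10 time steps
size(xs[1])  # (4, 5)
```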

The following snippet runs, but the loss is way off. I will keep playing around and see what's wrong.

train_loader = Flux.DataLoader((data=us_train, label=ys_train), batchsize=5)

for (x, y) in train_loader
    x = x |> gpu
    y = y |> gpu

    Flux.train!(loss, ps, zip(x, y), opt, cb = evalcb)
end

x and y are already batched, so zipping them feeds samples through one at a time. Flux.train! is not really flexible enough for this, so I'd recommend using a custom training loop so that you're able to transform the data into an RNN-compatible form before feeding it to the model.
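A quick way to see the zip pitfall: zipping two multi-dimensional arrays iterates them element by element, not sample by sample:

```julia
x = rand(Float32, 3, 2)  # features × batch
y = rand(Float32, 3, 2)

# zip iterates the arrays elementwise, so this yields 6 scalar pairs,
# not 2 (sample, label) pairs.
pairs = collect(zip(x, y))
length(pairs)  # 6
```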

This is the training function I ended up having.

function seq_batch_train!(loss, ps, data, opt; cb = () -> ())
    local training_loss
    cb = Flux.Optimise.runall(cb)

    x, y = data
    x = x |> gpu
    y = y |> gpu

    gs = Flux.gradient(ps) do
        training_loss = loss(x, y)
        return training_loss
    end
    Flux.Optimise.update!(opt, ps, gs)
    cb()
end

And here is the training loop with epochs.

for epoch in 1:1
    for batch in train_loader
        seq_batch_train!(loss, ps, batch, opt, cb = evalcb)
    end
    @save "model_$(now())_epoch-$epoch.bson" m opt
end

The trick is in the loss function:

function loss(x, y)
    # Sum of squared errors over all time steps, then averaged over all elements.
    sse = [(m(x[:, xi, :]) - y[:, xi, :]).^2 for xi in axes(x, 2)] |> sum |> sum
    return sse / length(x)
end

hcat() is very slow on the GPU, so the MSE is calculated manually instead of using mse() from Flux.
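As a sanity check that no concatenation is needed, summing squared errors time step by time step gives the same total as computing them on the full 3-D array in one go (a toy example with random arrays standing in for predictions and targets):

```julia
# Toy stand-ins for predictions and targets: features × seqlen × batch.
x = rand(Float32, 4, 10, 5)
y = rand(Float32, 4, 10, 5)

# Per-time-step sum of squared errors, no hcat involved.
per_step = sum(sum((x[:, t, :] .- y[:, t, :]).^2) for t in axes(x, 2))

# Same quantity computed on the whole array at once.
whole = sum((x .- y).^2)

per_step ≈ whole  # true (up to floating-point rounding)
```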

Currently a batch size of 10 uses about 8 GB of GPU memory. I can see some tweaks to get it working with longer time series.