Flux: Hard to use train! and DataLoader with a minibatched NamedTuple dataset on the GPU

I want to run the following neural-network training code on the GPU, but I have to write something complicated and dirty inside the DataLoader's for loop to pass the data to train!:

# This GPU-based program works, but runs much slower than on the CPU

using Flux
using Flux.Data: DataLoader
using Printf

xpu = gpu
# xpu = cpu

n = 100
dataset = (
    input = (
        data1 = rand(1, n),
        data2 = rand(1, n),
        data3 = rand(1, n),
        data4 = rand(5, 10, n)
    ),
    output = rand(1, n)
) |> xpu
dataloader = DataLoader(dataset.input, dataset.output, batchsize = 4)
model = Chain(
    (input) -> cat(dims=1, input.data1, input.data2), # data3 and data4 are ignored here for simplicity; the real model will use them
    Dense(2, 10, relu),
    Dense(10, 1)
) |> xpu

loss(input, output) = Flux.mse(model(input), output)
optimizer = ADAM()
epoch_length = 100
for epoch in 1:epoch_length
    for (input, output) in dataloader
        # complicated and probably slow: rebuild the minibatch as a list
        # of per-sample observations so that train! can iterate over it
        data = [((data1 = input.data1[:, i], data2 = input.data2[:, i],
                  data3 = input.data3[:, i], data4 = input.data4[:, :, i]),
                 output[:, i]) for i in 1:size(output, 2)]
        Flux.train!(loss, params(model), data, optimizer)
    end
    loss(dataset.input, dataset.output) |> println
end

This code runs much slower on the GPU than on the CPU. How can I make it fast? What is the good practice here?

If the batch size is too small, it is hard to gain any performance from the GPU: each minibatch carries too little work to amortize the overhead of launching kernels and moving data.
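
Beyond that, the DataLoader already yields whole minibatches with the same NamedTuple structure, so the loader can be passed to train! directly; the per-sample reconstruction in the inner loop is what drives the effective batch size down to one. Here is a minimal sketch, assuming the implicit-parameters train! API used in the question (the batchsize of 32 is just an illustrative value):

using Flux
using Flux.Data: DataLoader

# model, loss, dataset, and xpu as defined in the question
dataloader = DataLoader(dataset.input, dataset.output, batchsize = 32)
ps = Flux.params(model)
optimizer = ADAM()
for epoch in 1:100
    # train! iterates the dataloader and splats each (input, output)
    # minibatch into loss, so no per-sample slicing is needed
    Flux.train!(loss, ps, dataloader, optimizer)
    loss(dataset.input, dataset.output) |> println
end

Each gradient step then sees a full batch of columns, which is what lets the GPU amortize its per-kernel overhead.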

You're right: the batch size is small, the model is too simple to benefit from the GPU, and it seems another user's GPU process was running at the same time. Thanks!