Hi, I am training a number of NNs with Flux and CUDA. However, training on the GPU is not accelerated as much as I expected. I used CuIterator, as described in the docs, for GPU training. Any ideas on how I can speed up training? The main part of my code is below.
using Flux
using CUDA
CUDA.allowscalar(false)
X = rand(6, 20000)
Y = rand(1, 20000)
data = Flux.DataLoader((X,Y), batchsize = 64)
modelgpu = Chain(Dense(6, 256, elu), Dense(256, 256, elu), Dense(256, 256, elu), Dense(256, 1)) |> gpu
modelcpu = cpu(modelgpu)
function training(model, train::Flux.DataLoader;
                  epochs::Int = 100, opt = Flux.Adam(1e-3), loss_fun = Flux.Losses.mse,
                  GPU::Bool)
    par_model = Flux.params(model)  # trainable parameters (implicit style)
    for ep = 1:epochs
        if GPU
            # CuIterator uploads each batch to the GPU as it is consumed
            for (x, y) in CuIterator(train)
                ∇ = gradient(par_model) do
                    loss_fun(model(x), y)
                end
                Flux.Optimise.update!(opt, par_model, ∇)
            end
        else
            # CPU path: batches are used as plain Arrays
            for (x, y) in train
                ∇ = gradient(par_model) do
                    loss_fun(model(x), y)
                end
                Flux.Optimise.update!(opt, par_model, ∇)
            end
        end
    end
end
# CPU run
@time training(modelcpu, data;
               epochs = 500, opt = Flux.Adam(1e-3), loss_fun = Flux.Losses.mse,
               GPU = false)
# GPU run (GPU = true so that the CuIterator branch is taken)
@time training(modelgpu, data;
               epochs = 500, opt = Flux.Adam(1e-3), loss_fun = Flux.Losses.mse,
               GPU = true)
The results are:
GPU:
397.086585 seconds (542.80 M allocations: 35.734 GiB, 3.18% gc time, 3.75% compilation time)
CPU:
496.777942 seconds (22.23 M allocations: 834.203 GiB, 4.65% gc time)
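One thing that may be worth checking in the setup above (an assumption, not something measured here): `rand(6, 20000)` produces `Float64` arrays, while `Dense` layers hold `Float32` weights by default, and a batch size of 64 means many small GPU kernel launches per epoch. A minimal sketch of the same data pipeline with `Float32` inputs and a larger batch, where the value 1024 is purely illustrative:

```julia
using Flux

# Float32 inputs match the Float32 weights that Dense creates by default
X = rand(Float32, 6, 20_000)
Y = rand(Float32, 1, 20_000)

# Larger batches give fewer, bigger kernel launches per epoch.
# 1024 is an illustrative value, not a tuned recommendation.
data = Flux.DataLoader((X, Y), batchsize = 1024)

# Number of batches per epoch (the last batch may be smaller)
n_batches = length(data)
```

With 20,000 samples this gives 20 batches per epoch instead of 313. Whether either change helps on a GTX 1050 would need to be timed.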
Version Info
Julia Version 1.8.1
Commit afb6c60d69a (2022-09-06 15:09 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
CUDA info
CUDA toolkit 11.7, artifact installation
NVIDIA driver 515.65.1, for CUDA 11.7
CUDA driver 11.7
Libraries:
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 11.0.0+515.65.1
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)
Toolchain:
- Julia: 1.8.1
- LLVM: 13.0.1
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86
1 device:
0: NVIDIA GeForce GTX 1050 (sm_61, 3.863 GiB / 4.000 GiB available)