Getting the most utilization out of a GPU

I am using a relatively high-spec GPU (1080 Ti) but cannot get it to run at utilizations of more than 5% (according to Task Manager). I believe it is being used (rather than the CPU), because the “GPU memory” goes up by about 1 GB when I run the code below (also, the datatype returned by the neural net is CuArrays.CuArray{Float32,2,Nothing}). I was wondering if anyone has an idea on how to change the code/learning algorithm to make more use of the GPU? Or, generally, is very low utilization to be expected (maybe my test case is too simple for GPU use)?

using Statistics
using Flux, CUDA
using Random

# Making dummy data
obs = 1_000_000
x = rand(Float32, 10, obs)  # Float32 matches Flux's default weight eltype
y = mean(x, dims=1) + sum(x, dims=1)
y[findall(x[4, :] .< 0.3)] .= 17  # Making it slightly harder.

x = x |> gpu
y = y |> gpu

opt = Descent()
# Model built on the CPU, then copied to the GPU
m_cpu = Chain(Dense(10,6),
          Dense(6,5),
          Dense(5,4),
          Dense(4,3),
          Dense(3,2),
          Dense(2,1))
m_gpu = m_cpu |> gpu
m_gpu(x)
CUDA.allowscalar(false)  # disallow slow scalar indexing on the GPU


dataset_gpu = Flux.Data.DataLoader(x, y, batchsize=2^12, shuffle=true) |> gpu 
loss_gpu(A, B) = Flux.mae(m_gpu(A),B)
println("Doing GPU training")
loss_gpu(x, y)
for i in 1:100
    Flux.train!(loss_gpu, params(m_gpu), dataset_gpu, opt)
end
loss_gpu(x, y)

I think this question is related to this one, but there did not seem to be a conclusion there.

The 1GB initial allocation is most likely caused by these lines:

x = x |> gpu
y = y |> gpu

(running a DataLoader through gpu is more or less a no-op and is not necessary)
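If the full dataset didn't fit in GPU memory, one option would be to keep the DataLoader over CPU arrays and transfer one batch at a time inside the loop. This is only a sketch using the same Flux API as the code above; `x_cpu`/`y_cpu` are placeholder names for CPU-resident copies of the data:

```julia
using Flux, CUDA

# Sketch: batch on the CPU and move each minibatch to the GPU as it is
# consumed, instead of uploading the whole dataset up front.
dataset_cpu = Flux.Data.DataLoader(x_cpu, y_cpu, batchsize=2^12, shuffle=true)
for (xb, yb) in dataset_cpu
    xb, yb = gpu(xb), gpu(yb)   # one-batch transfer per iteration
    gs = Flux.gradient(() -> loss_gpu(xb, yb), Flux.params(m_gpu))
    Flux.Optimise.update!(opt, Flux.params(m_gpu), gs)
end
```

This trades extra host-to-device copies for a much smaller GPU memory footprint, so it only makes sense when the data doesn't fit on the device.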

WRT performance, I was able to obtain a consistent 60-80% utilization on a comparable GPU (RTX 2070), measured with nvidia-smi, which should be more accurate than Task Manager. If you're able to test on a Linux machine or WSL, that should help eliminate the OS/environment as a variable. To get a more detailed picture of bottlenecks, you could also look into CUDA.jl's profiling documentation.
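As a starting point, CUDA.jl's `CUDA.@profile` macro can mark a region for profiling. A sketch (depending on the CUDA.jl version, this either runs an integrated profiler or annotates the code for an external tool such as Nsight Systems):

```julia
using CUDA

# Profile one forward pass to see where time goes (kernel launches,
# memory transfers, etc.). Assumes loss_gpu, x and y from the code above.
CUDA.@profile loss_gpu(x, y)
```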


Thanks for this. You were completely right about Task Manager.

When I started training, I ran the following on the command line:
C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe
and found usage was actually at 30%.

Then I increased the training batch size to 2^18, which got usage up to the 80% range.
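For anyone trying the same thing, only the batchsize argument in the DataLoader call needs to change (assuming the same variables as in the original code):

```julia
# Larger batches give the GPU more parallel work per kernel launch
# (2^18 = 262144 observations per batch, memory permitting).
dataset_gpu = Flux.Data.DataLoader(x, y, batchsize=2^18, shuffle=true)
```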