Running Flux’s model-zoo CIFAR10 example seems to saturate the 8 GB of memory on my Quadro RTX 4000:
[ Info: CUDA is on
[ Info: Constructing Model
[ Info: Training....
[ Info: Epoch 1
┌ Warning: `Target(triple::String)` is deprecated, use `Target(; triple = triple)` instead.
│ caller = ip:0x0
└ @ Core :-1
ERROR: LoadError: CuArrays.OutOfGPUMemoryError(1048576000)
Stacktrace:
[1] alloc at /home/natale/.julia/packages/CuArrays/YFdj7/src/memory.jl:202 [inlined]
[2] CuArrays.CuArray{Float32,4,P} where P(::UndefInitializer, ::NTuple{4,Int64}) at /home/natale/.julia/packages/CuArrays/YFdj7/src/array.jl:107
[3] CuArray at /home/natale/.julia/packages/CuArrays/YFdj7/src/array.jl:115 [inlined]
[4] similar at ./abstractarray.jl:671 [inlined]
[5] similar at ./abstractarray.jl:670 [inlined]
[6] similar at /home/natale/.julia/packages/CuArrays/YFdj7/src/broadcast.jl:11 [inlined]
[7] copy(::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{4},NTuple{4,Base.OneTo{Int64}},typeof(identity),Tuple{CuArrays.CuArray{Float32,4,Nothing}}}) at ./broadcast.jl:840
[8] materialize at ./broadcast.jl:820 [inlined]
[9] BatchNorm at /home/natale/.julia/packages/Flux/Fj3bt/src/cuda/cudnn.jl:4 [inlined] (repeats 2 times)
[10] applychain(::Tuple{BatchNorm{typeof(identity),CuArrays.CuArray{Float32,1,Nothing},CuArrays
...
One thing I find a bit strange: if I monitor the GPU with watch -n0.1 nvidia-smi while running the example, the above error is raised shortly after GPU usage reaches around 3 GB (out of 8), at which point the process’s memory suddenly jumps above 7 GB.
I’m quite a newbie on these matters. My first guess is that one should avoid loading the entire training set onto the GPU (line 37), and that it would make more sense to move the data to the GPU one minibatch at a time. However, it is not clear to me how to do that when calling Flux.train! at line 166.
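For what it's worth, here is a sketch of what I mean. Flux.train! accepts any iterable of (x, y) batches, so instead of calling gpu on the whole dataset up front, one could pass a lazy generator that moves each minibatch to the GPU only when it is consumed. This is only an illustration, assuming variables roughly matching the model-zoo script (train_x as a WxHxCxN array, train_y as one-hot labels, and model, loss, opt already defined); the names are mine, not the script's:

```julia
using Flux  # plus CuArrays (or CUDA.jl on newer versions) for gpu()

batch_size = 128
idxs = Iterators.partition(1:size(train_x, 4), batch_size)

# Lazy iterator: each minibatch lives on the CPU until train! consumes it,
# at which point only that batch is copied to the GPU.
train_set = ((gpu(train_x[:, :, :, i]), gpu(train_y[:, i])) for i in idxs)

loss(x, y) = Flux.crossentropy(model(x), y)

Flux.train!(loss, Flux.params(model), train_set, opt)
```

If that's roughly right, I assume the per-batch GPU copies would be garbage-collected between iterations, so peak memory stays near one minibatch plus the model's activations rather than the whole dataset. Is this the recommended pattern, or is there a more idiomatic way to do it?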