Flux's model-zoo CIFAR10 example saturates 8GB gpu

natema · June 29, 2020, 12:36pm

Running Flux’s model-zoo CIFAR10 example seems to saturate the 8GB of memory on my Quadro RTX 4000:

[ Info: CUDA is on
[ Info: Constructing Model
[ Info: Training....
[ Info: Epoch 1
┌ Warning: `Target(triple::String)` is deprecated, use `Target(; triple = triple)` instead.
│   caller = ip:0x0
└ @ Core :-1
ERROR: LoadError: CuArrays.OutOfGPUMemoryError(1048576000)
Stacktrace:
 [1] alloc at /home/natale/.julia/packages/CuArrays/YFdj7/src/memory.jl:202 [inlined]
 [2] CuArrays.CuArray{Float32,4,P} where P(::UndefInitializer, ::NTuple{4,Int64}) at /home/natale/.julia/packages/CuArrays/YFdj7/src/array.jl:107
 [3] CuArray at /home/natale/.julia/packages/CuArrays/YFdj7/src/array.jl:115 [inlined]
 [4] similar at ./abstractarray.jl:671 [inlined]
 [5] similar at ./abstractarray.jl:670 [inlined]
 [6] similar at /home/natale/.julia/packages/CuArrays/YFdj7/src/broadcast.jl:11 [inlined]
 [7] copy(::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{4},NTuple{4,Base.OneTo{Int64}},typeof(identity),Tuple{CuArrays.CuArray{Float32,4,Nothing}}}) at ./broadcast.jl:840
 [8] materialize at ./broadcast.jl:820 [inlined]
 [9] BatchNorm at /home/natale/.julia/packages/Flux/Fj3bt/src/cuda/cudnn.jl:4 [inlined] (repeats 2 times)
 [10] applychain(::Tuple{BatchNorm{typeof(identity),CuArrays.CuArray{Float32,1,Nothing},CuArrays
...

One thing I find a bit strange: when I monitor the GPU with watch -n0.1 nvidia-smi while running the example, the error above is raised shortly after GPU memory usage reaches around 3GB (out of 8), at which point the process’s memory suddenly jumps above 7GB.

I’m quite a newbie on these matters. My first guess is that one should avoid loading the entire training set onto the GPU (line 37), and that it would make sense to move the data to the GPU one minibatch at a time. However, it is not clear to me how to do that when calling Flux.train! at line 166.

natema · June 29, 2020, 3:39pm

I performed the same test with a GeForce GTX 1080 Ti (11GB). Same thing: shortly after reaching 3GB of memory usage, all the memory filled up.

dmolina · June 29, 2020, 3:53pm

Yes, I think these examples need to be updated to allow that.
I think a proposal is being organized in the Julia community; in the meantime, you can update the training using https://github.com/JuliaML/MLDataPattern.jl or another option. Flux.train! is able to train from any iterator, so it is not a problem with Flux; it is mainly a problem with the examples in model-zoo.
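
As a rough illustration of the iterator idea, here is a minimal sketch that builds minibatches lazily, so each batch is only moved to the device when it is requested. The helper name lazy_minibatches and its to_device keyword are hypothetical (they are not part of Flux or model-zoo), and the random arrays merely stand in for the CIFAR10 data. The sketch uses only Base Julia so it runs without a GPU; with Flux loaded one would presumably pass to_device = gpu.

```julia
# Hypothetical helper (not part of Flux): lazily yields (x, y) minibatches.
# `to_device` is applied to a batch only when that batch is requested,
# so the full dataset never has to live on the GPU at once.
function lazy_minibatches(X::AbstractArray{T,4}, Y::AbstractMatrix, batchsize::Integer;
                          to_device = identity) where {T}
    n = size(X, 4)
    @assert size(Y, 2) == n "X and Y must hold the same number of samples"
    idx_chunks = Iterators.partition(1:n, batchsize)
    return ((to_device(X[:, :, :, idx]), to_device(Y[:, idx])) for idx in idx_chunks)
end

# Stand-in data with CIFAR10-like shapes: 32x32 RGB images, 10 classes.
X = rand(Float32, 32, 32, 3, 100)
Y = rand(Float32, 10, 100)

batches = lazy_minibatches(X, Y, 32)

for (x, y) in batches
    println(size(x), " ", size(y))
end

# Assumed usage with Flux (not run here):
#   batches = lazy_minibatches(X, Y, 32; to_device = gpu)
#   Flux.train!(loss, params(m), batches, opt)
```

Because the generator is not stateful, the same batches object can be iterated again on every epoch. Note that this only limits how much input data sits on the GPU; it does not reduce the memory used by activations during the forward and backward passes.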

Flux is actually awesome, but some work is needed, especially in the zoo. I hope to be less busy in the near future and able to contribute to it.

ToucheSir · June 29, 2020, 4:19pm

Based on the order of execution, I would assume the first 3GB or so is the fixed overhead of loading the data and model, while the rest is incurred during the forward + backward passes. Have you tried dramatically reducing the batch size? I’m not sure why it’s set so high, as most folks do not have 8GB+ of VRAM in their workstations…
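
To put numbers on this, the stack trace shows the failed request was for a single 4-D Float32 array of 1048576000 bytes. The short sketch below converts that to MiB and shows how the size of one activation array scales with batch size. The 32x32x64 activation shape is purely an illustrative assumption, not a shape read from the model-zoo model, and array_bytes and to_mib are hypothetical helper names.

```julia
# Bytes needed for a dense array with element type T and the given dimensions.
array_bytes(::Type{T}, dims::Integer...) where {T} = sizeof(T) * prod(dims)

# Convert a byte count to MiB.
to_mib(bytes) = bytes / 2^20

# The failed request reported in the error message above.
failed_request = 1_048_576_000
println(to_mib(failed_request), " MiB")                       # 1000.0 MiB
println(failed_request ÷ sizeof(Float32), " Float32 values")  # 262144000 Float32 values

# Assumed activation shape for illustration: 32x32 feature maps, 64 channels.
for batchsize in (1000, 128, 16)
    mib = to_mib(array_bytes(Float32, 32, 32, 64, batchsize))
    println("batch size ", batchsize, ": ", mib, " MiB per activation array")
end
```

Under that assumed shape, one activation array costs 250 MiB at batch size 1000 but only 4 MiB at batch size 16, and training keeps many such arrays alive for the backward pass.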

ToucheSir · June 29, 2020, 4:35pm

The ideal approach would be to use something like CUDA.jl’s CuIterator. Not sure if that works with Flux.train! directly though.
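
For reference, a minimal sketch of what using CuIterator in a hand-written loop might look like. It assumes CUDA.jl is installed; the random CPU batches are stand-ins for the real data, and the loop body is left as a placeholder since whether this plugs into Flux.train! directly is not confirmed here. The GPU part is guarded so the script still runs on a machine without a working GPU.

```julia
using CUDA

# Stand-in CPU minibatches with CIFAR10-like shapes (16 samples each).
cpu_batches = [(rand(Float32, 32, 32, 3, 16), rand(Float32, 10, 16)) for _ in 1:4]

if CUDA.functional()
    # CuIterator uploads one batch at a time and marks the previous batch
    # as freeable on the next iteration.
    for (x, y) in CuIterator(cpu_batches)
        # x and y are CuArrays here; a gradient step would go in this loop.
        println(typeof(x), " ", size(x), " ", size(y))
    end
else
    println("No functional GPU found; skipping the CuIterator loop.")
end
```

Since the previous batch is released as soon as the next one is requested, batches should not be kept around after the loop moves on.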

natema · June 29, 2020, 4:35pm

Yes, I tried with small batches (16, I think) and got the same issue. I just retried with a batch size of 1, and it crashes as well.
