Flux's model-zoo CIFAR10 example saturates 8GB GPU

Running Flux’s model-zoo CIFAR10 example seems to saturate the 8GB of memory on my Quadro RTX 4000:

[ Info: CUDA is on
[ Info: Constructing Model
[ Info: Training....
[ Info: Epoch 1
┌ Warning: `Target(triple::String)` is deprecated, use `Target(; triple = triple)` instead.
│   caller = ip:0x0
└ @ Core :-1
ERROR: LoadError: CuArrays.OutOfGPUMemoryError(1048576000)
 [1] alloc at /home/natale/.julia/packages/CuArrays/YFdj7/src/memory.jl:202 [inlined]
 [2] CuArrays.CuArray{Float32,4,P} where P(::UndefInitializer, ::NTuple{4,Int64}) at /home/natale/.julia/packages/CuArrays/YFdj7/src/array.jl:107
 [3] CuArray at /home/natale/.julia/packages/CuArrays/YFdj7/src/array.jl:115 [inlined]
 [4] similar at ./abstractarray.jl:671 [inlined]
 [5] similar at ./abstractarray.jl:670 [inlined]
 [6] similar at /home/natale/.julia/packages/CuArrays/YFdj7/src/broadcast.jl:11 [inlined]
 [7] copy(::Base.Broadcast.Broadcasted{CuArrays.CuArrayStyle{4},NTuple{4,Base.OneTo{Int64}},typeof(identity),Tuple{CuArrays.CuArray{Float32,4,Nothing}}}) at ./broadcast.jl:840
 [8] materialize at ./broadcast.jl:820 [inlined]
 [9] BatchNorm at /home/natale/.julia/packages/Flux/Fj3bt/src/cuda/cudnn.jl:4 [inlined] (repeats 2 times)
 [10] applychain(::Tuple{BatchNorm{typeof(identity),CuArrays.CuArray{Float32,1,Nothing},CuArrays

One thing I find a bit strange: if I monitor the GPU with watch -n0.1 nvidia-smi while running the example, the above error is raised shortly after GPU memory usage reaches around 3GB (out of 8), at which point the process’s memory suddenly jumps above 7GB.

I’m quite a newbie on these matters. My first guess is that one should avoid loading the entire training set onto the GPU (line 37), and that it would make sense to move the data to the GPU one minibatch at a time. However, it is not clear to me how to do that when calling Flux.train! at line 166.

I performed the same test with a GeForce GTX 1080 Ti (11GB). Same thing: shortly after reaching 3GB of memory usage, all of the memory filled up.

Yes, I think these examples need to be updated.
I think a proposal is being organized at https://julialang.zulipchat.com/login/. In the meantime, you can update the training to use https://github.com/JuliaML/MLDataPattern.jl or another option. Flux.train! can train from an iterator, so it is not a problem with Flux itself; it is mainly a problem with the examples in model-zoo.
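
To illustrate the iterator point, here is a minimal sketch of passing a lazy generator to Flux.train!, so that each minibatch is moved to the GPU only when it is requested. The data, the tiny model, and the names `X`, `Y`, `train_data` and `batchsize` are placeholders of mine, not taken from the model-zoo script, and the call assumes the implicit-parameter `Flux.train!(loss, params, data, opt)` form from Flux releases of that period:

```julia
using Flux

# Hypothetical stand-in data with CIFAR10-like shapes (32x32 RGB, 10 classes).
X = rand(Float32, 32, 32, 3, 64)
Y = Flux.onehotbatch(rand(0:9, 64), 0:9)

# Tiny placeholder model, NOT the network from the model-zoo example.
model = Chain(x -> reshape(x, :, size(x, 4)), Dense(32 * 32 * 3, 10)) |> gpu
loss(x, y) = Flux.logitcrossentropy(model(x), y)

# Lazy generator: a batch is sliced on the CPU and sent to the GPU
# only when train! iterates to it, instead of uploading X and Y up front.
batchsize = 16
train_data = ((gpu(X[:, :, :, i]), gpu(Y[:, i]))
              for i in Iterators.partition(1:size(X, 4), batchsize))

# Assumes the implicit-parameter signature train!(loss, params, data, opt).
Flux.train!(loss, Flux.params(model), train_data, Descent(0.01))
```

The full dataset stays in host memory here; whether this alone avoids the out-of-memory error in the CIFAR10 example is not something this sketch demonstrates.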

Actually, Flux is awesome, but some work is needed, especially in the model zoo. I hope to be less busy in the near future and able to contribute to it.

Based on the order of execution, I would assume the first 3GB or so is the fixed overhead of loading the data and model, while the rest is incurred during the forward and backward passes. Have you tried dramatically reducing the batch size? I’m not sure why it’s set so high, as most folks do not have 8GB+ of VRAM in their workstations…

The ideal approach would be to use something like CUDA.jl’s CuIterator. Not sure if that works with Flux.train! directly though.
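
A rough sketch of what that might look like, untested against the model-zoo script. CuIterator wraps a collection of CPU batches, adapts each one to the GPU as it is iterated, and marks the previous batch as freeable. Since it is an ordinary iterator, I am assuming it can be passed as the data argument of the implicit-parameter `Flux.train!(loss, params, data, opt)`. The data, model and the name `cpu_batches` are placeholders:

```julia
using Flux, CUDA

# Hypothetical stand-in data with CIFAR10-like shapes; targets are dummy values.
X = rand(Float32, 32, 32, 3, 64)
Y = rand(Float32, 10, 64)

# Batches live in host memory as (input, target) tuples.
cpu_batches = [(X[:, :, :, i], Y[:, i]) for i in Iterators.partition(1:64, 16)]

if CUDA.functional()
    # Tiny placeholder model, NOT the network from the model-zoo example.
    model = Chain(x -> reshape(x, :, size(x, 4)), Dense(32 * 32 * 3, 10)) |> gpu
    loss(x, y) = Flux.logitcrossentropy(model(x), y)

    # CuIterator uploads one batch per iteration and frees the previous one.
    # Assumption: train! accepts it like any other iterator of tuples.
    Flux.train!(loss, Flux.params(model), CuIterator(cpu_batches), Descent(0.01))
end
```

The `CUDA.functional()` guard only keeps the sketch from erroring on a machine without a working GPU.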

Yes, I tried with small batches (16, I think) and got the same issue. I just retried with batches of size 1, and it crashes as well.