Out of memory using Flux CNN during back propagation phase

I want to train a CNN on the German Traffic Sign Dataset by adapting one of the model_zoo examples. I first tried training on a GTX 1070 GPU with 8 GB of memory, which failed with an out-of-memory error. I then tried training on the CPU (with 16 GB of RAM), which also ran out of memory. I tried batch sizes of 64 and 16, with the same result.
Is this model too complex for my machine?
The model is:

 model = Chain(
     # First convolution block, operating upon a 32x32 image
     Conv((3, 3), 1=>32, pad=(1,1), relu),
     Conv((3, 3), 32=>32, pad=(1,1), relu),
     x -> maxpool(x, (2,2)),

     # Second convolution block, operating upon a 16x16 image
     Conv((3, 3), 32=>64, pad=(1,1), relu),
     Conv((3, 3), 64=>64, pad=(1,1), relu),
     x -> maxpool(x, (2,2)),

     # Third convolution block, operating upon an 8x8 image
     Conv((3, 3), 64=>128, pad=(1,1), relu),
     Conv((3, 3), 128=>128, pad=(1,1), relu),
     x -> maxpool(x, (2,2)),

     # Flatten the 4x4x128 feature maps into 2048 features per sample
     x -> reshape(x, :, size(x, 4)),
     Dense(2048, 128),
     Dense(128, 128),
     Dense(128, 43),
     # Finally, softmax to get nice probabilities
     softmax,
 )
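
As a rough check on whether the model itself is too large, the sketch below counts parameters and per-sample activations for the layer sizes listed above. It uses plain Julia arithmetic only. The assumptions are mine, not from the post: Float32 storage, one stored output per layer, and the helper name `estimate` is hypothetical. The real footprint during backpropagation will be larger (gradients, convolution workspace buffers), so treat the numbers as a lower bound.

```julia
# Back-of-the-envelope memory estimate for the model in the question.
# Assumptions: Float32 values, 3x3 convolutions whose padding keeps the
# spatial size, and a 2x2 max pool after every second convolution.

# (input channels, output channels) of the six Conv layers, in order
conv_channels = [(1, 32), (32, 32), (32, 64), (64, 64), (64, 128), (128, 128)]
# (inputs, outputs) of the three Dense layers
dense_sizes = [(2048, 128), (128, 128), (128, 43)]

function estimate(conv_channels, dense_sizes; side = 32, kernel = 3)
    params = 0
    acts = 0                                      # values stored per sample
    for (i, (cin, cout)) in enumerate(conv_channels)
        params += kernel^2 * cin * cout + cout    # weights + biases
        acts += side^2 * cout                     # conv output
        if iseven(i)                              # pooling halves the side length
            side = div(side, 2)
            acts += side^2 * cout                 # pooled output
        end
    end
    for (nin, nout) in dense_sizes
        params += nin * nout + nout               # weights + biases
        acts += nout
    end
    acts += last(dense_sizes)[2]                  # softmax output
    return params, acts
end

params, acts = estimate(conv_channels, dense_sizes)
bytes = sizeof(Float32)
println("parameters: ", params, " (~", round(params * bytes / 2^20, digits = 1), " MiB)")
for batch in (16, 64)
    mib = round(acts * batch * bytes / 2^20, digits = 1)
    println("batch ", batch, ": ~", mib, " MiB of forward activations")
end
```

Under these assumptions the model has roughly 570 thousand parameters and needs a few tens of MiB of activations at batch size 64, which is far below 8 GB. That suggests the layer sizes alone are unlikely to explain the error.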

I’m also getting OutOfMemoryError() with a large batch size, but I don’t think your model is complex enough to keep causing that problem. Do you have any updates? If you’re using CuArrays, I’ve read that you can force GPU memory to be freed between batches/epochs with finalize, which might help.
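
To illustrate what calling `finalize` between batches does, here is a minimal sketch that runs without a GPU. `FakeBuffer`, `make_buffer`, and the `released` counter are hypothetical stand-ins I made up for the example; they are not part of CuArrays or Flux. A real GPU array type would release device memory in its finalizer instead of incrementing a counter, and I have not verified that this cures the error described in this thread.

```julia
# Hypothetical stand-in for a device buffer.
mutable struct FakeBuffer
    data::Vector{Float32}
end

# Counts how many buffers have had their finalizer run.
const released = Ref(0)

# Allocate a buffer and register a finalizer, as a GPU array type would.
function make_buffer(n::Integer)
    buf = FakeBuffer(zeros(Float32, n))
    finalizer(_ -> (released[] += 1), buf)
    return buf
end

for batch_id in 1:3
    buf = make_buffer(1024)
    # ... the forward and backward pass on `buf` would go here ...
    finalize(buf)   # run the finalizer now instead of waiting for the GC
end

GC.gc()             # a full collection can also be requested explicitly
println("buffers released: ", released[])
```

The point of the pattern is timing: without the explicit `finalize`, the memory is only returned whenever the garbage collector happens to run, and the collector is not aware of how much GPU memory is in use.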

I have not tested this workaround yet because I have since switched to Knet. I might give Flux another try in the future, but for the moment Knet does the job really well.