Flux runs out of memory

Also, just to confirm: had you added the following block?

using ChainRulesCore
import ChainRulesCore: rrule

function ChainRulesCore.rrule(cfg::RuleConfig, c::Chain, x::AbstractArray)
    # forward pass, layer by layer, keeping each layer's (output, pullback) pair
    duo = accumulate(c.layers; init=(x, nothing)) do (input, _), layer
        out, back = rrule_via_ad(cfg, layer, input)
    end
    outs = map(first, duo)    # intermediate activations
    backs = map(last, duo)    # pullbacks for the backward pass
    function un_chain(dout)
        # backward pass in reverse order, collecting each (layer gradient, input gradient) pair
        multi = accumulate(reverse(backs); init=(nothing, dout)) do (_, delta), back
            dlayer, din = back(delta)
        end
        layergrads = reverse(map(first, multi))   # per-layer gradients, back in forward order
        xgrad = last(multi[end])                  # gradient with respect to the input x
        foreach(CUDA.unsafe_free!, outs)                        # free the intermediate activations
        foreach(CUDA.unsafe_free!, map(last, multi[1:end-1]))   # free the intermediate deltas
        return (Tangent{Chain}(; layers=layergrads), xgrad)
    end
    outs[end], un_chain
end
# Could restrict this to x::CuArray... For testing, instead write NaN into non-CuArrays, piratically:
CUDA.unsafe_free!(x::Array) = fill!(x, NaN)
CUDA.unsafe_free!(x::Flux.Zygote.Fill) = nothing
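
For reference, once this rrule is defined, an ordinary Zygote gradient call over a Chain picks it up automatically, and each intermediate activation is freed as soon as the backward pass is done with it. A rough usage sketch, with made-up model and data names:

using Flux, CUDA, Zygote

model = Chain(Dense(100 => 100, relu), Dense(100 => 10)) |> gpu   # stand-in model
x = CUDA.rand(Float32, 100, 32)                                   # stand-in batch

# Zygote consults ChainRulesCore rrules, so this gradient call goes through the rule
# above and frees the intermediate outputs during the backward pass.
grads = Zygote.gradient(m -> sum(abs2, m(x)), model)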

This was the major helper that @mcabbott provided; it roughly doubled the batch size I could fit, without having to disable the CUDA memory pool, which had a significant adverse effect on performance.

I sometimes wonder if one could take this even further and provide a callback function which gets the gradient and the thing the gradient is for, instead of returning them.

One could then immediately apply any optimizer update and then free the parameter gradients. There are probably a lot of things that would not work out with it (non-linear optimizers, parameters that occur more than once?), so it might be difficult to make it generic.
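
To make the idea concrete, here is a rough sketch (not a working proposal): after the gradient call, walk the chain layer by layer, apply an Optimisers.jl update to each layer as soon as its gradient is used, and free that gradient's GPU buffers immediately. The helpers free_grads! and update_per_layer! are made up for illustration.

using Flux, Optimisers, CUDA, Functors

# free all CuArray leaves inside one layer's gradient (illustrative helper)
free_grads!(g) = Functors.fmap(g) do x
    x isa CuArray && CUDA.unsafe_free!(x)
    x
end

# apply the optimizer update per layer, then drop that layer's gradient right away
function update_per_layer!(opt_states, model::Chain, layer_grads)
    for (i, (layer, dlayer)) in enumerate(zip(model.layers, layer_grads))
        dlayer === nothing && continue
        opt_states[i], _ = Optimisers.update!(opt_states[i], layer, dlayer)
        free_grads!(dlayer)
    end
    return opt_states
end

# usage sketch, assuming `model::Chain` on the GPU and a batch (x, y):
# opt_states = [Optimisers.setup(Optimisers.Nesterov(1e-3), l) for l in model.layers]
# grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
# update_per_layer!(opt_states, model, grads[1].layers)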

Now I tried including this block. Same result.

Hey everyone, I am running into the same problem.

I am training a residual U-Net for 3D image segmentation with FastAI.jl on GCloud with a 16GB T4 GPU, but I keep getting out-of-memory errors on the GPU. After searching online I made sure to set JULIA_CUDA_MEMORY_POOL to “none” and added a callback after every epoch that runs GC.gc(true) and CUDA.reclaim(). I think the problem is a memory leak, as it only occurs after ~30 epochs. I can also see the GPU utilization dropping before the crash (see the dashboard screenshot below). When I decrease the input image size it happens later; when I decrease the model size it happens earlier (this I really do not understand). I posted the question here on StackOverflow. Does anyone have an idea what the problem could be, or what I can try in order to figure it out?
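
For concreteness, this is roughly what the cleanup amounts to in a plain Flux training loop (my actual run goes through FastAI.jl's Learner and a callback, but the effect is the same; the names below are placeholders):

ENV["JULIA_CUDA_MEMORY_POOL"] = "none"   # must be set before CUDA is initialized
using Flux, CUDA

# `model`, `opt_state` (from Flux.setup) and `train_loader` stand in for the real U-Net,
# its optimiser state, and the FastAI.jl data iterator.
for epoch in 1:100
    for (x, y) in train_loader
        grads = Flux.gradient(m -> Flux.logitcrossentropy(m(x), y), model)
        Flux.update!(opt_state, model, grads[1])
    end
    GC.gc(true)       # force a full garbage collection at the end of each epoch
    CUDA.reclaim()    # return freed GPU memory to the driver
end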

Some context info:

  • I am using Julia 1.9.0 with FastAI.jl 0.5.1, Flux.jl 0.13.16, CUDA.jl 4.2.0
  • The VM is Ubuntu 22.04 x86_64 with CUDA toolkit 12.1, NVIDIA driver 530.30.02, and an NVIDIA Tesla T4 GPU with 16GB RAM
  • The model is a residual U-Net with approximately 9.5 million parameters; the input data are 3D Float32 images of size (96, 96, 96), and I am using a batch size of 2.

Some things I’ve tried:

  • I can reproduce the behaviour reliably; it happens every time after the same number of epochs
  • If I decrease the input image size, it still happens, but later (epoch 60)
  • If I decrease the model size, it happens earlier (this I especially don’t understand)
  • I’ve set JULIA_CUDA_MEMORY_POOL to none and added a callback after each epoch that executes GC.gc(true) and CUDA.reclaim()
  • I’ve upgraded the GPU to an NVIDIA L4 with 24GB RAM. With this GPU I can train until epoch 125, but then the same thing happens (a gradual decrease in GPU utilization and finally an OOM error). I can work with this for now, simply restarting the training after every crash, but it is not ideal :confused:

This is a screenshot of the dashboard with VM and GPU metrics; you can see the GPU utilization dropping in several steps just before the crash:

Here is one possible cause that matches your symptoms quite well: CUDA memory leak for Flux.Optimizer · Issue #148 · FluxML/FluxTraining.jl · GitHub

Thank you very much, that was it! I replaced my Flux.Nesterov optimiser with Optimisers.Nesterov from Optimisers.jl. The new training run has already lasted longer than any run before, and I no longer see the symptoms described above.
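
For anyone who hits this later, the change was roughly the following (stand-in model and loss for illustration; in FastAI.jl you pass the Optimisers.jl rule wherever the optimiser currently goes):

using Flux, Optimisers

model = Chain(Dense(10 => 10, relu), Dense(10 => 1))   # stand-in for the real U-Net
x, y  = rand(Float32, 10, 2), rand(Float32, 1, 2)

# before: the Flux.jl rule I was using, which triggered the FluxTraining.jl leak (issue #148)
# opt = Flux.Nesterov(0.001)

# after: an explicit Optimisers.jl rule with setup/update!
opt_state = Optimisers.setup(Optimisers.Nesterov(0.001), model)
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
opt_state, model = Optimisers.update!(opt_state, model, grads[1])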
