`CUDA error: out of memory` with Flux

Well… the code is now running with batches of size 10 rather than 1000.
I vaguely remember having tried even with a batch size of 1 and getting the same error, but I didn’t save the example, so I’ll remain wondering…
Btw, I was checking the GPU memory usage with nvidia-smi and, independently of the batch size, the code leaves only a few MB of RAM available while running the main loop, which I copy here for easy reference:

for epoch = 1:epochs
    @info "epoch" epoch
    for i in 1:batchnum
        # move the current batch to the GPU
        batch = trainset[i] |> gpu
        # gradients of the loss w.r.t. the model parameters
        gs = gradient(params(m)) do
            l = loss(batch...)
        end
        @info "batch fraction" i/batchnum
        # optimiser step
        update!(opt, params(m), gs)
    end
    @show accuracy(valX, valY)
end
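
Since I was only watching nvidia-smi from the outside, I also started printing the free device memory from inside the loop. This is just a small sketch, assuming the GPU backend is CUDA.jl (for older CuArrays-based setups the calls would differ), and `report_gpu_memory` is a helper name I made up:

```julia
using CUDA

# Hypothetical helper: log how much device memory is still free.
# CUDA.available_memory() / CUDA.total_memory() report the raw device numbers,
# and CUDA.memory_status() additionally prints what the Julia memory pool holds.
function report_gpu_memory(tag)
    free_mb  = CUDA.available_memory() / 2^20
    total_mb = CUDA.total_memory() / 2^20
    @info "GPU memory (MiB)" tag free_mb total_mb
    CUDA.memory_status()
end

# e.g. once per epoch, after forcing a collection so arrays that are merely
# unreferenced (but not yet freed) don't inflate the numbers:
GC.gc()
CUDA.reclaim()
report_gpu_memory("after epoch")
```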

The error is raised when trying to run `accuracy`.
I’m now doing a binary search to find the critical batch size, for what it’s worth.
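
Since the OOM is raised by `accuracy(valX, valY)` rather than by the gradient step, another thing I want to try is evaluating accuracy in chunks, moving only one slice of the validation set to the GPU at a time. A rough sketch, assuming `valX` and `valY` are 2-D arrays with observations along the last dimension and one-hot labels (`batched_accuracy` and `batchsize` are names I made up, not Flux API):

```julia
using Flux
using Flux: onecold

# Hypothetical chunked replacement for accuracy(valX, valY): the full
# validation set never has to fit on the GPU at once.
function batched_accuracy(m, valX, valY; batchsize = 100)
    n = size(valX, 2)
    correct = 0
    for idx in Iterators.partition(1:n, batchsize)
        x = valX[:, idx] |> gpu     # move only this slice to the device
        ŷ = m(x) |> cpu             # bring predictions back to the CPU
        correct += sum(onecold(ŷ) .== onecold(valY[:, idx]))
    end
    return correct / n
end

@show batched_accuracy(m, valX, valY)
```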