Memory usage increasing with each epoch

After analysing the heap, I think I’ve finally figured out the problem.

Here’s a screenshot from the heap snapshot:

As we can see, a large number of debug logs are being stored, adding up to 341 MB of memory.

It seems that these logs are being produced by cuDNN and are then being output by PlutoLogger:

These logs are reproduced on every forward pass, which quickly adds up to a substantial amount of memory. I believe ViT is less affected because it contains only a single convolutional layer, for the patch embedding.

To test this theory, I converted the notebook to a standard Julia program that I can run from the terminal. After 10 epochs, memory usage remains stable throughout training, as seen in the following logs:

[ Info: 36.74
Effective GPU memory usage: 73.48% (8.566 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 37.03
Effective GPU memory usage: 73.40% (8.557 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 37.02
Effective GPU memory usage: 73.46% (8.564 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 37.02
Effective GPU memory usage: 73.92% (8.617 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 37.02
Effective GPU memory usage: 73.75% (8.598 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 36.99
Effective GPU memory usage: 73.74% (8.597 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 37.0
Effective GPU memory usage: 73.71% (8.593 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 36.98
Effective GPU memory usage: 73.72% (8.595 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 36.97
Effective GPU memory usage: 73.60% (8.581 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
[ Info: 36.96
Effective GPU memory usage: 73.58% (8.578 GiB/11.658 GiB)
Memory pool usage: 1.530 GiB (7.906 GiB reserved)
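
For reference, the lines above are produced by logging the epoch time with @info and calling CUDA.memory_status() after each epoch. Here is a minimal sketch of the standalone script's loop; the model, loss, and data are placeholders rather than the actual notebook code:

```julia
using CUDA, Flux

# Stand-ins so the sketch runs; the real notebook's model and data differ.
model = gpu(Chain(Conv((3, 3), 3 => 8, relu; pad=1), Flux.flatten, Dense(8 * 32 * 32 => 10)))
opt_state = Flux.setup(Adam(1e-3), model)
loss(ŷ, y) = Flux.logitcrossentropy(ŷ, y)
train_loader = [(rand(Float32, 32, 32, 3, 16), Flux.onehotbatch(rand(0:9, 16), 0:9)) for _ in 1:10]

for epoch in 1:10
    t = @elapsed for (x, y) in train_loader
        x, y = gpu(x), gpu(y)
        grads = Flux.gradient(m -> loss(m(x), y), model)
        Flux.update!(opt_state, model, grads[1])
    end
    @info round(t, digits=2)   # epoch time, e.g. "[ Info: 36.74"
    CUDA.memory_status()       # prints the "Effective GPU memory usage" / "Memory pool usage" lines
end
```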

Now that we know what’s causing the issue, how can I go about resolving it? It looks like the current master branch of cuDNN.jl registers a callback to log debug messages in the module’s __init__ function:

function __init__()
    precompiling = ccall(:jl_generating_output, Cint, ()) != 0

    CUDA.functional() || return

    # find the library
    global libcudnn
    if CUDA.local_toolkit
        dirs = CUDA_Runtime_Discovery.find_toolkit()
        path = CUDA_Runtime_Discovery.get_library(dirs, "cudnn"; optional=true)
        if path === nothing
            precompiling || @error "cuDNN is not available on your system (looked in $(join(dirs, ", ")))"
            return
        end
        libcudnn = path
    else
        if !CUDNN_jll.is_available()
            precompiling || @error "cuDNN is not available for your platform ($(Base.BinaryPlatforms.triplet(CUDNN_jll.host_platform)))"
            return
        end
        libcudnn = CUDNN_jll.libcudnn
    end

    # register a log callback
    if !precompiling && (isdebug(:init, cuDNN) || Base.JLOptions().debug_level >= 2)
        log_cond[] = Base.AsyncCondition() do async_cond
            message = Base.@lock log_lock popfirst!(log_messages)
            _log_message(message...)
        end

        callback = @cfunction(log_message, Nothing,
                              (cudnnSeverity_t, Ptr{Cvoid}, Ptr{cudnnDebug_t}, Ptr{UInt8}))
        cudnnSetCallback(typemax(UInt32), log_cond[], callback)
    end

    _initialized[] = true
end

This seems to be an unusual design choice, and it isn’t obvious to me how one would go about suppressing such messages from within Pluto.
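
To illustrate the kind of suppression I have in mind (untested, and I’m not sure how it interacts with Pluto’s per-cell logger), one could wrap the active logger in a LoggingExtras filter that drops debug records originating from the cuDNN module before they can be stored:

```julia
using Logging, LoggingExtras
using cuDNN  # so the module can be referenced in the filter

# Drop debug-level records emitted from the cuDNN module; pass everything else
# through to whatever logger is currently active.
filtered = EarlyFilteredLogger(global_logger()) do log
    !(log._module === cuDNN && log.level == Logging.Debug)
end
global_logger(filtered)
```

Even if that works, it only hides the messages: the callback is still registered and still fires on every cuDNN call, so avoiding the registration in the first place would be preferable.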
