Out of dynamic GPU memory?

peremato · February 17, 2022, 5:51pm

When running a kernel several times (3) I get the following exception (with -g2):

ERROR: Out of dynamic GPU memory (trying to allocate 64 bytes)
ERROR: Out of dynamic GPU memory (trying to allocate 64 bytes)
ERROR: Out of dynamic GPU memory (trying to allocate 64 bytes)
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
 [1] gc_pool_alloc at /home/sftnight/.julia/packages/GPUCompiler/1Ajz2/src/runtime.jl:129
 [1] gc_pool_alloc at /home/sftnight/.julia/packages/GPUCompiler/1Ajz2/src/runtime.jl:129
 [1] gc_pool_alloc at /home/sftnight/.julia/packages/GPUCompiler/1Ajz2/src/runtime.jl:129
...

I tried to GC.gc(true); CUDA.reclaim() between executions but it does not help. I get the crash despite that the memory usage is reduced.

julia> CUDA.memory_status()
Effective GPU memory usage: 1.41% (209.938 MiB/14.561 GiB)
Memory pool usage: 80.176 KiB (32.000 MiB reserved)
julia> GC.gc(true); CUDA.reclaim()
julia> CUDA.memory_status()
Effective GPU memory usage: 1.19% (177.938 MiB/14.561 GiB)
Memory pool usage: 0 bytes (0 bytes reserved)

For information, I am using a CuArray of a Union of 3 structs which is the maximum number of types I can use for the time being (see Limitation in Union types with CUDA.jl?)
I there are way to get a real traceback to point to the problem?

maleadt · February 17, 2022, 6:40pm

Dynamic memory is memory allocated from within a kernel, and because of how CUDA works that memory is lost after the kernel exist. Basically, don’t allocate within a kernel. The support for that only exists to support some limited cases where we need to allocate an exception object before throwing it.

To find out where the allocations come from, inspect the LLVM code (@device_code_llvm) and look for calls to alloc-like functions. Escape analysis in 1.8/1.9 is going to improve this, but for now you might have to force-inline some functions or avoid passing complex object to complex functions.

peremato · February 18, 2022, 8:14am

Thank-you very much. I guess some allocations (i.e. temporary objects) have sneaked in the code that later runs in the kernel. I’ll follow your suggestion.

pxl-th · February 21, 2022, 12:35am

In my case it was due to the use of StaticArrays.MVector in kernels. Inlining functions used by the kernel helped eliminate the allocations.

maleadt · February 21, 2022, 7:46am

A similar case was noted here, https://github.com/JuliaLang/julia/issues/41800, which should be fixed in the upcoming 1.8.

peremato · February 21, 2022, 12:12pm

Indeed I am using StaticArrays.MVector ink the kernel. I didn’t know any other way to have a mutable fix length vector.

kichappa · July 16, 2025, 3:52am

I am finding myself rummaging through this error even now as I use SVector/SMatrices. Was this fixed?

maleadt · July 16, 2025, 5:47pm

Check the LLVM IR (@device_code_llvm dump_module=true ...) and look for gpu_malloc calls. They can happen when an MArray allocation wasn’t properly optimized away by Julia, resulting in allocations. It happens because mutable StaticArrays are kinda problematic, in that they rely on a Julia optimization kicking in, which doesn’t always happen (as observed here). If you want to avoid running into this, use SArray with Base.setindex (i.e. the non-mutating version that returns a new object), which is less likely to run into this.

Benny · July 16, 2025, 11:39pm

SArrays also allocate if the element type is abstract, even for small isbits Unions because the backend Tuple is exceptionally covariant in its parameters and thus cannot do Memory’s inline element optimization. Worth looking out for mistakes in manually specified parameters in constructors, like SVector{2, Integer}, or inputs with abstract element types like SVector{1}(push!([], 1)).

Topic		Replies	Views
Using MVector in CUDA without memory errors GPU	3	431	October 17, 2023
CUDAnative dynamic allocation GPU question , cudanative	5	1810	March 4, 2020
Significant CUDA.jl memory allocations outside of main pool? GPU memory	2	1410	August 6, 2022
Freeing memory in the GPU with CUDAdrv / CUDAnative / CuArrays GPU	8	3048	November 13, 2018
Why is it consuming and not freeing GPU memory? GPU	5	464	April 18, 2024

Out of dynamic GPU memory?

Related topics