Why does the GPU still OOM when using CUDA unified memory?

I have way more RAM than VRAM, and the code runs fine on the CPU. I thought using CUDA unified memory would let the GPU tap system RAM, but it still OOMs.

I’m asking because I’m getting a new laptop for ML work. Does unified memory mean VRAM is no longer a hard ceiling, and therefore less important as a spec? (I know that having the GPU tap system RAM is much slower, but at least it runs.)


What you’re describing sounds more like an integrated GPU sharing physical memory with the CPU. Apple’s Unified Memory Architecture does this in hardware. CUDA’s Unified Memory is an unrelated software abstraction that lets GPU and CPU code access data through the same pointer, even though the data is actually migrated between CPU and GPU memory on demand.
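To make the “same pointer” idea concrete, here is a minimal sketch using the same CUDA.jl constructor that appears in the transcript below (it assumes CUDA.jl on a CUDA-capable GPU, so it won’t run on a machine without one):

```julia
julia> using CUDA

julia> a = CuVector{Float32,Mem.Unified}(undef, 4);  # one unified allocation

julia> a .= 2f0;        # the broadcast runs on the GPU

julia> sum(Array(a))    # the host reads the same buffer; pages migrate on demand
8.0f0
```

The point is that there is a single allocation: neither side ever calls an explicit copy, and the CUDA driver migrates pages between host and device as each side touches them.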


Depends on your OS. Windows might have this: “Can PyTorch GPU Use Shared GPU Memory (from RAM, shows in Windows Task Manager)?” on Stack Overflow.



Works here:

julia> Sys.free_memory() |> Base.format_bytes
"55.165 GiB"

julia> CUDA.available_memory() |> Base.format_bytes
"46.865 GiB"

julia> a = CuVector{UInt8,Mem.Unified}(undef, 50*2^30);

julia> sizeof(a) |> Base.format_bytes
"50.000 GiB"

julia> a .= 1;

julia> Sys.free_memory() |> Base.format_bytes
"7.322 GiB"

julia> CUDA.available_memory() |> Base.format_bytes
"2.000 MiB"

What platform are you on?


Cool, I didn’t know it could split a single variable across both memories. I’m on Windows with CUDA 12 but a really old GPU, so it’s probably not worth debugging. For other folks: you can have CUDA.jl allocate unified buffers by default by adding a LocalPreferences.toml to your environment folder containing:

[CUDA]
default_memory = "unified"
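To check that the preference was actually picked up, you can read it back with Preferences.jl (the mechanism CUDA.jl uses for this setting); this is a small sketch and assumes the LocalPreferences.toml sits in the active environment:

```julia
using CUDA, Preferences

# Returns the value from LocalPreferences.toml, or `nothing` if unset.
load_preference(CUDA, "default_memory")
```

Note that preferences are read at package load time, so restart Julia after editing the file.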


From the CUDA C++ Programming Guide:

Devices of compute capability lower than 6.0 cannot allocate more managed memory than the physical size of GPU memory.
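You can check which case your device falls into from Julia; `CUDA.capability` returns the device’s compute capability as a `VersionNumber` (this sketch assumes a working CUDA.jl setup):

```julia
using CUDA

# Devices below compute capability 6.0 (pre-Pascal) cannot allocate more
# managed memory than the physical size of GPU memory.
cap = CUDA.capability(device())
cap >= v"6.0" || @warn "This GPU cannot oversubscribe managed memory" cap
```

That would explain the original poster’s symptom: on an old GPU, unified memory still caps out at physical VRAM.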