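Hm, I see.
The following still happens to me, and I'm not sure how to prevent it.
After some time it looks like memory is leaked, and I can't compute the FFT without a manual GC.gc(true) call in between.
In the session below, the effective GPU memory usage keeps growing (1.913 GiB, 3.962 GiB, 4.962 GiB, 5.962 GiB) while the memory pool usage stays at 2.000 GiB, until plan creation finally fails.
Could there be a memory leak in cuFFT within Julia?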
julia> using CUDA, FFTW
julia> x = CUDA.rand(ComplexF32, (512, 512, 512)); # 1GiB memory
julia> CUDA.memory_status()
Effective GPU memory usage: 24.61% (1.913 GiB/7.773 GiB)
CUDA allocator usage: 1.000 GiB
Memory pool usage: 1.000 GiB (1.000 GiB allocated, 0 bytes cached)
julia> CUDA.@time y = fft(x);
0.995551 seconds (2.68 M CPU allocations: 147.661 MiB, 2.32% gc time) (2 GPU allocations: 2.000 GiB, 4.73% gc time of which 9.55% spent allocating)
julia> CUDA.memory_status()
Effective GPU memory usage: 50.96% (3.962 GiB/7.773 GiB)
CUDA allocator usage: 2.000 GiB
Memory pool usage: 2.000 GiB (2.000 GiB allocated, 0 bytes cached)
julia> GC.gc(true)
julia> CUDA.memory_status()
Effective GPU memory usage: 63.83% (4.962 GiB/7.773 GiB)
CUDA allocator usage: 2.000 GiB
Memory pool usage: 2.000 GiB (2.000 GiB allocated, 0 bytes cached)
julia> CUDA.@time y = fft(x);
0.027007 seconds (329.92 k CPU allocations: 5.035 MiB) (2 GPU allocations: 2.000 GiB, 0.05% gc time of which 61.26% spent allocating)
julia> CUDA.@time y = fft(x);
0.083744 seconds (251.18 k CPU allocations: 3.833 MiB, 4.40% gc time) (2 GPU allocations: 2.000 GiB, 68.15% gc time of which 85.60% spent allocating)
julia> CUDA.@time y = fft(x);
0.037591 seconds (373.56 k CPU allocations: 5.701 MiB, 7.31% gc time) (2 GPU allocations: 2.000 GiB, 18.91% gc time of which 0.13% spent allocating)
julia> CUDA.@time y = fft(x);
0.037605 seconds (371.99 k CPU allocations: 5.677 MiB, 7.70% gc time) (2 GPU allocations: 2.000 GiB, 19.19% gc time of which 0.14% spent allocating)
julia> CUDA.@time y = fft(x);
0.038422 seconds (365.12 k CPU allocations: 5.573 MiB, 8.85% gc time) (2 GPU allocations: 2.000 GiB, 20.55% gc time of which 0.16% spent allocating)
julia> CUDA.@time y = fft(x);
0.038034 seconds (362.32 k CPU allocations: 5.529 MiB, 8.32% gc time) (2 GPU allocations: 2.000 GiB, 20.03% gc time of which 0.15% spent allocating)
julia> GC.gc(true)
julia> CUDA.memory_status()
Effective GPU memory usage: 76.69% (5.962 GiB/7.773 GiB)
CUDA allocator usage: 2.000 GiB
Memory pool usage: 2.000 GiB (2.000 GiB allocated, 0 bytes cached)
julia> CUDA.@time y = fft(x);
0.027007 seconds (322.19 k CPU allocations: 4.917 MiB) (2 GPU allocations: 2.000 GiB, 0.05% gc time of which 66.57% spent allocating)
julia> CUDA.@time y = fft(x);
ERROR: CUFFTError: driver or internal cuFFT library error (code 5, CUFFT_INTERNAL_ERROR)
Stacktrace:
[1] throw_api_error(res::CUDA.CUFFT.cufftResult_t)
@ CUDA.CUFFT ~/.julia/packages/CUDA/k52QH/lib/cufft/error.jl:64
[2] macro expansion
@ ~/.julia/packages/CUDA/k52QH/lib/cufft/error.jl:81 [inlined]
[3] cufftMakePlan3d(plan::Int32, nx::Int64, ny::Int64, nz::Int64, type::CUDA.CUFFT.cufftType_t, workSize::Base.RefValue{UInt64})
@ CUDA.CUFFT ~/.julia/packages/CUDA/k52QH/lib/utils/call.jl:26
[4] create_plan(xtype::CUDA.CUFFT.cufftType_t, xdims::Tuple{Int64, Int64, Int64}, region::UnitRange{Int64})
@ CUDA.CUFFT ~/.julia/packages/CUDA/k52QH/lib/cufft/fft.jl:137
[5] plan_fft
@ ~/.julia/packages/CUDA/k52QH/lib/cufft/fft.jl:293 [inlined]
[6] #plan_fft#10
@ ~/.julia/packages/FFTW/Iu2GG/src/fft.jl:693 [inlined]
[7] plan_fft
@ ~/.julia/packages/FFTW/Iu2GG/src/fft.jl:693 [inlined]
[8] fft(x::CuArray{ComplexF32, 3})
@ AbstractFFTs ~/.julia/packages/AbstractFFTs/JebmH/src/definitions.jl:50
[9] macro expansion
@ ~/.julia/packages/CUDA/k52QH/src/utilities.jl:28 [inlined]
[10] top-level scope
@ ~/.julia/packages/CUDA/k52QH/src/pool.jl:572 [inlined]
[11] top-level scope
@ ./REPL[18]:0
[12] top-level scope
@ ~/.julia/packages/CUDA/k52QH/src/initialization.jl:81
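Since the stack trace ends in create_plan / cufftMakePlan3d, one thing that might be worth trying is to build the plan once and reuse it, instead of letting every fft(x) call create a new one. This is only a sketch of that idea, assuming that plan_fft and mul! behave for CuArrays as they do for ordinary arrays; I have not verified that it avoids the error above.

```julia
using CUDA, FFTW, LinearAlgebra

x = CUDA.rand(ComplexF32, (512, 512, 512))  # 1 GiB input, same as in the session above

p = plan_fft(x)   # create the cuFFT plan once instead of once per fft(x) call
y = similar(x)    # preallocate the 1 GiB output buffer once

for _ in 1:10
    mul!(y, p, x) # writes the transform of x into y, reusing plan and buffer
end
```

If this runs without the effective memory usage creeping up, that would at least point towards the per-call plans as the source of the growth.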
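I also observed that the memory allocation on the GPU is twice as high as with FFTW on the CPU: 16.000 MiB in 2 GPU allocations versus 8.003 MiB for the same 1024×1024 ComplexF32 array. What is the reason for that?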
julia> using FFTW, CUDA
julia> x = randn(ComplexF32, (1024, 1024));
julia> x_c = CuArray(x);
julia> @time fft(x);
0.297959 seconds (824.23 k allocations: 57.072 MiB, 23.40% gc time)
julia> @time fft(x);
0.058378 seconds (29.23 k allocations: 9.767 MiB, 12.41% gc time, 22.68% compilation time)
julia> @time fft(x);
0.041144 seconds (35 allocations: 8.003 MiB)
julia> CUDA.@time fft(x_c);
1.145209 seconds (4.24 M CPU allocations: 235.730 MiB, 5.92% gc time) (2 GPU allocations: 16.000 MiB, 5.87% gc time of which 0.04% spent allocating)
julia> CUDA.@time fft(x_c);
0.000613 seconds (2.72 k CPU allocations: 43.109 KiB) (2 GPU allocations: 16.000 MiB, 2.30% gc time of which 61.99% spent allocating)
julia> CUDA.@time fft(x_c);
0.000698 seconds (2.75 k CPU allocations: 43.609 KiB) (2 GPU allocations: 16.000 MiB, 13.06% gc time of which 95.62% spent allocating)
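For reference, the expected size of a single array can be checked with a quick calculation. The helper name array_mib is something I made up for this post; it is not part of CUDA.jl or FFTW.jl.

```julia
# Size in MiB of a dense array with element type T and dimensions dims
array_mib(T, dims) = sizeof(T) * prod(dims) / 2^20

array_mib(ComplexF32, (1024, 1024))     # 8.0
array_mib(ComplexF32, (512, 512, 512))  # 1024.0, i.e. 1 GiB
```

So the output array alone accounts for 8 MiB in the second example, and the two GPU allocations reported by CUDA.@time add up to exactly twice that.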