Using CUDA fft


Hi folks,
I'm just starting to use CuArrays, and there is something I don't understand that somebody can probably help me with.
I'm trying to test fft using CUDA and I run into ‘out of memory’ issues, but only the second time I call fft.
This is what I ran:

using CuArrays
using FFTW

N = 2^9
a = rand(Float32,N,N,N)

a_d = cu(a)

@time fft(a_d);

> 6.613403 seconds (22.63 M allocations: 1.113 GiB, 6.04% gc time)

@time fft(a_d)

ERROR: CUFFTError(code 2, cuFFT failed to allocate GPU or CPU memory)
 [1] macro expansion at /home/mazzanti/.julia/packages/CuArrays/PD3UJ/src/fft/error.jl:59 [inlined]
 [2] _mkplan(::UInt8, ::Tuple{Int64,Int64,Int64}, ::UnitRange{Int64}) at /home/mazzanti/.julia/packages/CuArrays/PD3UJ/src/fft/wrappers.jl:19
 [3] plan_fft at /home/mazzanti/.julia/packages/CuArrays/PD3UJ/src/fft/highlevel.jl:31 [inlined]
 [4] fft at /home/mazzanti/.julia/packages/AbstractFFTs/7WCaR/src/definitions.jl:51 [inlined]
 [5] fft(::CuArray{Float32,3}, ::UnitRange{Int64}) at /home/mazzanti/.julia/packages/CuArrays/PD3UJ/src/fft/genericfft.jl:27 (repeats 2 times)
 [6] top-level scope at util.jl:156

…and I fail to understand why the fft works the first time, while the second time it refuses to evaluate.
Maybe there is some way to clean the internal memory of the GPU?

Thanks for your help,



You’re allocating about 512 MiB right there (512^3 Float32 values at 4 bytes each)… With a typical GPU, that means you run out of memory fairly quickly, especially since FFT might allocate intermediate buffers of at least the same size. You should try the in-place fft version and make sure that you free the GPU memory between runs (e.g. a_d = nothing, which drops the reference so the garbage collector can reclaim the buffer).
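
To put rough numbers on that, here is a small back-of-the-envelope sketch in plain Julia (no GPU needed). It assumes the real input is first promoted to a complex array before the transform, which is what the genericfft.jl frame in the stack trace seems to suggest, and that the result is written out-of-place. The helper `mib` and the variable names are made up for illustration, and any cuFFT workspace comes on top of this.

```julia
# Rough GPU memory estimate for fft(a_d) with N = 2^9; pure arithmetic, nothing runs on the GPU.
N = 2^9
nelem = N^3                                   # number of elements in the N x N x N array

mib(bytes) = bytes / 2^20                     # hypothetical helper: bytes -> MiB

real_input   = nelem * sizeof(Float32)        # a_d itself: 512 MiB
complex_copy = nelem * sizeof(ComplexF32)     # assumed: real input promoted to complex: 1024 MiB
fft_output   = nelem * sizeof(ComplexF32)     # out-of-place result: 1024 MiB

total = real_input + complex_copy + fft_output

println("input a_d:    ", mib(real_input), " MiB")
println("complex copy: ", mib(complex_copy), " MiB")
println("fft output:   ", mib(fft_output), " MiB")
println("total:        ", mib(total), " MiB (cuFFT workspace not included)")
```

Under these assumptions a single call already touches about 2.5 GiB, so if the arrays from the first call have not been garbage collected yet, it is plausible that the second call finds no room left for the cuFFT plan. Starting from a complex array and transforming it in place would avoid the separate output buffer.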
