Unexpectedly high memory usage when running CUFFT.ifft()

I want to use CUDA.jl instead of CUDA C/C++ on a Jetson Nano (a single-board computer with a GPU), but I am puzzled by the memory usage when executing CUFFT.ifft(). In several environments, including Jetson, Ubuntu, and Windows, I have confirmed that the memory usage of the Julia process increases by about 800 MB when CUFFT.ifft() is executed, whereas it increases by only about 180 MB after CUFFT.fft(). What is happening?

I have also tried using plans, but nothing changed. The plan generated by CUFFT.plan_ifft() is not a CUFFT.cCuFFTPlan{} but an AbstractFFTs.ScaledPlan{}, which may have something to do with the problem, but I am not sure. The problem occurs in the same way when I create the plan first and then execute the transform by multiplying the plan with the input array.

Is there a way to somehow execute fft() and ifft() with less memory usage?
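One thing that may be worth trying is sketched below, assuming CUDA.jl with its bundled CUFFT wrapper. In-place plans (`plan_fft!`, `plan_bfft!`) write the result back into the input buffer instead of allocating a new output array, and applying the unnormalized backward transform (`bfft`) followed by an in-place scaling avoids going through the `ScaledPlan` wrapper. This is not confirmed to remove the ~800 MB host-memory increase; it only avoids allocating extra arrays on every call. The names `p_fwd`, `p_bwd`, and `x0` are illustrative.

```julia
using CUDA

x = ComplexF32.(CUDA.rand(1024, 1024))
x0 = copy(x)                       # kept only to check the round trip afterwards

# In-place plans overwrite `x` instead of allocating a new output array.
p_fwd = CUFFT.plan_fft!(x)
# `bfft` is the unnormalized backward transform: ifft(y) == bfft(y) / length(y).
p_bwd = CUFFT.plan_bfft!(x)

p_fwd * x                          # x now holds fft(x0)
p_bwd * x                          # x now holds length(x) .* x0
x .*= 1f0 / length(x)              # normalize in place, no temporary array

CUDA.synchronize()
```

The plans can be created once and reused for every transform of the same size and element type, so the planning cost is paid only once.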


Here is code that reproduces the problem:


```julia
using CUDA

x = ComplexF32.(CUDA.rand(1024, 1024))
# 1024×1024 CuArray{ComplexF32, 2, CUDA.Mem.DeviceBuffer}

# This is OK (memory usage of the Julia process increases by about 180 MB).
fft_x = CUFFT.fft(x)
# first call:
#   0.965564 seconds (4.46 M allocations: 229.851 MiB, 8.06% gc time, 95.79% compilation time)
# second call:
#   0.005171 seconds (16 allocations: 672 bytes)

# Memory usage of the Julia process up to this point: 496 MB.

# This is the problem.
ifft_x = CUFFT.ifft(x)
# first call:
#   1.399079 seconds (1.26 M allocations: 67.445 MiB, 2.01% gc time, 63.29% compilation time)
# second call:
#   0.005467 seconds (17 allocations: 704 bytes)

# Memory usage then increases to 1.2 GB.
```
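To see whether the extra memory is a one-time cost of the first call (the first-call timing above reports 63% compilation time) or something that grows with every call, one option is to record the process's peak resident memory around a first call and a batch of repeated calls. A minimal sketch using `Sys.maxrss()` from Base; the helper `peak_rss_mib` and the loop count of 10 are arbitrary choices for illustration. If the value stays flat after the first call, the growth is a one-time setup cost rather than a per-call leak.

```julia
using CUDA

# Peak resident set size of the Julia process in MiB (monotonic: it never decreases).
peak_rss_mib() = Sys.maxrss() / 2^20

x = ComplexF32.(CUDA.rand(1024, 1024))

rss_before = peak_rss_mib()
ifft_x = CUFFT.ifft(x)             # first call: includes compilation and setup
CUDA.synchronize()
rss_first = peak_rss_mib()

for _ in 1:10                      # repeated calls reuse compiled code and pooled memory
    CUFFT.ifft(x)
end
CUDA.synchronize()
rss_repeat = peak_rss_mib()

println("peak RSS before:          ", round(rss_before; digits = 1), " MiB")
println("after first ifft:         ", round(rss_first; digits = 1), " MiB")
println("after 10 more ifft calls: ", round(rss_repeat; digits = 1), " MiB")
```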

Why is it a problem? Julia uses a GC, and GPU allocations are pooled too, so unless you’re actively running out of memory this may not be an issue.
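Because the Jetson Nano's integrated GPU shares its physical RAM with the CPU, memory cached by CUDA.jl's allocator pool is unavailable to the rest of the system. A minimal sketch of how to inspect the pool and hand cached memory back, using `CUDA.memory_status()` and `CUDA.reclaim()`; whether this helps with the host-side growth reported above is not established.

```julia
using CUDA

x = ComplexF32.(CUDA.rand(1024, 1024))
ifft_x = CUFFT.ifft(x)

CUDA.memory_status()      # print current GPU memory usage, including the allocator pool

ifft_x = nothing          # drop the last reference to the result array
GC.gc()                   # let Julia's GC finalize the unreferenced CuArray
CUDA.reclaim()            # release cached pool memory back to the driver
CUDA.memory_status()
```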

I’m not familiar with how ifft is implemented in AbstractFFTs, so I can’t comment on how it maps onto CUFFT’s functionality.


Thank you for the reply.
When this code (along with some other operations) runs on the Jetson Nano (4 GB RAM), the board freezes and the system has to be rebooted. This is caused by running out of memory, and it seems puzzling that ifft() alone consumes so much memory when fft() and ifft() are essentially the same operation.

That does seem like a problem indeed :slight_smile: Try tracing through the invocation of plan_ifft! to see how it uses CUFFT. Alternatively, file an issue on CUDA.jl, but it may take a while before anybody gets around to debugging this.
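A possible starting point for that tracing, as a sketch: create the inverse plan, look at what it wraps, and ask Julia which method applying it dispatches to. This uses `CUFFT.plan_ifft` (the non-mutating variant mentioned in the question) and relies on `AbstractFFTs.ScaledPlan` storing the wrapped plan in its `p` field and the normalization factor in its `scale` field.

```julia
using CUDA
using InteractiveUtils: @which    # loaded automatically in the REPL, needed in scripts

x = ComplexF32.(CUDA.rand(1024, 1024))

p = CUFFT.plan_ifft(x)
println(typeof(p))        # the ScaledPlan wrapper reported in the question
println(typeof(p.p))      # the wrapped, unnormalized (bfft-style) CUFFT plan
println(p.scale)          # normalization factor applied after the backward transform

# Which method `p * x` dispatches to: a starting point for reading the source.
println(@which p * x)
```

From the printed method location, `@edit p * x` (in the REPL) opens the corresponding source.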