Usage of CUDA.CUFFT.cufftPlanMany

gerhardu · August 26, 2022, 8:26pm

Hi,
I need to calculate approximately 600 FFTs of 3-dimensional arrays (e.g. 128^3).

I know how to do this on CPUs and also how to do this sequentially on a GPU.
By sequentially I mean that I copy one of the 600 arrays to the GPU, calculate the FFT and send it back to the host. Since the arrays are quite small, I guess I could gain a lot by using a batched FFT calculation.
As far as I understand, CUDA.CUFFT.cufftPlanMany does exactly this, but I could not figure out how to use it. Does anyone have a working example in Julia?
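For reference, the sequential approach described above can be sketched on the CPU with FFTW.jl. This is a minimal sketch, assuming FFTW.jl is installed; the sizes are shrunk (4 arrays of 8^3 instead of 600 of 128^3) and the names `arrays` and `results` are hypothetical. The plan is created once and reused for every array of the same size and element type. On the GPU, each iteration would additionally copy the array to the device (e.g. `CuArray(arrays[i])`) and back with `Array(...)`, which is the per-array overhead that batching avoids.

```julia
using FFTW

# Hypothetical small test data: 4 complex 8x8x8 arrays instead of 600 arrays of 128^3
arrays = [rand(ComplexF64, 8, 8, 8) for _ in 1:4]

# Sequential approach: one 3-D FFT per array.
# The plan is created once and reused, so FFTW does not re-plan on every iteration.
p = plan_fft(arrays[1])
results = [p * a for a in arrays]
```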

Some more information on what I am doing, in case someone has a better way of solving this:
A, B and C are vectors of 600 three-dimensional arrays each.

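Current sequential loop (Julia with FFTW):
using FFTW
# C must be a preallocated vector that can hold complex 3-D arrays
for i in 1:600
    tmpA = ifft(A[i])
    tmpB = ifft(B[i])
    C[i] = fft(tmpA .* (tmpB .^ 2 .+ tmpA .^ 2))
end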

Thanks

gerhardu · August 30, 2022, 6:32pm

I found the solution:
There is no need to invoke CUDA.CUFFT.cufftPlanMany. Batched FFTs are already covered by Julia's AbstractFFTs interface, because fft can be restricted to a subset of the dimensions of an array.

E.g. if N FFTs of size 128^3 need to be calculated, one simply copies the data of the 128^3 arrays into a 3+1 dimensional array of size (128, 128, 128, N): the first one into newarray[:,:,:,1], the second one into newarray[:,:,:,2], and so forth up to newarray[:,:,:,N].
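A minimal sketch of this stacking step, assuming the N arrays are held in a vector called `arrays` (a hypothetical name) and all have the same size; small sizes are used for illustration. Preallocating the 4-D array and copying each 3-D array into its slice mirrors the description above. `cat(arrays...; dims=4)` produces the same array in one call, at the cost of splatting N arguments.

```julia
# Hypothetical input: N complex arrays of identical size (8^3 instead of 128^3)
N = 5
arrays = [rand(ComplexF64, 8, 8, 8) for _ in 1:N]

# Preallocate the 3+1 dimensional array of size (8, 8, 8, N);
# similar() keeps the array type, so a CuArray input would give a CuArray here
newarray = similar(arrays[1], size(arrays[1])..., N)

# Copy array i into the slice newarray[:, :, :, i]
for i in 1:N
    newarray[:, :, :, i] = arrays[i]
end
```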
Having assembled newarray, the next step is simply to perform the FFT along the first 3 dimensions:
fft(newarray, [1,2,3]). This computes all N FFTs in a single call.
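The following sketch, assuming FFTW.jl on the CPU and hypothetical small sizes, checks that transforming along dimensions 1 to 3 of the stacked array gives the same result as transforming each 3-D slice on its own. It also shows `plan_fft(newarray, 1:3)`, which creates a reusable plan for the batched transform. With CUDA.jl this presumably corresponds to the batched cuFFT plan that cufftPlanMany would create by hand, although that mapping is an assumption here, not something stated in the thread.

```julia
using FFTW

# Hypothetical stacked input: N = 5 complex arrays of size 8^3 along dimension 4
N = 5
newarray = rand(ComplexF64, 8, 8, 8, N)

# Transform only along dimensions 1-3: N independent 3-D FFTs in one call
batched = fft(newarray, [1, 2, 3])   # 1:3 or (1, 2, 3) work as well

# The same batched transform as a reusable plan, useful if it is applied repeatedly
p = plan_fft(newarray, 1:3)
batched_plan = p * newarray
```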
One then extracts the individual N 128^3 arrays from the returned 3+1 dimensional array using the same indexing as above. This works on the GPU as well, with the stacked array stored as a CuArray!
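Applied to the computation from the original question, the batched approach could look like the sketch below. It assumes FFTW.jl on the CPU, uses hypothetical small inputs (4 arrays of 8^3), and the names `Abig`, `Bbig` and `Cbig` are made up for illustration. Because the element-wise step acts on each entry independently, it can be applied to the whole stack between the batched inverse and forward transforms.

```julia
using FFTW

# Hypothetical small inputs standing in for the 600 arrays of size 128^3
N = 4
A = [rand(ComplexF64, 8, 8, 8) for _ in 1:N]
B = [rand(ComplexF64, 8, 8, 8) for _ in 1:N]

# Stack into 3+1 dimensional arrays of size (8, 8, 8, N)
Abig = cat(A...; dims=4)
Bbig = cat(B...; dims=4)

# All N inverse FFTs at once, transforming only along dimensions 1-3
tmpA = ifft(Abig, 1:3)
tmpB = ifft(Bbig, 1:3)

# Element-wise step on the whole stack, then all N forward FFTs at once
Cbig = fft(tmpA .* (tmpB .^ 2 .+ tmpA .^ 2), 1:3)

# Extract the individual 3-D results again
C = [Cbig[:, :, :, i] for i in 1:N]
```

On the GPU, the same steps should apply after `using CUDA` by converting the stacks with `CuArray(Abig)` and `CuArray(Bbig)` and copying the result back with `Array(Cbig)`. Keep the memory footprint in mind: one 128^3 `ComplexF64` array takes 32 MiB, so a stack of 600 is about 19 GiB (half that with `ComplexF32`), and several such stacks are alive at once here. The 600 arrays will therefore likely have to be processed in chunks.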
