Usage of CUDA.CUFFT.cufftPlanMany

I need to compute approximately 600 FFTs of 3-dimensional arrays (e.g. of size 128^3).

I know how to do this on CPUs and also how to do it sequentially on a GPU.
By sequentially I mean that I copy one of the 600 arrays to the GPU, compute the FFT and send the result back to the host. Since the arrays are quite small, I guess I could gain a lot by using a batched FFT calculation.
As far as I understand, CUDA.CUFFT.cufftPlanMany does exactly this, but I could not figure out how to use it. Does anyone have a working example in Julia?

Some more info on what I am doing, in case someone has a better way of solving this:
A, B and C are arrays of 3-dimensional arrays.

Pseudo code:
for i in 1:600
    tmpA = ifft(A[i])
    tmpB = ifft(B[i])
    C[i] = fft(tmpA .* (tmpB.^2 + tmpA.^2))
end


I found the solution:
There is no need to call CUDA.CUFFT.cufftPlanMany directly. Batched FFTs are already covered by Julia's AbstractFFTs interface.

E.g. if N FFTs of size 128^3 need to be calculated, one simply copies the data of the 128^3 arrays into a 3+1 dimensional array (of size 128 x 128 x 128 x N): the first one to newarray[:,:,:,1], the second one to newarray[:,:,:,2] and so forth up to newarray[:,:,:,N].
Having assembled newarray, the next step is simply to perform the FFT along the first 3 dimensions:
fft(newarray, [1,2,3]). This computes all N FFTs in one call.
One then extracts the N individual 128^3 arrays from the returned 3+1 dimensional array by slicing along the last dimension, as outlined above. This works on the GPU as well!
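
For anyone landing here later, here is a minimal sketch of the steps above applied to the pseudo code from my question. It runs on the CPU with FFTW.jl so it can be tried without a GPU. The sizes are deliberately tiny, the helper name to_batch is made up for illustration, and the random input data is just a placeholder. The final check compares the batched result against the plain per-array loop.

```julia
using FFTW  # CPU backend for the AbstractFFTs interface (assumed to be installed)

# Small sizes for illustration only; the real problem uses n = 128, N = 600.
n, N = 8, 4
A = [rand(ComplexF64, n, n, n) for _ in 1:N]  # placeholder input data
B = [rand(ComplexF64, n, n, n) for _ in 1:N]

# Hypothetical helper: stack a vector of n x n x n arrays into one
# n x n x n x N array, with the batch index in the 4th dimension.
to_batch(arrs) = cat(arrs...; dims=4)

Abatch = to_batch(A)
Bbatch = to_batch(B)

# Transform along dimensions 1 to 3 only; dimension 4 is the batch index.
tmpA = ifft(Abatch, 1:3)
tmpB = ifft(Bbatch, 1:3)
Cbatch = fft(tmpA .* (tmpB .^ 2 .+ tmpA .^ 2), 1:3)

# Extract the N individual results again.
C = [Cbatch[:, :, :, i] for i in 1:N]

# Reference: the original loop, one array at a time.
C_ref = [fft(ifft(A[i]) .* (ifft(B[i]) .^ 2 .+ ifft(A[i]) .^ 2)) for i in 1:N]
agree = all(C[i] ≈ C_ref[i] for i in 1:N)
println("batched result matches the loop: ", agree)
```

On the GPU the same calls should apply once the stacked arrays are moved to the device (e.g. CuArray(Abatch) with CUDA.jl loaded), though I have only sketched the CPU variant here. For 600 arrays of size 128^3 the stacked array is large, so it may be necessary to process the batch in a few chunks, depending on available GPU memory.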
