I need to calculate approximately 600 FFTs of 3-dimensional arrays (e.g. 128^3).
I know how to do this on CPUs and also how to do this sequentially on a GPU.
By sequentially I mean that I copy one of the 600 arrays to the GPU, calculate the FFT, and send it back to the host. Since the arrays are quite small, I guess I could gain a lot by using a batched FFT calculation.
As far as I understand, CUDA.CUFFT.cufftPlanMany does exactly this, but I could not figure out how to use it. Does anyone have a working example in Julia?
Here is some more info on what I am doing, in case someone has a better way of solving this:
A, B, C are Arrays of 3-dimensional Arrays
for i in 1:600
    tmpA = ifft(A[i])
    tmpB = ifft(B[i])
    C[i] = fft(tmpA .* (tmpB.^2 .+ tmpA.^2))
end
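For reference, the loop above could be batched roughly as follows. This is only a sketch, not a tested solution. It assumes that the `fft`/`ifft` methods CUDA.jl provides for `CuArray` accept a `dims` argument covering only the first three dimensions of a 4-dimensional array, so that the fourth dimension acts as the batch index. Whether this maps to cufftPlanMany internally is also an assumption. The helper name `batched_step`, the chunk size `nbatch`, and the small sizes are made up for illustration. Note that 600 arrays of 128^3 `ComplexF32` values take about 10 GB per set, so the batch would probably have to be processed in chunks.

```julia
using CUDA, CUDA.CUFFT   # assumption: CUDA.CUFFT makes fft/ifft available for CuArray

# Hypothetical helper: A and B are 4D arrays of size (nx, ny, nz, nbatch).
# Transforming over dims 1:3 applies a 3D FFT to every slice A[:, :, :, i].
function batched_step(A::AbstractArray{<:Complex,4}, B::AbstractArray{<:Complex,4})
    tmpA = ifft(A, 1:3)
    tmpB = ifft(B, 1:3)
    return fft(tmpA .* (tmpB .^ 2 .+ tmpA .^ 2), 1:3)
end

# Small illustrative sizes; the real problem would use n = 128 and 600 fields.
n, nfields, nbatch = 32, 8, 4
A_host = [rand(ComplexF32, n, n, n) for _ in 1:nfields]
B_host = [rand(ComplexF32, n, n, n) for _ in 1:nfields]
C_host = Vector{Array{ComplexF32,3}}(undef, nfields)

for first_idx in 1:nbatch:nfields
    idx = first_idx:min(first_idx + nbatch - 1, nfields)
    # Stack one chunk of 3D arrays along a 4th dimension and upload it once.
    A = CuArray(cat(A_host[idx]...; dims=4))
    B = CuArray(cat(B_host[idx]...; dims=4))
    C = Array(batched_step(A, B))          # compute on the GPU, copy back once
    for (k, i) in enumerate(idx)
        C_host[i] = C[:, :, :, k]
    end
end
```

Uploading a whole chunk at once replaces many small host-to-device copies with one larger copy, which is where most of the gain over the sequential version would be expected. The chunk size is a trade-off against GPU memory, since the intermediate arrays `tmpA`, `tmpB` and the broadcast result each need as much space as the input chunk.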