Hi,

I need to calculate approximately 600 FFTs of 3-dimensional arrays (e.g. each of size 128^3).

I know how to do this on CPUs and also how to do this sequentially on a GPU.

By sequentially I mean that I copy one of the 600 arrays to the GPU, calculate the FFT, and send the result back to the host. Since the arrays are quite small, I guess I could gain a lot by using a batched FFT calculation.

As far as I understand, CUDA.CUFFT.cufftPlanMany does exactly this, but I could not figure out how to use it. Does anyone have a working example in Julia?

Here is some more information about what I am doing, in case someone has a better way of solving this:

A, B, and C are arrays of 3-dimensional arrays.

Pseudocode:

for i in 1:600
    tmpA = ifft(A[i])
    tmpB = ifft(B[i])
    C[i] = fft(tmpA .* (tmpB.^2 .+ tmpA.^2))
end
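
For reference, here is a minimal sketch of what I think the batched version of the loop above could look like. It assumes the arrays are stacked along a 4th dimension into one CuArray, and that calling fft/ifft from CUDA.jl with dims 1:3 treats that 4th dimension as the batch. I have not confirmed that this is how cufftPlanMany is meant to be used. The sizes n and nbatch are made-up small values for illustration, not my real problem size.

```julia
using CUDA, CUDA.CUFFT   # CUDA.CUFFT re-exports fft/ifft from AbstractFFTs

# Hypothetical sizes, much smaller than the real 128^3 x 600 problem.
n, nbatch = 32, 8

# Stack the individual 3D arrays along a 4th (batch) dimension on the GPU.
A = CUDA.rand(ComplexF32, n, n, n, nbatch)
B = CUDA.rand(ComplexF32, n, n, n, nbatch)

# Transform over dims 1:3 only; dim 4 is left untransformed (the batch).
tmpA = ifft(A, 1:3)
tmpB = ifft(B, 1:3)

# The element-wise product runs as one fused broadcast kernel on the GPU.
C = fft(tmpA .* (tmpB .^ 2 .+ tmpA .^ 2), 1:3)
```

Note that 600 arrays of 128^3 ComplexF32 values take roughly 10 GB per stacked array, so on most GPUs the stack would probably have to be processed in smaller chunks along the 4th dimension.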

Thanks