Batched CUDA FFT Plans

CUDA.jl PR1903 added support for FFTs along more directions in CUDA.jl v5. I am a bit confused about how this works in practice, as I can’t find it documented. The PR states:

This is achieved by allowing fft-plans to have fewer dimensions than the data they are applied to. The trailing dimensions are treated as non-transform directions and transforms are executed sequentially.

So, in practice I’d like to have one plan that applies both to single samples and batched ones, but this doesn’t work as I expected:

using CUDA
A = CUDA.rand(100)       # single sample
B = CUDA.rand(100, 10)   # batch of 10 samples
plan = CUDA.CUFFT.plan_fft(A)   # plan created from the 1-D array
plan * A # works
plan * B # doesn't work

Am I misunderstanding what the PR added, or using it wrong? (I am on CUDA.jl v5)

Maybe @RainerHeintzmann could help out?

Thanks for this question. Yes, this was initially allowed. However, there was some criticism, and I wanted the pull request to go through, so I made sure that the user interface stayed the same as before. See this comment:

The merged version therefore only adds support for more (almost all) FFT transform directions, but you still have to use a plan that matches the data it is applied to. For your use case, you can tell the plan to transform only the first dimension. I would also have preferred the changed interface, as originally planned, but this is democracy :wink:
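
A minimal sketch of that suggestion, assuming the standard AbstractFFTs signature `plan_fft(x, dims)` is what CUDA.jl accepts here. The array is complex to keep the example simple, and the names `B`, `plan_batched` and `FB` are only illustrative:

```julia
using CUDA

# Batch of 10 samples with 100 points each (complex input, chosen for simplicity)
B = CUDA.rand(ComplexF32, 100, 10)

# Create the plan from the batched array, transforming only dimension 1.
# Dimension 2 is then treated as the batch (non-transform) direction.
plan_batched = CUDA.CUFFT.plan_fft(B, 1)

# Apply the plan: every column of B is transformed independently
FB = plan_batched * B
```

Note that a plan is tied to the size of the array it was created for, so a single sample of length 100 would still need its own plan (as in `plan_fft(A)` above).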

Thanks for the reply!

Ah, I see. I’ll suggest that change at AbstractFFTs.