FFTW with MKL provider: creating a plan for a sub-dimensional FFT fails

The MKL provider fails to create a plan for a sub-dimensional FFT.

using FFTW

test = Array{ComplexF64}(undef, 4, 4, 4, 4);
p1 = plan_fft(test, [3, 4])
result1 = fft(reshape([1:4^4;], 4, 4, 4, 4))
result2 = p1 * reshape([1:4^4;], 4, 4, 4, 4)
result1 == result2  # false is expected: result1 transforms all four dims, result2 only dims 3 and 4

But with the MKL provider:

julia> plan_fft!(test,[3,4])
ERROR: FFTW could not create plan
Stacktrace:
 [1] error(s::String)
   @ Base .\error.jl:33
 [2] macro expansion
   @ C:\Users\Baiyi Yu\.julia\packages\FFTW\SDUwi\src\fft.jl:576 [inlined]
 [3] (FFTW.cFFTWPlan{ComplexF64, -1, true, 4})(X::Array{ComplexF64, 4}, Y::Array{ComplexF64, 4}, region::Vector{Int64}, flags::UInt32, timelimit::Float64)
   @ FFTW C:\Users\Baiyi Yu\.julia\packages\FFTW\SDUwi\src\FFTW.jl:49
 [4] #plan_fft!#7
   @ C:\Users\Baiyi Yu\.julia\packages\FFTW\SDUwi\src\fft.jl:695 [inlined]
 [5] plan_fft!(X::Array{ComplexF64, 4}, region::Vector{Int64})
   @ FFTW C:\Users\Baiyi Yu\.julia\packages\FFTW\SDUwi\src\fft.jl:695
 [6] top-level scope
   @ REPL[45]:1

Does MKL not support sub-dimensional FFTs?
This kind of FFT is quite common when applying operators to tensors.

The transform along an interior dimension (dimension 3 of 4 here) is not supported in MKL because it can’t be mapped to a constant distance between starting points of the FFTs.

MKL has a single parameter for this (DFTI_INPUT_DISTANCE, DFTI_OUTPUT_DISTANCE), implying a 1D array of starting points equally spaced at that distance. FFTW can take a multi-dimensional array of starting points through parameters called “howmany_rank” and “howmany_dims” in their guru interface.
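To see why a single distance parameter is not enough, consider transforming dims (2, 4) of a 4×4×4×4 array: the starting points of the 1-D FFTs are indexed by the untransformed dims 1 and 3, which have two independent strides. A minimal base-Julia sketch of the offset layout (the array and region here are illustrative, not taken from MKL's API):

```julia
A = reshape(1:4^4, 4, 4, 4, 4)  # column-major, as Julia and FFTW use

# 0-based linear offsets of the starting point of each 1-D FFT along
# dim 2, one per combination of the untransformed dims 1 and 3:
offsets = [LinearIndices(A)[i, 1, k, 1] - 1 for i in 1:4, k in 1:4]

offsets[2, 1]  # 1  -> stride 1 along dim 1
offsets[1, 2]  # 16 -> stride 16 along dim 3
```

Two independent strides (1 and 16) cannot be collapsed into one equally spaced sequence, so describing this batch needs howmany_rank = 2, which FFTW's guru interface allows but MKL's single DFTI_INPUT_DISTANCE cannot express.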

The limitation in MKL is documented in the FFTW3 interface docs (“the only supported values for parameter howmany_rank in guru and guru64 plan creation functions are 0 and 1”).


You could use a lazy reshape “trick” to make MKL work on this example, since dims 1 and 2 can be merged.
Ideally FFTW.jl would do this for us, but plan_fft(test,[2,4]) is still “broken” (there the untransformed dims 1 and 3 are not adjacent, so they can’t be merged).
In that case we need to loop over dim 3 manually, but it’s hard to make that efficient for all sizes.
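Both workarounds can be sketched as follows (a minimal sketch using the standard FFTW.jl/AbstractFFTs API; the equivalences hold with either provider):

```julia
using FFTW

A = reshape(ComplexF64.(1:4^4), 4, 4, 4, 4)

# Reshape trick for region [3, 4]: merge the contiguous, untransformed
# dims 1 and 2, so the batch of FFTs has a single constant distance
# that MKL's DFTI_INPUT_DISTANCE can express.
B = reshape(A, 16, 4, 4)
r34 = reshape(fft(B, (2, 3)), size(A))  # same values as fft(A, (3, 4))

# For region [2, 4] the untransformed dims 1 and 3 are not adjacent,
# so they can't be merged; loop over dim 3 manually instead.
r24 = similar(A)
for k in axes(A, 3)
    # dims (2, 3) of the slice correspond to dims (2, 4) of A
    r24[:, :, k, :] = fft(A[:, :, k, :], (2, 3))
end
# r24 matches fft(A, (2, 4))
```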

Edit: If you can’t use the FFTW backend for some reason and don’t need rfft, you can try this patched version: GitHub - N5N3/FFTW.jl at SelfUse.

Thank you all for the detailed explanations!

The reason I raised the question is that on my personal computer I find that MKL makes fft faster and supports allocation-free multithreading.
I have now moved to cuFFT for large 3D FFTs on a cluster, and it works well, except that CUDA.jl currently has no high-level wrapper for multi-GPU FFTs. PencilFFTs now supports CuArray, but it is complicated to set up if the cluster does not have CUDA-aware MPI installed.

I hope that someday CUDA.jl and MPI.jl can simply ship CUDA-aware MPI via artifacts.