Why I can't use mul! and plan_fft together?

Hi, I just started to use Julia recently, and when I try to use mul! and plan_fft to get better performance, I always get a bug that I can’t deal with
My input is

for i in 1:nz
    for j in 1:nr

and Julia tells me that

MethodError: no method matching mul!(::Matrix{Float64}, ::FFTW.cFFTWPlan{ComplexF64, -1, false, 2, UnitRange{Int64}}, ::Matrix{Float64}, ::Bool, ::Bool)

I’ve tired to search everywhere, but it seems no one has experienced similar situation like me.

It lacks some parameters to be sure but can you try with setting up as Complex ? The output of the FFT is indeed a complex vector and you initialise a Float64.

1 Like

See the plan_fft docs:

To apply P to an array A, use P * A; in general, the syntax for applying plans is much like that of matrices. (A plan can only be applied to arrays of the same size as the A for which the plan was created.) You can also apply a plan with a preallocated output array  by calling mul!(Â, plan, A). (For mul!, however, the input array A must be a complex floating-point array like the output Â.) You can compute the inverse-transform plan by inv(P) and apply the inverse plan with P \  (the inverse plan is cached and reused for subsequent calls to inv or ), and apply the inverse plan to a pre-allocated output array A with ldiv!(A, P, Â).

Note that for a real-valued input array, you may be able to achieve better performance with rfft/plan_rfft, since a FFT on real input produces Hermitian-symmetric output with redundant negative frequency terms.

edit: for 1024 x 1024 real input,

julia> @btime mul!($û_rfft, $p_rfft, $u)
  3.237 ms (0 allocations: 0 bytes)

julia> @btime mul!($û_fft, $p_fft, $u_complex)
  22.086 ms (0 allocations: 0 bytes)

Thank you for your advice, after turning the input array into a complex array, everything seems to work fine. I also realized that rift is much faster than fft, but since I need to get full wave numbers for my simulation, I have to stick on fft this time.:slight_smile:

Instead of trying to anticipate the eltype of the output array, can’t you just do

up = FFT * u0

You’re not saving on allocating the output anyway, at least unless you reuse up multiple times.