Replacing FFT from Numerical Recipes with FFTW

Hello there.

I’m tasked to port an old FORTRAN 77 legacy code to Julia. Unfortunately, I’m stuck with the computation of a Fast Fourier Transform (FFT).

The original legacy code uses the sinft routine from the Numerical Recipes book (NR). The original version is in FORTRAN 77, but I managed to port it to Julia.
As it is somewhat big, I posted it in this GitHub Gist. I know it gives the correct answer as I have already checked it with the FORTRAN 77 code.

What I’m trying to accomplish is to successfully replace that old subroutine and use FFTW.jl instead.

Here is a sample code, which uses the sinft function from the gist I linked above

function main()
    a = [0.0,
        0.47742664734683210,
        0.75851932870459160,
        0.75733104152093889,
        0.53816912369779424,
        0.25985479516207322,
        6.5606350402976049E-002,
        8.4817932682581046E-004,
        1.6555040127766884E-002,
        3.9188171512774567E-002,
        3.4228256134543805E-002,
        1.3420820869486524E-002,
        6.5631416321955491E-004,
        2.7825880087286466E-003,
        9.4219993149478529E-003,
        9.9865908960776392E-003,
        4.6428562500441268E-003,
        3.3327448406347924E-004,
        8.9848071988801400E-004,
        3.6459412006396999E-003,
        4.2443023850689526E-003,
        2.1409386996221803E-003,
        1.8361141730897351E-004,
        3.9074461539924637E-004,
        1.7587883887050227E-003,
        2.1636202084949888E-003,
        1.1499838894481705E-003,
        1.1004675140620131E-004,
        2.0345801653869668E-004,
        9.8374907856643950E-004,
        1.2557106901589935E-003,
        6.9012160386323635E-004
    ]
    n = length(a)
    b = FFTW.r2r(a, FFTW.RODFT00)
    @show b
    c = FFTW.r2r(a, FFTW.RODFT10)
    @show c
    d = FFTW.r2r(a, FFTW.RODFT01)
    @show d
    e = FFTW.r2r(a, FFTW.RODFT11)
    @show e
    sinft!(a, n)
    @show a
end

main()

and the output is

fftw00 = [2.2839824978332457, 3.865808478514381, 4.727281203842373, 4.8731271633513895, 4.430388346987334, 3.52627738343957, 2.3554519677513883, 1.0916035814443859, -0.06044936548148805, -0.9482420224720198, -1.4349198347123648, -1.4922387963947052, -1.1878127966462944, -0.9024039195111244, -0.7544336573013076, -0.6487524977522947, -0.5746854983744873, -0.5109527882913876, -0.4572230988934566, -0.4071294272689463, -0.36317761272948196, -0.32225146396310084, -0.2854515631367159, -0.25067989805628743, -0.21858457535092202, -0.18781719264787639, -0.1587922173412804, -0.13063489068415524, -0.10357909252012387, -0.07705172337385213, -0.05113541033570179, -0.025469727216377414]
fftw10 = [2.0793925679832097, 3.540836745898296, 4.411873130993651, 4.69904918778722, 4.50023727641295, 3.8943771147970456, 3.0155507529728913, 1.9808566633749198, 0.9463285287804625, 0.046019457467148916, -0.5751837487126715, -0.8217911065999712, -0.6986234327172187, -0.5578142627105277, -0.4834741181879493, -0.4335746010153527, -0.4015978869433933, -0.37755252780671394, -0.35778123449159294, -0.34117680841924375, -0.326536173530114, -0.3148024248728375, -0.3041677537130595, -0.2960230382112425, -0.28837505241028905, -0.2828993208737436, -0.2775606516187846, -0.27420195090090593, -0.27077956214449317, -0.26922722005632865, -0.2675076531557324, -0.2676073339655841]
fftw01 = [1.243700805258963, 3.2418116193843214, 4.452525840891483, 4.894734592726747, 4.649826958896702, 3.8623787670102443, 2.720624387729786, 1.4246826482841162, 0.19094388443409294, -0.796962023389162, -1.3819492164469134, -1.5103369064524736, -1.2283983607363878, -0.9173515045448337, -0.7609829467521276, -0.6493873891836436, -0.5742902048686588, -0.5076348011618095, -0.45375468347680226, -0.40149913632711026, -0.35776824660920276, -0.3151620276234529, -0.27871627890030914, -0.2424743837845238, -0.21074645847268259, -0.17859150548271852, -0.1499037585025491, -0.1203668000930191, -0.09359161262773878, -0.0656279913804777, -0.039911622662318735, -0.012688562150492078]
fftw11 = [1.100731615661763, 2.888152469616345, 4.0510814187228, 4.622312215367247, 4.6568191497328115, 4.239205077660919, 3.4831742683814957, 2.506879365762653, 1.4566061868702822, 0.46800391408490905, -0.3038493851830596, -0.7540537756465204, -0.786778367704533, -0.6145829005559422, -0.5181608498286434, -0.4544324822018745, -0.4171609003337509, -0.38806300324536736, -0.3678971086135975, -0.3486464965752973, -0.3339712796773492, -0.3200728029098461, -0.30955018816292434, -0.29966980779798313, -0.29223263070962957, -0.28532961396706014, -0.2802314150004216, -0.27565684140465374, -0.27245794969244197, -0.2698415635994903, -0.2682975643179987, -0.267445642341788]
a = [0.0, 0.9043828218719977, 1.5534403130729006, 1.9790600667238083, 2.185014687850934, 2.207864522352749, 2.065766579627678, 1.7994284395325442, 1.436707753647941, 1.0275526531433679, 0.6135474148730728, 0.25895888711416704, 0.019003685271627706, -0.0361408278912162, -0.016458308980858587, -0.008379067999873493, -0.0015497022834815022, -0.000180369999170496, 0.0007441171767517737, 7.66079446568018e-5, 0.00022392036694572626, -9.59574096880151e-5, 7.835687804802483e-5, -7.522077772007148e-5, 5.0998401781382485e-5, -4.406720381025693e-5, 3.4210803845441395e-5, -2.651523625643755e-5, 2.0307333832680907e-5, -1.4499172994538248e-5, 9.41448920865895e-6, -4.621610797618872e-6]

Here, the first four arrays represent each of the four variations of the Fourier sine transforms as defined in the documentation for the FFTW.
And a is the in-place result of applying the sinft function from NR to the original a array.

The first thing that is very clear is the first element of the arrays. The sinft always leaves the first element as a zero, and then computes everything else. The FFTW on the other hand, does not.

Further, every variation of the Fourier sine transform from FFTW is giving me different results, as expected.

The question is: how can I reproduce the results from sinft using the functions in FFTW.jl?
Any help would be greatly appreciated.

2 Likes

Reading the Numerical Recipes book I came across this definition, after a lot of algebra and computations

F_k = 2i \sum_{j=0}^{N-1} f_j \sin{(\pi j k / N)}

and what I ended up doing was realizing that this means that if I use the rfft function I should be able
to obtain what I need.

In short, I managed to solve it using the following.

  • First, pad the original array with a new array twice its size.
  • Then, apply the rfft function.
  • Finally, compute its complex conjugate and extract the imaginary part.

At last, the code I used looks something like this

function main()
    a = [0.0,
        0.47742664734683210,
        0.75851932870459160,
        0.75733104152093889,
        0.53816912369779424,
        0.25985479516207322,
        6.5606350402976049E-002,
        8.4817932682581046E-004,
        1.6555040127766884E-002,
        3.9188171512774567E-002,
        3.4228256134543805E-002,
        1.3420820869486524E-002,
        6.5631416321955491E-004,
        2.7825880087286466E-003,
        9.4219993149478529E-003,
        9.9865908960776392E-003,
        4.6428562500441268E-003,
        3.3327448406347924E-004,
        8.9848071988801400E-004,
        3.6459412006396999E-003,
        4.2443023850689526E-003,
        2.1409386996221803E-003,
        1.8361141730897351E-004,
        3.9074461539924637E-004,
        1.7587883887050227E-003,
        2.1636202084949888E-003,
        1.1499838894481705E-003,
        1.1004675140620131E-004,
        2.0345801653869668E-004,
        9.8374907856643950E-004,
        1.2557106901589935E-003,
        6.9012160386323635E-004,
    ]
    n = length(a)
    m = 2 * n
    b = zeros(m)
    b[1:n] = a # We pad with zeros
    @show rfft(b) |> conj |> imag
    sinft!(a, n)
    @show a
end

main()

and the result is

(rfft(b) |> conj) |> imag = [-0.0, 0.9043828218719976, 1.5534403130729006, 1.9790600667238079, 2.1850146878509342, 2.207864522352748, 2.0657665796276783, 1.799428439532543, 1.4367077536479407, 1.0275526531433665, 0.613547414873073, 0.2589588871141659, 0.019003685271627713, -0.03614082789121714, -0.01645830898085865, -0.008379067999874062, -0.0015497022834817242, -0.00018036999917095398, 0.0007441171767516019, 7.660794465619292e-5, 0.0002239203669455736, -9.595740968863682e-5, 7.835687804816383e-5, -7.522077772060466e-5, 5.0998401781576774e-5, -4.4067203810568076e-5, 3.421080384571918e-5, -2.651523625655967e-5, 2.0307333832514374e-5, -1.4499172994675291e-5, 9.414489208547927e-6, -4.621610797617137e-6, -0.0]
a = [0.0, 0.9043828218719977, 1.5534403130729006, 1.9790600667238083, 2.185014687850934, 2.207864522352749, 2.065766579627678, 1.7994284395325442, 1.436707753647941, 1.0275526531433679, 0.6135474148730728, 0.25895888711416704, 0.019003685271627706, -0.0361408278912162, -0.016458308980858587, -0.008379067999873493, -0.0015497022834815022, -0.000180369999170496, 0.0007441171767517737, 7.66079446568018e-5, 0.00022392036694572626, -9.59574096880151e-5, 7.835687804802483e-5, -7.522077772007148e-5, 5.0998401781382485e-5, -4.406720381025693e-5, 3.4210803845441395e-5, -2.651523625643755e-5, 2.0307333832680907e-5, -1.4499172994538248e-5, 9.41448920865895e-6, -4.621610797618872e-6]

There is a trailing zero in the transformed array using rfft, but I can deal with that.

It might not be the best solution, but it’s good enough for me. Although, other solutions are greatly appreciated.

This is a type-I DST (discrete sine transform), which is equivalent to RODFT00 in FFTW.

However unlike Numerical Recipes, FFTW does not store or compute with the redundant j=0 and k=0 elements of the input and output arrays, which are always zero. So, you just omit this when calling FFTW:

julia> FFTW.r2r(a[2:end], FFTW.RODFT00) / 2
31-element Array{Float64,1}:
  0.9043828218719976
  1.5534403130729009
  1.9790600667238076
  2.1850146878509342
  2.2078645223527475
  2.0657665796276783
  1.7994284395325428
  1.4367077536479407
  1.0275526531433667
  0.6135474148730728
  0.25895888711416587
  0.0190036852716276
 -0.0361408278912172
 -0.01645830898085865
 -0.008379067999874298
 -0.0015497022834815022
 -0.0001803699991712454
  0.000744117176751713
  7.660794465624843e-5
  0.00022392036694568462
 -9.595740968876276e-5
  7.835687804830238e-5
 -7.52207777205971e-5
  5.099840178124371e-5
 -4.406720381067153e-5
  3.421080384558017e-5
 -2.6515236256630104e-5
  2.030733383229233e-5
 -1.4499172994897336e-5
  9.414489208436905e-6
 -4.621610797561626e-6

(I also divided by 2 to match the normalization in your sinft! function. In practice, you can usually absorb this or any other normalization factor into some other part of your computation.)

7 Likes

Note that to get the full benefit from FFTW (assuming this computation is performance critical), you should precompute the plan, and possibly preallocate the output:

julia> using FFTW, BenchmarkTools, LinearAlgebra

julia> b = a[2:end]/2; result = copy(b);

julia> p = FFTW.plan_r2r(result, FFTW.RODFT00, flags=FFTW.MEASURE)
FFTW r2r RODFT00 plan for 31-element array of Float64
(rodft00e-r2hc-pad-31
  (rdft-r2hc-direct-r2c-64 "r2cf_64")
  (rdft-rank0-iter-ci/1-x31))

julia> a2 = a*0;

julia> @btime sinft!($a2, $(length(a)));
  570.503 ns (0 allocations: 0 bytes)

julia> @btime mul!($result, $p, $b); 
  397.590 ns (0 allocations: 0 bytes)

Note, by the way, that the Numerical Recipes sinft algorithm suffers from poor accuracy for long lengths (although this does not seem to be widely known—FFTPACK has the same problem); IIRC, its rms floating-point errors grow as O(\sqrt{n}) rather than the O(\sqrt{\log n}) achieved by good FFT algorithms. In consequence, the algorithm that FFTW’s plan p is using above is essentially the same as the one you implemented in terms of rfft, though for larger sizes it may use a different algorithm.

(That being said, REDFT00 and RODFT00 are two of the less-optimized transforms in FFTW. If you really care about the speed here, there are ways to do better.)

11 Likes

I’ve been preallocating and using

FFTW.plan_r2r!(zstore, FFTW.RODFT00, 1)

for several months and thought I was happy. Is there something simple I can do to improve performance.? I’m using the fast sine transform for a Poisson solver.

You could try using a type-II DST (RODFT10) instead (which correponds to shifting the samples by half a grid point, so it should be easy to adapt for a spectral solver). IIRC, the algorithms in FFTW for a DST-II are a bit more efficient than those for a DST-I.

Also, be careful of the factorization of your sizes. RODFT00 is most efficient for lengths n where n+1 is highly composite, but for RODFT11 it should be n that is highly composite.

flags=FFTW.MEASURE or flags=FFTW.PATIENT wouldn’t hurt to try either.

2 Likes

Thanks, @stevengj

I’m stuck with FT00 unless I want to do some massive recoding. n+1 = 2^m for me, which is what it was when I used the fft in matlab for decades. So I’m ok there.

I will look into the flags. I had not thought of that.

Thanks you so much @stevengj ! Learned a lot from these answers. I’ll definitely try out these solutions as well in my code.