FFTW multithreading and plans

Hi folks,
I have a question regarding FFTW transforms, planning them, and multithreading to speed that up. In particular I lunch

julia --threads 4

and once in the REPL Threads.nthreads() says I’m running with 4 threads, as expected. Then I do

using FFTW

FFTW.set_num_threads(4);
x = rand(256,256);
p = FFTW.plan_r2r(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
p*x

and what I get is

256×256 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 ⋮                        ⋮                        ⋮                        ⋮                   ⋱            ⋮                        ⋮                        ⋮                        ⋮
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0     0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

which obviously is not what I was expecting :slight_smile:

I’m sure there’s something I must do with the plan, as the same thing without threading

julia 

using FFTW

x = rand(256,256);
p = FFTW.plan_r2r(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
p*x

produces the right result

256×256 Matrix{Float64}:
 129909.0       60.4656   114.096      0.30322   268.892    -121.121    -235.101     -223.467   …  -120.194    -201.206    -150.2       -137.297     442.462     116.235      96.7524
     51.8777  -291.243   -488.014    149.204    -137.116     -41.3656     49.7591    -258.836      -155.157      -9.03868    -0.550454   109.489     268.919    -154.248     180.922
    630.933    148.047    -41.8292    24.4597    -16.126      58.339    -215.06       -71.8669       77.0846    -43.0625    164.595      -83.5928   -170.836    -102.997     183.399
    342.234   -211.079    -24.7421   -14.9125    215.262      20.5422     44.5129     -77.9674      -39.552    -111.646    -123.836      106.946      35.847      34.2531   -262.04
   -414.618    178.291    -58.7074    46.9583    -68.4084   -226.081    -132.951     -166.385      -100.26       -8.72139  -101.183      131.73       44.3763      1.19389    55.1724
   -202.358    110.894   -215.248    126.64     -168.369      -5.08326   208.937     -167.588   …   -19.219     186.312      75.2924      82.0627    127.767    -125.08      -61.0179
    248.909    105.885    145.607    178.86       14.3766    224.292      -0.704657    15.0809      -55.5792    -76.1654    200.001     -145.988      -2.58182   -42.0869    -13.4089
    -85.562    143.888    -98.6161    59.2594    229.397      15.3008     81.3322    -144.816      -169.611      84.2451     87.8255    -143.672    -181.107     253.519      35.9029
   -208.016     35.7251   -66.5396   177.036      -9.18936   -79.4058     41.1725      47.7246       53.9599   -159.221     -59.9068     215.877    -150.166     169.397    -248.746
      ⋮                                                        ⋮                                ⋱                 ⋮                                                            ⋮
   -178.636    -59.9498   -32.2663  -102.899      67.3999    129.584     -24.0451    -209.073       -53.8703    101.415      -2.88265     44.1844    173.237     157.945     381.742
    125.91     191.184    230.229     88.0593   -208.886    -101.536     -22.4082      59.2341      192.649      15.524      79.4824      98.6028    190.453     244.748     256.74
   -121.159    -95.4183  -197.874    -98.9116     71.0593   -281.845     -58.6263     -69.2231       92.4262   -284.072      76.5536    -135.233     208.581     -32.5887    -42.3019
   -264.951    -53.2538   -56.4839  -236.296    -290.973     -32.3062   -275.826      -85.8879  …    32.8779    -78.7921    -43.7004      14.8938    191.676     206.887    -509.567
   -163.145   -332.899   -174.902    -71.1768     15.4622    131.998    -177.128     -172.083       -66.7644    101.452     -16.7669     -95.2793   -245.439     -18.0948    158.567
    -87.5535    68.8416   152.911    123.498     -44.9909    118.508     -36.2036      44.8223      -36.182    -129.706       9.01533     -2.17561    26.2493    -33.2069    194.817
    323.152     72.7738   106.537     35.7127    -80.016     -39.8272     -1.09276    -90.4345      -11.0056    172.12      150.695      -69.0988   -130.659    -242.492     115.82
   -264.635    -38.3354   -11.1509    22.8291   -120.781    -124.438      34.0709    -114.099         8.15804   -23.7875      5.42964    -39.3688    -78.7231    121.556    -134.933
    618.445    253.298    275.649    236.409    -176.486     -27.4788    -45.7781     169.874   …   210.461      48.3191    -69.5227      25.4687    231.247    -121.0        61.5904

just the same as using FFTW.r2r directly without plans.

So do you know if there’s anything special I must do here to recover the result when using multithreaded FFTW’s?

Thanks a lot,

Ferran.

@Ferran_Mazzanti did you ever find an answer?

Edit:

Actually, I just figured it out.

Your code:

p = FFTW.plan_r2r(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
p*x

Is not FFT in place (it’s not fft x in memory of x).
You need to set x equal to the fft:

p = FFTW.plan_r2r(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
x = p*x

If you want to use the in-place FFT so that it’s not allocating extra memory you need to use the exclamation mark:

p = FFTW.plan_r2r!(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
p*x

Then it should work without setting x equal to the transform.

Note that calling the planner overwrites the data x. This is in the FFTW FAQ.

Or call mul!(y, p, x) to write into a pre-allocated output array y, if you have an out-of-place plan.

2 Likes

Note that calling the planner overwrites the data x. This is in the FFTW FAQ .

There we go! Now I know I wasn’t crazy when my data was changing.

Thanks!