Hi folks,
I have a question regarding FFTW transforms, planning them, and multithreading to speed that up. In particular I lunch
julia --threads 4
and once in the REPL Threads.nthreads() says I’m running with 4 threads, as expected. Then I do
using FFTW
FFTW.set_num_threads(4);
x = rand(256,256);
p = FFTW.plan_r2r(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
p*x
and what I get is
256×256 Matrix{Float64}:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
⋮ ⋮ ⋮ ⋮ ⋱ ⋮ ⋮ ⋮ ⋮
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
which obviously is not what I was expecting
I’m sure there’s something I must do with the plan, as the same thing without threading
julia
using FFTW
x = rand(256,256);
p = FFTW.plan_r2r(x, FFTW.REDFT00, flags=FFTW.MEASURE, timelimit=Inf)
p*x
produces the right result
256×256 Matrix{Float64}:
129909.0 60.4656 114.096 0.30322 268.892 -121.121 -235.101 -223.467 … -120.194 -201.206 -150.2 -137.297 442.462 116.235 96.7524
51.8777 -291.243 -488.014 149.204 -137.116 -41.3656 49.7591 -258.836 -155.157 -9.03868 -0.550454 109.489 268.919 -154.248 180.922
630.933 148.047 -41.8292 24.4597 -16.126 58.339 -215.06 -71.8669 77.0846 -43.0625 164.595 -83.5928 -170.836 -102.997 183.399
342.234 -211.079 -24.7421 -14.9125 215.262 20.5422 44.5129 -77.9674 -39.552 -111.646 -123.836 106.946 35.847 34.2531 -262.04
-414.618 178.291 -58.7074 46.9583 -68.4084 -226.081 -132.951 -166.385 -100.26 -8.72139 -101.183 131.73 44.3763 1.19389 55.1724
-202.358 110.894 -215.248 126.64 -168.369 -5.08326 208.937 -167.588 … -19.219 186.312 75.2924 82.0627 127.767 -125.08 -61.0179
248.909 105.885 145.607 178.86 14.3766 224.292 -0.704657 15.0809 -55.5792 -76.1654 200.001 -145.988 -2.58182 -42.0869 -13.4089
-85.562 143.888 -98.6161 59.2594 229.397 15.3008 81.3322 -144.816 -169.611 84.2451 87.8255 -143.672 -181.107 253.519 35.9029
-208.016 35.7251 -66.5396 177.036 -9.18936 -79.4058 41.1725 47.7246 53.9599 -159.221 -59.9068 215.877 -150.166 169.397 -248.746
⋮ ⋮ ⋱ ⋮ ⋮
-178.636 -59.9498 -32.2663 -102.899 67.3999 129.584 -24.0451 -209.073 -53.8703 101.415 -2.88265 44.1844 173.237 157.945 381.742
125.91 191.184 230.229 88.0593 -208.886 -101.536 -22.4082 59.2341 192.649 15.524 79.4824 98.6028 190.453 244.748 256.74
-121.159 -95.4183 -197.874 -98.9116 71.0593 -281.845 -58.6263 -69.2231 92.4262 -284.072 76.5536 -135.233 208.581 -32.5887 -42.3019
-264.951 -53.2538 -56.4839 -236.296 -290.973 -32.3062 -275.826 -85.8879 … 32.8779 -78.7921 -43.7004 14.8938 191.676 206.887 -509.567
-163.145 -332.899 -174.902 -71.1768 15.4622 131.998 -177.128 -172.083 -66.7644 101.452 -16.7669 -95.2793 -245.439 -18.0948 158.567
-87.5535 68.8416 152.911 123.498 -44.9909 118.508 -36.2036 44.8223 -36.182 -129.706 9.01533 -2.17561 26.2493 -33.2069 194.817
323.152 72.7738 106.537 35.7127 -80.016 -39.8272 -1.09276 -90.4345 -11.0056 172.12 150.695 -69.0988 -130.659 -242.492 115.82
-264.635 -38.3354 -11.1509 22.8291 -120.781 -124.438 34.0709 -114.099 8.15804 -23.7875 5.42964 -39.3688 -78.7231 121.556 -134.933
618.445 253.298 275.649 236.409 -176.486 -27.4788 -45.7781 169.874 … 210.461 48.3191 -69.5227 25.4687 231.247 -121.0 61.5904
just the same as using FFTW.r2r directly without plans.
So do you know if there’s anything special I must do here to recover the result when using multithreaded FFTW’s?
Thanks a lot,
Ferran.