I have just upgraded from Julia 0.7.0 to 1.0.5 and 1.3.1, and experienced a serious slowdown.
It seems to me that the cause is the default version of the FFTW package, which is:
FFTW v0.2.4 for Julia 0.7.0
FFTW v1.1.0 for Julia 1.0.5
FFTW v1.2.0 for Julia 1.3.1.
If I revert to FFTW v0.2.4 on Julia 1.0.5/1.3.1, the speed is fast again.
I have created a minimal example that reproduces the >5x slowdown (see below).
I have tested it on both 64-bit Windows 7 and Windows 10 with the same results.
I am stuck, so I would greatly appreciate any help!
using FFTW, LinearAlgebra
function myfft(plan,x,y)
    mul!(y,plan,x)
    return
end
siz = (25,36,45)
x = zeros(ComplexF32,siz)
y = zeros(ComplexF32,siz)
plan = plan_fft(x, flags=FFTW.MEASURE)
println("with function call:")
@time for i=1:5000 myfft(plan,x,y) end
@time for i=1:5000 myfft(plan,x,y) end
println("without function call:")
@time for i=1:5000 mul!(y,plan,x) end
@time for i=1:5000 mul!(y,plan,x) end
This is not using AVX (or SSE), so it is not surprising that it is a lot slower. You can get the configuration options by looking at the fftw_version C global:
julia> using FFTW
julia> unsafe_string(cglobal((:fftw_version,FFTW.libfftw3), UInt8))
"fftw-3.3.9-sse2-avx2-avx2_128"
(which indicates that, on my machine, FFTW was configured with SSE2 and AVX2.)
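To make that reading of the version string explicit, here is a minimal sketch that splits it into its configure flags. The helper name `fftw_simd_flags` is hypothetical (it is not part of FFTW.jl), and it assumes the string keeps the `fftw-<version>-<flag>-<flag>...` layout shown above.

```julia
# Hypothetical helper, not part of FFTW.jl: list the configure flags that
# follow the version number in a string such as "fftw-3.3.9-sse2-avx2-avx2_128".
function fftw_simd_flags(version::AbstractString)
    parts = split(version, '-')
    # parts[1] is "fftw", parts[2] is the version number; the rest are flags.
    return length(parts) > 2 ? String.(parts[3:end]) : String[]
end

flags = fftw_simd_flags("fftw-3.3.9-sse2-avx2-avx2_128")
println(flags)
```

With FFTW loaded, the argument could instead be the result of the `unsafe_string(cglobal(...))` call above. Note that this only reports how the library was configured, not which codelets a given plan actually uses.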
Yes, I realize the difference.
However, my “slow” setup of Julia 1.3.1 + FFTW v1.2.0 also returns the info:
julia> using FFTW
julia> unsafe_string(cglobal((:fftw_version,FFTW.libfftw3), UInt8))
"fftw-3.3.9-sse2-avx2-avx2_128"
So I am still badly stuck.
It is also strange that nobody else complained about the precompiled binaries on Windows.
So I beg all Windows users here to test this minimal code for the appropriate AVX plan:
siz = (25,36,45)
x = zeros(ComplexF32,siz)
plan = plan_fft(x, flags=FFTW.MEASURE)
println(plan)
I would like to test the MWE myself; however, printing the plan’s contents, with either display or print, shows nothing besides
julia> println(plan)
FFTW forward plan for 25×36×45 array of Complex{Float32}
julia> display(plan)
FFTW forward plan for 25×36×45 array of Complex{Float32}
Unfortunately, pasting from WSL does not work. But the plan shows avx, and it benchmarks at 278.8 microseconds versus 1.9 milliseconds (about 7x faster in WSL than in Cygwin).
I can confirm that while the version string reports FFTW being configured with AVX and SSE2, the plan does not include those instructions, on FFTW.jl v1.1.0 & Julia 1.1.1.
The last fast version is v0.3. As soon as the PARTR thread work was merged in with v1.0.0, things got slower (~6x).
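The check described above (whether the printed plan mentions SSE2 or AVX) can be sketched as a small string test. The helper name `mentions_simd` is hypothetical, and the sketch assumes that SIMD codelets show up in the printed plan with "avx" or "sse2" in their names, as reported in this thread.

```julia
# Hypothetical helper, not part of FFTW.jl: does a printed plan description
# mention AVX or SSE2 anywhere? The match is case-insensitive.
function mentions_simd(description::AbstractString)
    text = lowercase(description)
    return occursin("avx", text) || occursin("sse2", text)
end

# With FFTW.jl loaded, the description could be captured with
#   description = sprint(show, plan)
# Here the one-line output quoted earlier in the thread is used instead.
println(mentions_simd("FFTW forward plan for 25×36×45 array of Complex{Float32}"))
```

On setups where printing the plan shows only the one-line header, this test cannot tell you anything, so a `false` result is only meaningful when the full plan is printed.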
Now it is clear that this is a general problem of the compiled Windows binary,
and at present the user has no choice but to revert to a previous version.
So with all due respect I am asking the core developers to fix this problem somehow.
Thank you very much in advance.
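For anyone who needs the workaround in the meantime, reverting to a previous version could look roughly like the sketch below. The function name `install_pinned_fftw` is hypothetical; `Pkg.add`, `Pkg.PackageSpec` and `Pkg.pin` are the standard package manager calls, and the version number is the one reported as fast in this thread.

```julia
using Pkg

# Hypothetical helper, not part of Pkg or FFTW.jl: install one specific
# version of the FFTW package and pin it so later updates leave it alone.
function install_pinned_fftw(version::AbstractString)
    Pkg.add(Pkg.PackageSpec(name="FFTW", version=version))
    Pkg.pin("FFTW")
end

# Example call (needs network access, so it is left commented out):
# install_pinned_fftw("0.2.4")
```

`Pkg.free("FFTW")` should undo the pin once a fixed binary is available.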
julia> using FFTW
julia> ccall((:fftw_have_simd_sse2, FFTW.libfftw3), Cint, ())
ERROR: ccall: could not find function fftw_have_simd_sse2 in library libfftw3-3.dll
Stacktrace:
[1] top-level scope at .\REPL[2]:100:
But the result is the same for both Julia 1.3.1 + FFTW v1.2.0 and Julia 0.7.0 + FFTW v0.2.4.
Oh, grr, that’s because it’s not explicitly marked as a DLL export on Windows, so even if it is there you can’t call it. (It might still be working fine internally to FFTW, but there is no way to check that from Julia.)
I have downloaded the suggested file, unzipped it, and replaced all directories in C:\Users\MYNAME\.julia\artifacts\2193a89e52669d43b28c47b83e738b73d6ed7a50
i.e. bin, include, lib, logs and share (share was not there originally).
And magic happened: Julia 1.3.1 + FFTW v1.2.0 got fast again with the plan containing SSE2 + AVX instructions!