I am working on a function with calculation that have structure similar to FFT. I’m a bit disappointed, cause I hoped it will be 10x slower than FFTW but benchmarks show its 100x slower. i tried to put @simd and @inbounds but it have minimal effect. Could you tell me if there is something obviously wrong? I don’t ask about algorithm analysis, but just the code quality. I am not interested in adding multithreading, cause this package is just a part of a bigger thing, and parallelisation will be potentially done on a higher level.
The code is in
Example benchmark code:
using NumberTheoreticTransforms, FFTW const x = mod.(rand(Int, 4096), 65537); @btime fnt($x, $169, $65537); # 6.306 ms (4 allocations: 32.42 KiB) @btime fft($x); #61.354 μs (53 allocations: 131.02 KiB) const x2 = mod.(rand(Int, 8192), 65537); @btime fnt($x2, $225, $65537); #15.061 ms (4 allocations: 64.45 KiB) @btime fft($x2); #130.380 μs (53 allocations: 259.02 KiB)