I can’t find how to set the length of `FFTW.fft`, the way you can in Python or Matlab.
Maybe this is a naive question, but it troubles me a little.
Thanks for your time.
“Setting the length” in Matlab’s `fft` function just corresponds to zero-padding or truncating the data, which you can easily do manually:

```julia
newdata = zeros(newlength)
newdata[1:length(olddata)] = olddata
```
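For comparison, here is a quick NumPy check (the thread's Julia snippet translated) that the `n` argument of `numpy.fft.fft` is exactly this manual zero-padding when `n` exceeds the input length:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
M = 8

# Pad manually, exactly as in the Julia snippet above:
manual = np.zeros(M)
manual[:len(x)] = x

# Passing n=M to numpy.fft.fft zero-pads internally and gives the same result:
assert np.allclose(np.fft.fft(x, n=M), np.fft.fft(manual))
```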
I remember wondering about this myself at some point. Could there be a way to ‘virtually’ zero-pad without actually copying over data? Sometimes I zero-pad by quite a lot to increase the frequency resolution. Is it possible to exploit the knowledge that much of the signal is ‘structurally zero’ in order to optimize performance and perhaps avoid the extra allocations?
Yes, it’s called a pruned FFT, and it can be implemented on top of an ordinary FFT.
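As a minimal numpy sketch of one such pruned FFT (assuming the padding factor `K` is an integer): the length-`K*N` transform of zero-padded data decomposes into `K` length-`N` FFTs of twiddled copies of the original samples, so the padded array is never materialized.

```python
import numpy as np

def pruned_fft_zeropad(x, K):
    """Length K*N FFT of x zero-padded to K*N, via K length-N FFTs.

    Exploits the structural zeros of the padding: no padded copy is allocated.
    """
    N = len(x)
    M = K * N
    n = np.arange(N)
    X = np.empty(M, dtype=complex)
    for r in range(K):
        # Bin K*j + r of the padded FFT is bin j of the length-N FFT
        # of the input pre-multiplied by a twiddle factor.
        X[r::K] = np.fft.fft(x * np.exp(-2j * np.pi * r * n / M))
    return X

x = np.random.default_rng(1).standard_normal(100)
assert np.allclose(pruned_fft_zeropad(x, 4), np.fft.fft(x, 400))
```

Whether this beats simply padding and calling the library FFT depends on sizes and the FFT implementation; it mainly saves the allocation and the work proportional to the padded zeros.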
You can also use a chirp-Z transform (sometimes called a “zoom FFT”) via Bluestein’s algorithm (also FFT-based) to interpolate the spectrum to “higher resolution” in a portion of the spectrum.
(Neither of these techniques actually increases the resolution in the sense of truly resolving finer spectral features, however — they are just forms of trigonometric interpolation.)
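This interpolation view is easy to verify numerically: the zero-padded FFT just samples the DTFT of the original N points on a finer grid, adding no new information.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
M = 64  # pad 4x

padded = np.fft.fft(x, M)

# Evaluate the DTFT of the *original* 16 samples directly on the finer grid:
n = np.arange(len(x))
dtft = np.array([np.sum(x * np.exp(-2j * np.pi * k / M * n)) for k in range(M)])

# Padding only samples the same underlying spectrum more densely:
assert np.allclose(padded, dtft)
```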
I gather that there isn’t much to gain in terms of performance. But still, since zero-padding is so common, particularly to lengths that are more efficient (e.g. 2^N), I think it would be nice to have it as part of the interface: that is, the padded length would be an input argument, just like in Matlab.
The `fft` algorithm (in FFTW and MKL) is O(N \log N) for all N (even prime N). 20 years ago, Matlab used an O(N^2) algorithm for prime N, and getting into the habit of zero-padding was essential for performance. Similarly, SciPy long used FFTPACK for its `fft` function, which only supports certain small prime factors, and hence it was essential to zero-pad to “fast” lengths to avoid falling into an O(N^2) case. However, last year the `numpy.fft` function apparently got an O(N \log N) algorithm for large prime factors, and SciPy subsequently adopted the same code.
With FFTW or MKL, in contrast, while power-of-two sizes are still more efficient, it’s only by a constant factor, and it’s not necessarily worth the increase in N from zero-padding. For example, if you have N=1200, it’s > 40\% slower on my machine to zero-pad to N=2048 (the next power of 2). With a large prime N, you lose only a factor of \sim 10 compared to nearby composite sizes, which is not a big deal unless you are performing lots of transforms of the same size (in which case you can usually pick your initial size differently). So in general I wouldn’t recommend zero-padding for performance, particularly since it changes the transform that is computed.
If you want more (interpolated) resolution in a certain portion of the spectrum, my feeling is that you are better off with a “Zoom FFT” (a chirp-z transform). Another application of zero-padding is for linear convolutions, but for that you probably want a specialized convolution routine anyway.
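As an illustration of the zoom idea, SciPy ships a chirp-z based `scipy.signal.zoom_fft` (available since SciPy 1.8) that evaluates the spectrum densely over a narrow band without any zero-padding; here it is checked against a direct DTFT evaluation at the same frequencies:

```python
import numpy as np
from scipy.signal import zoom_fft  # chirp-z based zoom FFT

rng = np.random.default_rng(0)
x = rng.standard_normal(128)

# 256 spectrum samples in the narrow band [0.10, 0.12) cycles/sample:
f1, f2, m = 0.10, 0.12, 256
zoomed = zoom_fft(x, [f1, f2], m=m, fs=1.0, endpoint=False)

# The same frequencies evaluated straight from the DTFT definition:
n = np.arange(len(x))
freqs = f1 + (f2 - f1) * np.arange(m) / m
direct = np.array([np.sum(x * np.exp(-2j * np.pi * f * n)) for f in freqs])

assert np.allclose(zoomed, direct)
```

To get the same frequency spacing from zero-padding alone you would need an FFT of length 12800 here, most of whose output you would throw away.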
I’ve experimented a bit in Matlab (which uses FFTW, as far as I know) with padding to the nearest length of the form 2^k 3^l 5^m... for small exponents, and got a significant speedup (~50%) in performance-sensitive code. So I do think zero-padding is worthwhile in some cases.
As I said, large prime sizes are a factor of ~10 slower. But this only matters if you are doing lots of FFTs of the same size, and in such applications you can usually choose the initial length differently rather than zero padding.
Indeed, it was millions (and over time probably billions) of FFTs.
I’m not completely sure what you mean here:
I didn’t have any influence over the length of the received signal, if that’s what you mean.
I was just saying that it’s convenient to zero pad by providing the padded length as an input parameter instead of doing it manually. But if I understand correctly, there’s no extra performance lost by manual padding.
My argument was that if you are performing FFTs over and over of the same size, then you probably also control the source of the data you are transforming, and hence can adjust its size.
On the other hand, if someone hands you some data and you want to compute a spectrum, then you don’t control the size. But in that case you only compute an FFT once, so a small constant factor in performance (if the data has an “unlucky” size) is not so important.
Basically, my feeling is that the zero-padding parameter in Matlab’s and Python’s `fft` functions dates from the days when they couldn’t handle large prime factors efficiently, so zero-padding was an everyday need, but that’s no longer the case; the need for zero-padding now arises in much more specialized applications. In any case, manual padding in Julia is equivalent to what Matlab and Python do if you pass a padding parameter (I’m pretty sure they don’t use pruned FFTs, and in any case pruned FFTs gain you little or nothing if you pad by only a small factor).
I hope I am not hijacking the thread, but to compute convolutions on the GPU I use zero-padding, which doubles the size in 1d. Are you saying there is a way to bypass this padding?
I mentioned linear convolution specifically as a real use-case of zero-padding above. But I feel like that belongs in a `conv` function in a library; it’s not something users should need to do explicitly very often.
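For concreteness, a minimal numpy sketch of what such a `conv` routine does internally (the name `fft_linear_conv` is hypothetical): pad both inputs to length `len(a)+len(b)-1` so the circular convolution computed by the FFT coincides with the linear convolution.

```python
import numpy as np

def fft_linear_conv(a, b):
    # Zero-pad both inputs to the full linear-convolution length so the
    # FFT's circular convolution equals the linear convolution.
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

a = np.array([1.0, 2.0, 3.0])
b = np.array([0.5, -1.0, 2.0, 4.0])
assert np.allclose(fft_linear_conv(a, b), np.convolve(a, b))
```

A production routine would additionally round `n` up to a fast FFT size, which is safe here because the extra trailing entries of the circular result are simply discarded.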
I wonder if it would be worth it to make `DSP.nextfastfft` somewhat smarter, so that an FFT of size `nextfastfft(N)` would always be faster than an FFT of size `N` (or at least usually faster, based on some heuristic). I’ve been in the habit of assuming I should pad using `nextfastfft` when performance matters, but this thread (and a previous conversation with @RoyiAvital) indicates that’s a pretty questionable practice.
Alternatively, could we make FFT automatically pad and unpad?
I am interested in the non-performance case.
The question is: can I apply the DFT to a vector of length N yet have the output be of length M, without changing the length of the input? Namely, only control the allocation of the output.
I looked at the documentation of FFTW 3.3.8, and as far as I can see in “Complex One-Dimensional DFTs” and “One-Dimensional DFTs of Real Data”, the length of the output is fully determined by the length of the input. So, at least using FFTW, one cannot have that, right?
Padding changes the result, so it shouldn’t be automatic.
That’s not a DFT. The definition of the discrete Fourier transform requires that the input and output have equal lengths.
I am not insisting on the mathematical definition; if we were being strict, I’d also point out that the classic DFT is unitary, while in practice it is computed differently (FFTW, if I understand correctly, doesn’t scale the result at all). So let’s keep using the term in the broad, somewhat loose sense in which it is mostly used.
What I meant was: does FFTW have an option to take an input of length N and produce an output of length M, doing the interpolation in the complex domain by itself? From what I understand, it can’t.
I just want to understand what happens when we see code in MATLAB or Python that does this. If FFTW can’t do it, it means that behind the scenes zeros are padded when M > N, and the input is truncated when M < N. I think the OP might have been looking for an optimization, if there is one in MATLAB or Python.
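For what it’s worth, NumPy’s `fft` documents exactly this: for `n` greater than the input length zeros are appended, and for `n` smaller the input is cropped (not interpolated). A quick check:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# n > len(x): zeros are appended before transforming.
assert np.allclose(np.fft.fft(x, n=8),
                   np.fft.fft(np.concatenate([x, np.zeros(2)])))

# n < len(x): the input is simply cropped to its first n samples.
assert np.allclose(np.fft.fft(x, n=4), np.fft.fft(x[:4]))
```

MATLAB’s `fft(X, n)` behaves the same way (trailing zeros or truncation), so no interpolation happens behind the scenes in either case.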
Do you have good heuristics for adjusting this size, other than benchmarking? I currently do `nextprod([2, 3, 5], N)`, but I wonder what it should be for optimal performance. Edit: optimal performance in the sense of “what’s the M >= N that’s fastest”.
That’s a decent rule of thumb, I think. FFTW also handles factors of 7 efficiently, so it wouldn’t hurt to include that, but 5-smooth numbers are already pretty dense and are probably marginally more efficient than 7-smooth numbers in FFTW. For real-data transforms, it is best if the length is even.
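For readers outside Julia, here is a small Python analogue of that rule of thumb (`next_smooth` is a hypothetical helper mirroring `nextprod([2, 3, 5], N)`; a brute-force search is fine since smooth numbers are dense):

```python
def next_smooth(n, factors=(2, 3, 5)):
    # Smallest m >= n whose prime factorization uses only the given factors.
    m = n
    while True:
        k = m
        for p in factors:
            while k % p == 0:
                k //= p
        if k == 1:
            return m
        m += 1

assert next_smooth(1200) == 1200  # 1200 = 2^4 * 3 * 5^2 is already 5-smooth
assert next_smooth(1201) == 1215  # 1215 = 3^5 * 5
assert next_smooth(17) == 18      # 18 = 2 * 3^2
```

Note that the N=1200 example from earlier in the thread is already 5-smooth, which is consistent with padding it to 2048 being slower.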