How to set the length of fft

I’ve experimented a bit in Matlab (which uses fftw as far as I know) with padding to the nearest length of 2^k 3^l 5^m... for small exponents, and got significant speedup (~50%) in performance sensitive code. So I do think zero padding is worthwhile in some cases.

As I said, large prime sizes are a factor of ~10 slower. But this only matters if you are doing lots of FFTs of the same size, and in such applications you can usually choose the initial length differently rather than zero padding.

Indeed, it was millions (and over time probably billions) of ffts.

I’m not completely sure what you mean here:

I didn’t have any influence over the length of the received signal, if that’s what you mean.

I was just saying that it’s convenient to zero pad by providing the padded length as an input parameter instead of doing it manually. But if I understand correctly, there’s no extra performance lost by manual padding.

My argument was that if you are performing FFTs over and over of the same size, then you probably also control the source of the data you are transforming, and hence can adjust its size.

On the other hand, if someone hands you some data and you want to compute a spectrum, then you don’t control the size. But in that case you only compute an FFT once, so a small constant factor in performance (if the data has an “unlucky” size) is not so important.

Basically, my feeling is that the zero-padding parameter in Matlab and Python’s fft function dates from the days when they couldn’t handle large prime factors efficiently, so zero-padding was an every-day need, but that’s no longer the case — the need for zero-padding is in much more specialized applications. In any case, manual padding in Julia is equivalent to what Matlab and Python do if you pass a padding parameter (I’m pretty sure they don’t use pruned FFTs, and in any case pruned FFTs gain you little or nothing if you pad by only a small factor).

2 Likes

I hope that I am not hijacking the thread but to compute convolution over the GPU, I use zero padding which double the size in 1d. You are saying there is a way to bypass this padding?

I mentioned linear convolution specifically as a real use-case of zero-padding above. But I feel like that belongs in a conv function in a library — it’s not something users should need to do explicitly very often.

I wonder if it would be worth it to make DSP.nextfastfft somewhat smarter so that an FFT of size nextfastfft(N) would always be faster than an FFT of size N (or at least usually faster, based on some heuristic). I’ve been in the habit of assuming I should pad using nextfastfft when performance matters, but this thread (and a previous conversation with @RoyiAvital) indicate that’s a pretty questionable practice.

1 Like

I am interested in the non performance case.
The question is, can I apply DFT transform on a vector of length N yet the output will be of length M without changing the length of the input. Namely only control the allocation of the output.

I looked on the documentation of FFTW 3.3.8 and as far as I can see in Complex One Dimensional DFTs and One Dimensional DFTs of Real Data the length of the output is fully determined by the length of the input. So, at least using FFTW one can not have that, right?

Padding changes the result, so it shouldn’t be automatic.

That’s not a DFT. The definition of the discrete Fourier transform requires that the input and output have equal lengths.

4 Likes

I am not into the Mathematical definitions. As if so I’d also say the classic DFT is unitary and in practice it is calculated differently (FFTW if I get it right doesn’t scale at all). So let’s keep using it in the broad and somewhat abusing meaning it is mostly used.
What I meant, does FFTW have option to have the input of length N and the output of length M and do the interpolation in complex domain on itself? From what I understand it can’t.
I just want to understand what happens when we see a code in MATLAB or Python that does so. If FFTW can’t do it, it means that behind the scene zeros are padded in the case of M > N and in the other case interpolation of the output is made. As I think the OP might was looking for optimization where there is such one in MATLAB or Python.

Do you have good heuristics for adjusting this size other than benchmarking? I currently do nextprod([2, 3, 5], N), but I wonder what it should be for optimal performance. edit: optimal performance in the sense “what’s the M >= N that’s the fastest”

1 Like

That’s a decent rule of thumb, I think. FFTW also handles factors of 7 efficiently, so it wouldn’t hurt to include that, but 5-smooth numbers are already pretty dense and are probably marginally more efficient than 7-smooth numbers in FFTW. For real-data transforms, it is best if the length is even.

4 Likes

As I said, if you use the padding option in Matlab or Python’s fft functions, it is just zero padding and computing the same thing as the “manual” 2-line Julia padding code above.

But this is not something you want to do blindly because it changes the transform you are computing (much more than a trivial change in normalization).

I am not into the Mathematical definitions.

You have to define mathematically what transform you want. There are infinitely many ways to have DFT-like functions with N inputs and M outputs, such as zero-padding the input (if M > N) or doing some other sampling in the frequency domain (as in a “Zoom FFT”).

Typically, what people do corresponds to a discrete-time Fourier transform (DTFT) with some window function implied for the input and some sampling in frequency domain for the output, but the choices of windowing and frequency-sampling matter a lot (and determine what algorithms are applicable)!

You may not be “into” it, but not understanding what you are doing is a good way to get incorrect results.

2 Likes

Alright, thanks!

I do understand it deeply. This is what I do.
I meant in the context of this thread I thought it is obvious that we’re talking about what’s people usually mean.
If you want it more rigorously, I’d define it by the DFT Matrix as this is what usually people mean when they says the case of N \to M .
I was under the impression FFT Libraries have optimized code path for those cases as well.
Since not every day we have a chance to ask an expert like you in the field I was wondering if there is something I miss in FFTW’s documentation about this.
Specifically I was almost sure that the “easy” case (Where M > N ) has a special treatment,

The DFT matrix is square (as in the Wikipedia article you linked). A non-square transform is not a DFT in any sense (regardless of normalization). I guess you mean a zero-padded DFT (i.e. the first N columns of an MxM DFT), but if that is what you want you need to say so. There are infinitely many possible non-square Fourier-like transforms and they can all be represented as matrices. e.g. a chirp-z transform is also a very common and useful choice.

Nope, not typically.

See my comments/links above. A pruned FFT is theoretically possible, but only becomes practically beneficial (for 1d transforms) when you zero pad by a lot (multiplying the length by a large factor, typically 100x or more), and if you are doing this probably a “Zoom FFT” is better for you anyway (but is a different beast because you have to specify the desired frequency sampling).

For example, you can see numpy’s zero-padding code, which is precisely the trivial algorithm I suggested above.

1 Like

OK, In order to be exact I meant the matrix form of:

X \left[ k \right] = \sum_{n = 0}^{N - 1} x \left[ n \right] {e}^{-j 2 \pi \frac{k n}{M} }, \; k \in \left\{ 0, 1, 2, \ldots, M - 1 \right\}

OK, I was under the impression (With little knowledge about how it’s implemented) that it would be easy to tell the FFT algorithm how to “extrapolate” M - N zeros without actually having them in the array.

Thank you for sharing that knowledge.

Wow! Thank you! I didn’t know about that function.

I normally use an algorithm from a function on the Matlab File Exchange: FFTWN: Round up to efficient FFT integer - File Exchange - MATLAB Central
The description on its page has this to say:

The MATLAB FFT implementation uses the FFTW library. With default
settings, FFTW is highly efficient for data lengths of the form
N = 2^a3^b5^c7^d11^e*13^f, where e + f = 0 or 1, and the rest
of the exponents are arbitrary.

So that’s what it does, finds the next N that satisfies that expression. I wonder if that statement is actually accurate!

As for how the zero-padding is actually accomplished (after having found an efficient length), for me it’s less about performance, and more about having a convenient interface. That’s why I enjoy having the padding length as part of the interface.

(BTW: Julia’s nextprod is 50 times faster than the Matlab function, naturally )

FWIW this (including 7 as well) is how DSP.nextfastfft is implemented (here). If there’s a better heuristic that would be a good place to make the improvement.

1 Like