I don’t know why this choice was made, but it means that to get the equivalent of MATLAB’s ‘same’ operation you’d need to do some work with the indices of the result.
The documentation doesn’t say what size the returned data has (or even whether the fft() step applies linear convolution or cyclic convolution), and I couldn’t figure it out from the code.
If you write the loop yourself using LoopVectorization.jl you’ll probably get better performance and be able to customize the loop indices to match MATLAB’s full, same, or valid modes.
An idea of how to apply convolution using LoopVectorization.jl is given at the Image Filtering example.
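To make that concrete, here is what the index bookkeeping for the three modes looks like with a plain double loop, sketched in NumPy-flavored Python rather than Julia (`direct_conv` is a hypothetical name; the ‘same’ offset follows NumPy’s convention, which agrees with MATLAB’s for odd-length kernels):

```python
import numpy as np

def direct_conv(x, h, mode="full"):
    """Direct (time-domain) linear convolution with MATLAB-style modes.
    Assumes len(x) >= len(h)."""
    n, m = len(x), len(h)
    full = np.zeros(n + m - 1)
    for i in range(n):           # plain double loop; a tuned implementation
        for j in range(m):       # would vectorize / SIMD this inner loop
            full[i + j] += x[i] * h[j]
    if mode == "full":           # all n + m - 1 samples
        return full
    if mode == "same":           # central part, same length as x
        start = (m - 1) // 2     # NumPy and MATLAB agree when m is odd
        return full[start:start + n]
    if mode == "valid":          # only samples with full overlap
        return full[m - 1:n]
    raise ValueError(mode)
```

A LoopVectorization.jl version would apply its macro to the inner loop, but the mode handling is the same index arithmetic either way.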
Well, it is only faster for really large inputs where the kernel and the signal have similar sizes, which is uncommon in practice. This is why, for instance, MATLAB chose a direct implementation for conv(), while for cross correlation, which usually deals with signals of similar length, it uses an fft()-based method.
I’m pretty sure that implementing the vanilla loop with vectorization would easily beat it for most real-life cases.
Also, the documentation isn’t good, in my opinion.
When it says it uses fft(), does it expect the user to pad arrays in order to get the linear convolution result?
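For what it’s worth, the distinction is easy to demonstrate: an FFT product at the signal’s own length gives cyclic convolution, and only padding both inputs to N + M − 1 gives the linear result (NumPy used here purely for illustration, not DSP.jl’s actual code path):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
h = np.array([1.0, 1.0, 1.0])

# Cyclic convolution: FFT length is len(x); the kernel is only
# zero-padded up to len(x), so the tail wraps around
L = len(x)
cyclic = np.real(np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(h, L)))

# Linear convolution: zero-pad both inputs to N + M - 1 first
L = len(x) + len(h) - 1
linear = np.real(np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(h, L)))
```

The cyclic result is the linear one with its tail folded back onto its head, which is exactly the aliasing that user-side padding would have to prevent.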
I think imfilter() applies correlation, not convolution. Moreover, I am not sure the centered() function yields the result of MATLAB’s ‘same’. As far as I can tell from the documentation, it only shifts the kernel’s axes, so it shouldn’t change the output values.
I think this is one of those things that’s pretty field-specific. For instance I’m generally not convolving with filters that are small enough for the direct implementation to be efficient. As you mention, for similarly-sized inputs the FFT approach is generally better, and when they are very differently-sized you want to use overlap-save convolution, which uses FFTs in a block-wise fashion. The conv implementation currently selects between those two algorithms depending on the size of the input, but it would be good for it to also select the direct method when one of the inputs is small enough.
Basically it’s a matter of perspective - some folks say “FFT convolution is only useful for really big inputs” and others say “direct convolution is only useful for really small inputs”. It seems like conv could be smart enough to always do the right thing, but someone needs to add that check.
I agree the docs could be improved. I usually expect the convolution of a length-N array and a length-M array to be length N+M-1, but that should be documented.
Though it uses the FFT algorithm, DSP.jl’s conv implementation takes care of zero-padding internally, so it does linear convolution, not circular convolution.
To answer the original question, it seems that what you want depends on the characteristics of your filter b. First off, I’m assuming that the idea is for the output to “line up” with the input. If the filter peaks on the first sample (like exp.(0:-0.5:-10)), then you probably want something like conv(a, b)[1:length(a)]. But if you have some filter that includes a delay, like gaussian(21, 0.15), then you might want conv(a,b)[11:end-10].
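Sketching those two cases in NumPy (the b1/b2 stand-ins below only approximate exp.(0:-0.5:-10) and DSP.jl’s gaussian(21, 0.15); the slices mirror the 1-based Julia indices in the post):

```python
import numpy as np

a = np.random.default_rng(0).standard_normal(100)

# Filter that peaks on its first sample: keep the first len(a) outputs
b1 = np.exp(np.arange(0.0, -10.5, -0.5))        # 21 taps, peak at index 0
y1 = np.convolve(a, b1)[:len(a)]                # Julia: conv(a, b)[1:length(a)]

# Symmetric filter with a 10-sample delay: trim 10 samples from each end
n = np.arange(21) - 10
b2 = np.exp(-0.5 * (n / (0.15 * 21 / 2)) ** 2)  # rough stand-in for gaussian(21, 0.15)
y2 = np.convolve(a, b2)[10:-10]                 # Julia: conv(a, b)[11:end-10]
```

Both slices return an output the same length as the input; the second one is what MATLAB’s ‘same’ does for an odd-length kernel.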
Ah okay, I was indexing like you mentioned in your post and was wondering if there was a direct method. I needed this for a MATLAB-based tutorial I was doing in Julia, and since MATLAB seemed to consider the ‘same’ option important, I assumed it had broader application.
The output result has the same dimensions as signal, regardless of the use of centered (you can get a smaller result by passing the padding Inner(), which is like MATLAB’s 'valid'). Centering the kernel means you would get the central part of the full correlation.
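For reference, the three output sizes line up with MATLAB’s mode names like this (a NumPy sketch: Inner() plays the role of 'valid', and the default same-size imfilter output the role of 'same'):

```python
import numpy as np

signal = np.arange(1.0, 9.0)          # length 8
kernel = np.array([1.0, 2.0, 1.0])    # length 3

full  = np.convolve(signal, kernel, "full")   # length 8 + 3 - 1 = 10
same  = np.convolve(signal, kernel, "same")   # length 8, like imfilter's default
valid = np.convolve(signal, kernel, "valid")  # length 8 - 3 + 1 = 6, like Inner()
```

Both ‘same’ and ‘valid’ are just central slices of ‘full’, which is why centering the kernel gives you the central part of the full correlation.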
We have the signal length and the kernel length. I only covered the case where the signal is at least as long as the kernel.
So the upper triangle of the matrix, in Dark Blue, isn’t relevant.
On the diagonal and in the lower triangle, the color shows which method was faster:
Light Blue - Direct.
Green - Overlap and Save.
Yellow - Frequency Domain.
I am not sure which cases you use, but as you can see, there is almost no reason to use Overlap & Save.
In my opinion, as long as you have the whole data in memory, use either the direct method (cases where the kernel is shorter than ~400-500 samples in the implementation above) or the frequency domain.
I’d even mention that MATLAB’s conv() is probably much slower than what you can do with LoopVectorization.jl in Julia.
@ElectronicTeaCup, in the code I posted in the answer to the question you can see exactly which samples you need to extract to replicate MATLAB’s same.
Though the optimal thing would be implementing direct convolution using LoopVectorization.jl, computing only the required samples (full, same, or valid). The current approach in DSP.jl is suboptimal in my opinion.
Overlap-save convolution just turns a convolution with unequal lengths into a bunch of equal-length convolutions (one per block). So if the blocks are large enough (larger than about 600 samples from your plot), then you want to use FFT processing for each block, not direct convolution.
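A minimal overlap-save sketch makes the block structure explicit (NumPy; `overlap_save` is a hypothetical helper, and the block length L is a free parameter here, not DSP.jl’s actual choice):

```python
import numpy as np

def overlap_save(x, h, L=64):
    """Linear ('full') convolution via overlap-save with FFT blocks of length L."""
    m = len(h)
    assert L >= m, "block must cover the kernel overlap"
    step = L - (m - 1)                 # new output samples produced per block
    H = np.fft.rfft(h, L)              # kernel spectrum, reused for every block
    # prepend m-1 zeros of history; trailing zeros flush the final blocks
    xp = np.concatenate([np.zeros(m - 1), x, np.zeros(L)])
    out = []
    for start in range(0, len(x) + m - 1, step):
        block = xp[start:start + L]    # np.fft.rfft zero-pads short blocks
        y = np.fft.irfft(np.fft.rfft(block, L) * H, L)
        out.append(y[m - 1:])          # drop the circularly-wrapped samples
    return np.concatenate(out)[:len(x) + m - 1]
```

Each block is an equal-length circular convolution; only the first m − 1 samples of each block output are contaminated by wrap-around and get discarded.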
But I agree that for short convolutions (seemingly less than about 400 samples from your plot) conv should be doing direct convolution. There’s actually already some code for direct convolution in DSP.jl, so it would be a pretty good intro PR for someone to include it within conv.
As I mentioned before, I think we are using different definitions of “short” and “long”. Your plot only goes up to N=2000, whereas I’m frequently convolving a signal of millions of samples with a signal with some hundreds of thousands, which is a common use-case for overlap-save convolution.
@galenlynch did some similar analysis with kernels from 1 to 2^20 that compared FFT to overlap-save, both in CPU time and memory usage:
edit: that plot has different dimensionalities as well, so for this conversation you want the N=1 (one-dimensional) subplot
There is a closed-form equation for the optimal DFT window length for a given kernel length.
The code I used to create the above image uses that optimal DFT window.
Still, there are no cases where Overlap and Save is superior to the direct or frequency-domain methods.
Overlap and Save and Overlap and Add were conceived with a constant stream of data in mind.
For data already loaded into memory, use either the direct or the frequency-domain method. I think the cases where one would gain from Overlap and Save are edge cases (probably more related to correlation [template matching] than convolution [filtering]).
In cases of a very, very long kernel and a signal still order(s) of magnitude longer, we might see Overlap and Save dominate the frequency-based method (the direct method has nothing to offer once the kernel is bigger than the CPU cache). But I think those cases are more related to template matching (correlation). Though it can still be implemented with the same ideas (it is just a flipped kernel), it should have its own dedicated function (xcorr() or something).
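The flipped-kernel relationship is one line in NumPy (np.correlate standing in here for an xcorr()-style function):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
k = np.array([1.0, 0.0, -1.0])

# Cross-correlation is convolution with the time-reversed kernel
# (and, for complex data, the conjugated one)
corr = np.correlate(a, k, mode="full")
conv = np.convolve(a, k[::-1], mode="full")
```

So the two operations share implementation machinery even if they deserve separate user-facing functions.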
I think the point where “fast” (fft) convolution techniques are faster than direct convolution will be with a much smaller kernel size than 400 with well-optimized code. Many sources I’ve seen suggest that the break-even point is a kernel size around 128, while some simple testing with 1D arrays in DSP.jl’s conv (fft) and filt (direct) functions suggests it’s 58 with our convolution code. As you can see from the plots that Russel linked to, there are many cases where overlap-save is faster than simple fft convolution: basically most cases where the two input vectors are not the same size. I can’t speak to why you have different results in your Matlab code. I would be curious to see how DSP.jl’s conv times compare to your Matlab times, if you would be willing to look at that.
As I think Russel pointed out, we already have very fast (and vectorized) direct convolution implemented in DSP.jl as part of the filt function for one dimension, but unfortunately I think it may have slightly different behavior than conv which prevents us from using that code for conv as well (at least that is my recollection). It has been a long-term goal of mine to combine FIR filtering code with convolution code, to allow direct convolution to be used more generally. One difficulty is that I have found it difficult to write fast direct convolution code that works with arbitrary array indexing and an arbitrary number of dimensions, which is currently what conv supports. It might be an easy PR to calculate the break-even point and use direct convolution for small kernels when using just one dimension and 1-indexed arrays.
Ultimately, the choice to use fft convolution for conv was made a long time ago in Julia base. Even though that conv code has since been moved into DSP.jl and modified to use overlap-save when it is faster, we have not yet added a direct convolution branch when that would be faster.
In terms of wishing for better documentation, you’re right that it could be better documented. PRs are, of course, welcome.
Well, talking to @ssfrr, I gather that the frequency-domain convolution pads to the next power of 2. This is the reason you get the results you do.
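To put a number on that: padding to the next power of 2 instead of exactly M + N − 1 can mean transforming substantially more samples (whether DSP.jl pads exactly this way is an assumption on my part):

```python
def padded_lengths(n, m):
    """Exact linear-convolution length vs. next-power-of-2 padding."""
    exact = n + m - 1
    pow2 = 1 << (exact - 1).bit_length()  # smallest power of two >= exact
    return exact, pow2

# e.g. a 1000-sample signal with a 400-sample kernel:
# exact padding needs a 1399-point FFT, power-of-2 padding a 2048-point
# one, i.e. ~46% more samples to transform per FFT
```

Of course, a power-of-2 FFT is faster per point than an awkward composite length, so the comparison is not one-sided; that is exactly the kind of constant that benchmarks like the ones above settle.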
The image I posted clearly shows that the direct method easily beats Overlap and Save for kernels up to 400 samples. I bet with highly tuned code (say, like Intel IPP) the threshold would be even higher.
Have a look at my code, implement the frequency domain the way I implemented it (pad with zeros to M + N - 1 and not nextpow2(M + N - 1)), and you’ll see OVS doesn’t shine as much as it seems to.
The cases where OVS is useful will be the rare cases where the kernel is huge and the data is still orders of magnitude bigger.
As I wrote above, these cases mostly happen for template matching (correlation).
This is the reason MATLAB implements conv() with the direct method, while for those rare cases it has xcorr(), which is implemented in the frequency domain.