Fair enough. But if you want to do repeated computations with the same kernel you will benefit from saving the invariant parts (for both options).
I got a very modest 16 MB savings in allocations by using a view:
myconv(f,g) = @view DSP.conv(fftshift(f),g)[N:2N-1]
julia> @btime directconv($f,$g);
305.838 ms (88 allocations: 183.11 MiB)
julia> @btime myconv($f,$g);
295.373 ms (87 allocations: 167.85 MiB)
Part of what @GunnarFarneback was referring to is your definition of F
outside of the function, which hides its allocations, about 38 MB, and therefore sees unfair advantage over the other methods.
julia> @btime toepconv($F,$g);
393.324 ms (79 allocations: 106.82 MiB)
julia> toepconv2(f,g) = Toeplitz(f[1:N],f[vcat(1,2N-1:-1:N+1)])*g;
julia> @btime toepconv2($f,$g);
388.091 ms (91 allocations: 144.96 MiB)
I do suspect it should be possible to simultaneously get the speed of DSP.conv
and the allocations of toepconv2
. For example, fftshift(f)
allocates 30 MB by itself, which is more than the gap with myconv
above. Perhaps it could be done in-place, or the convolution altered to avoid shifting at all. Or, there may be a way avoid the reflection in definition of F
.
Depending on your datatype you can also save a factor of 2 by using real valued FFTs rfft
.
If you need performant wrappers for FFT convolution planning (depending on the type), you can also check out FourierTools.jl (no paid advertisement )
We provide a wrapper to get a planned convolution similar to plan_fft
, etc. That provides a little bit of speedup.
If you want to bring memory consumption to almost zero, you can also try fft!
but which only works on complex arrays.
julia> using FFTW
julia> function plan_conv!(v; flags=FFTW.MEASURE)
v_ft = fft(v)
p = plan_fft!(copy(v), flags=flags)
function conv!(u)
u .= (p * u) .* v_ft
return inv(p) * u
end
return conv!
end
plan_in_place (generic function with 2 methods)
julia> v = randn(ComplexF32, (1024, 1024));
julia> u = randn(ComplexF32, (1024, 1024));
julia> @time my_conv! = plan_conv!(v);
0.043965 seconds (25.53 k allocations: 9.676 MiB, 14.00% compilation time)
julia> @time my_conv! = plan_conv!(v);
0.039729 seconds (65 allocations: 8.005 MiB)
julia> @time my_conv!(u);
0.028485 seconds (21.88 k allocations: 8.962 MiB, 72.59% compilation time)
julia> @time my_conv!(u);
0.007707 seconds (2 allocations: 160 bytes)
Edit: Note, that plan_fft!
might overwrite the provided array with zeros. Hence we should copy it before!
EditEdit: I added now a generic version to FourierTools.jl which works without any allocations during convolution itself (excluding planning).