2D FFT convolution but get inaccurate results

I am trying to calculate this convlution integral a(x)=\int{\varphi(x,x\prime)}w(x\prime)\rm{d}\it{V\prime},where \varphi(x,x\prime) is a compact support kernel function with a radius \ell. Currently, \varphi(x,x\prime) is choosen as \varphi(x,x\prime)=\frac{5}{\pi\ell^2}(1+3\frac{|x-x\prime|}{\ell})(1-\frac{|x-x\prime|}{\ell})^3 in the support and \varphi(x,x\prime)=0 outside the support. And the following is the code generate the kernel function:

## kernel function φ(x,x′)
function kernel_func(r,ℓ)
    y = 5.0 * (1.0 + 3.0 * r / ℓ) * ( 1.0 - r / ℓ) ^ 3 / (π * ℓ ^ 2) 
    return y
function gene_kernel!(conv_kernel,poicoo_x,poicoo_y,numpoi_x,numpoi_y,ℓ)
    for i in 1 : numpoi_x
        for j in 1 : numpoi_y
            r = sqrt(poicoo_x[i] ^ 2 + poicoo_y[j] ^ 2)
            conv_kernel[i,j] = kernel_func(r,ℓ) * float(r < ℓ) + 0.0im
    return conv_kernel

As this integral has a convolution structure, I attempt to use the convolution theorem as a(x) = \mathcal{F}^{-1}[\mathcal{F}(\varphi)\cdot\mathcal{F}(w)]. On the other hand, for each point x, I can find other points x\prime with the suppot and add all w(x\prime) to get the result.
However, I find that the result from FFT and the result from directly adding is different, with a 17% deviation. Attached is my code.

using FFTW
#### Evaluate a(x) = ∫φ(x,x′)w(x′)dV′
## kernel function φ(x,x′)
function kernel_func(r,ℓ)
    y = 5.0 * (1.0 + 3.0 * r / ℓ) * ( 1.0 - r / ℓ) ^ 3 / (π * ℓ ^ 2) 
    return y
function gene_kernel!(conv_kernel,poicoo_x,poicoo_y,numpoi_x,numpoi_y,ℓ)
    for i in 1 : numpoi_x
        for j in 1 : numpoi_y
            r = sqrt(poicoo_x[i] ^ 2 + poicoo_y[j] ^ 2)
            conv_kernel[i,j] = kernel_func(r,ℓ) * float(r < ℓ) + 0.0im
    return conv_kernel
## loop 
function getpoiloc(i,numpoi_x)
    if i % numpoi_x == 0
        i_x = numpoi_x
        i_y = Int(i / numpoi_x)
        i_x = i % numpoi_x
        i_y = fld(i,numpoi_x) + 1
    return i_x,i_y
## as the kernel is compact support, I just need to loop point pairs within the radius ℓ
function loopconv!(a_loop,infvol,w,ℓ,numpoi_x,numpoi,poicoo_x,poicoo_y,dx,dy)
    a_loop .= 0.0
    infvol .= 0.0
    for ithpoi = 1 : numpoi - 1
        i_x,i_y = getpoiloc(ithpoi,numpoi_x)
        for jthpoi = ithpoi : numpoi
            j_x,j_y = getpoiloc(jthpoi,numpoi_x)
            dist = sqrt((poicoo_x[i_x] - poicoo_x[j_x]) ^ 2 + (poicoo_y[i_y] - poicoo_y[j_y]) ^ 2)
            if dist < ℓ
                φ = kernel_func(dist,ℓ)
                a_loop[i_x,i_y] += φ * w[j_x,j_y] * dx * dy
                a_loop[j_x,j_y] += φ * w[i_x,i_y] * dx * dy
                infvol[i_x,i_y] += φ * dx * dy
                infvol[j_x,j_y] += φ * dx * dy
    a_loop .= a_loop ./ infvol
    return a_loop
## Spatial discretization
length_x = 1.2  # length in x-direction
length_y = 1.0  # length in y-direction
numpoi_x = 2 ^ 8 # number of points in x-direction
numpoi_y = 2 ^ 8 # number of points in y-direction
numpoi = numpoi_x * numpoi_y # number of all points  
dx = length_x / (float(numpoi_x) - 1.0) # distance between points in x-direction
dy = length_y / (float(numpoi_y) - 1.0) # distance between points in y-direction
poicoo_x = collect(-0.5 * length_x : dx : 0.5 * length_x) # x coordinate of points
poicoo_y = collect(-0.5 * length_y : dy : 0.5 * length_y) # y coordinate of points 
## generate kernel function
ℓ = 0.025
conv_kernel = zeros(ComplexF64,numpoi_x,numpoi_y)
## function w(x)
w = rand(numpoi_x,numpoi_y)
## FFT-Based Convolution to evaluate ∫φ(x,x′)w(x′)dV′
# periodic shifted version of the kernel function as DFT requires the first element to be φ[0,0]
shift_kernel = fftshift(conv_kernel)
# φ̂
conv_kernel = fft(shift_kernel)
# a_fft
a_fft = ifft(conv_kernel .* fft(w) .* dx .* dy)
## using brutal force loop to evaluate a(x) = ∫φ(x,x′)w(x′)dV′
a_loop = zeros(Float64,numpoi_x,numpoi_y)
infvol = zeros(Float64,numpoi_x,numpoi_y)
err = maximum(real(a_fft)) - maximum(a_loop) / maximum(a_loop)

Hi! @stevengj and @roflmaostc, I have updated a more trival version which only depends on FFTW and add some comments. I hope this version can be easy to follow. Thanks for your help!

Possibly you are using a different definition of “convolution” — realize that the FFT-based formula is computing a cyclic convolution, with a particular choice of “origin”. In any case, you certainly have a bug somewhere — an FFT-based convolution should match the brute-force loop-based convolution to nearly machine precision.

I can’t give any more details because your code is pretty “convoluted” (pun intended). Normally I would expect a convolution code to look much simpler, like this:

function loopconv(a::AbstractVector, b::AbstractVector)
    n = length(a)
    axes(a,1) == axes(b,1) == 1:n || throw(DimensionMismatch())
    c = fill(zero(eltype(a)) * zero(eltype(b)), n)
    for i = 1:n, j = 1:n
        c[i] += a[j] * b[mod(i-j,n)+1]
    return c

using FFTW
fftconv(a, b) = ifft(fft(a) .* fft(b)) # could use rfft for real data

which matches to nearly machine precision

julia> norm(loopconv(a,b) - fftconv(a,b)) / norm(loopconv(a,b))

(I would suggest separating the code that computes your convolution kernel from the code that performs the convolution.)



I didn’t run the code since there are quite a few dependencies and the code is not trivial to follow.

But did you consider that a FFT based convolution has wraparound artifacts?
To avoid that, you could apply a zero padding such that the image has double the size in each dimension.

select_region can extract or zero pad arrays, I use it for convenience for the zero padding and extraction.

For example:

using FourierTools, DSP, NDTools

x = [1,2,3,2,1]

kernel = [0,0.25,0.5,0.25,0]

 # extract center region since DSP.conv returns padded version
@show select_region(DSP.conv(x, kernel), new_size=(5,))

# conv_psf treats the kernel as centered in real space, same convention as DSP.conv
# extract center regions
@show FourierTools.conv_psf(x, kernel)
# apply zero padding before convolution
@show select_region(FourierTools.conv_psf(select_region(x, new_size=(10, )), select_region(kernel, new_size=(10,))), new_size=(5,))

which yields:

select_region(DSP.conv(x, kernel), new_size = (5,)) = [1.0, 2.0, 2.5, 2.0, 1.0]
FourierTools.conv_psf(x, kernel) = [1.25, 2.0, 2.5, 2.0, 1.25]
select_region(FourierTools.conv_psf(select_region(x, new_size = (10,)), select_region(kernel, new_size = (10,))), new_size = (5,)) =
    [1.0000000000000002, 2.0, 2.5, 2.0, 1.0000000000000002]

As you can see: the first and last statement are approx. the same whereas the non-padded array (second statement) suffers from wraparound artifacts. If you would use an image and large kernels, you could also see this visually. The wraparound is sometimes also called reflecting boundaries.

Thanks for your response!

I am sorry that I don’t quite understand this point. Do you mean that I cannot evaluate this integral \int{\varphi(x,x\prime)w(x\prime)dV\prime} by FFT convolution?

I am happy to know that FFT-based convolution will match the brutal force loop-based convolution, then there is certainly some bugs with my code.

I am sorry for this. The reason for the complicated code is that the convolution function is frequently called in a bigger program, and I am trying to speed it up and reduce the memory allocation.

Thanks for your nice suggestion, and I will modify the code the post and add some comments to make it easy to follow.

Thanks for your reply!

I am sorry for this and I will modify the code in the post and add some comments to make it easy to follow.

I understand that zeros padding is necessary in 1D FFT-basd convolution, otherwise the result will be wrong, but I have no idea about zeros padding in 2D convolution. Actually, I have compared my fft-convolution result with FouriedTools.conv(), without zero padding, the result is just the same. I will try select region and post the result later. Thanks!

I have tried select_region but the result is weird.

index_ftools = select_region(FourierTools.conv(select_region(shift_kernel,new_size=(2^9,2^9)),select_region(w,new_size=(2^9,2^9))),new_size = (2^8,2^8)) * dx * dy

The resulting coutour has four similar parts divided by a horizontal line and a vertical line.

Since your kernel is centered at the middle position of the array, you should use conv_psf which uses ifftshift and fftshift methods for correct aligning.

FFTs don’t evaluate integrals, they compute sums, so I assume you mean the integral approximated by a sum ala the trapezoidal rule. Write your convolution as a sum, but then pay careful attention to (a) the boundary conditions (cyclic for FFT) and (b) the origin (= first element for FFT). Compare to the definition of cyclic convolution in the article I linked, or in the code I posted.

In short, you have to make sure that your formulation of a (discrete) convolution matches the one in the FFT exactly.

1 Like

Thanks for your reply! Actually I have shift the kernel in this line

I tried conv_psf and select_region with the original kernel (without shift), the error is still large.

Thanks for your clear explanation! Actually I mean the discrete summation of the integral as you assumed. I have read the article you linked and program the circular convolution according to its defination in 1D case, and the result from circular convolution and FFT agree well with each other. This means that the way I calculate the discrete summation is somehow different from circular convolution. I will have a check. Thanks.