Avoid allocations in the naive discrete Fourier transform

Hello, I am trying to implement a straightforward discrete Fourier transform, but I do not see which variables are allocated in the intermediate steps. To my understanding it is a generator expression and should produce the result without allocating anything?

julia> function dft(A::AbstractArray{T,N}, CT::Type = Float64) where {T,N}
           F = similar(A, Complex{CT})
           @inbounds for k ∈ CartesianIndices(F)
               F[k] = sum(exp(- im * 2π * sum((k[d] - 1) * (n[d] - 1) / size(A, d) for d ∈ 1:N)) * A[n] for n ∈ CartesianIndices(A))
           end
           return F
       end
dft (generic function with 3 methods)

julia> A = rand(Float64, 20, 20);

julia> using BenchmarkTools

julia> @btime dft($A);
  5.331 ms (2401 allocations: 100.13 KiB)

I didn’t quite track it down, but I suspect that the compiler has a hard time doing type-inference through those nested generator expressions?
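One way to check that kind of suspicion is to look at what the compiler infers for the body, for example:

using Test

@code_warntype dft(A)   # non-concrete (Any) slots usually point at the generator closures
@inferred dft(A)        # throws if the return type cannot be inferred concretely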

Here is a variant that does no allocations (other than allocating F), and uses the element type of A to determine the element type of the output F:

function dft(A::AbstractArray{T,N}) where {T,N}
    F = zeros(complex(float(T)), size(A)...)
    @inbounds for k ∈ CartesianIndices(F), n ∈ CartesianIndices(A)
        F[k] += cis(-2 * real(eltype(F))(π) * sum(d -> (k[d] - 1) * (n[d] - 1) / real(eltype(F))(size(A, d)), ntuple(identity, Val{N}()))) * A[n]
    end
    return F
end

Note that the expression in your exponent will be computed in double precision by default, even if you want some other precision for F. To force it into the desired precision, use real(eltype(F))(π); you have to do the same thing for the division by size(A, d), since otherwise dividing two integers will default to double precision. You can also use cis(x) instead of exp(im*x). I also used ntuple(identity, Val{N}()) instead of 1:N, which lets the compiler unroll the inner sum by supplying the length at compile time.
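To make the precision point concrete, here is a small illustration (my own, not part of the function above):

# The literal 2π is a Float64 constant, and Int/Int division also yields Float64,
# so the whole exponent ends up in double precision:
typeof(2π * (3 * 5 / 7))            # Float64

# Forcing the constant and the divisor into the target type keeps it there:
T = BigFloat
typeof(2 * T(π) * (3 * 5 / T(7)))   # BigFloat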


Thanks for the reply; however, copy-pasting your function shows a factor-of-100 slowdown and a factor-of-1000 increase in allocations:

125.610 ms (2080001 allocations: 48.83 MiB)

Have you checked it on your machine?

Had a typo that is now corrected.

Yeah, works now, thanks :wink:

Ok, but unfortunately it does not get down to zero allocations when using BigFloat… :frowning: Then it is a lot slower than the original code…

julia> function dft(A::AbstractArray{T,N}, CT::Type = Float64) where {T,N}
               F = similar(A, Complex{CT})
               PI = real(eltype(F))(π)
               @inbounds for k ∈ CartesianIndices(F)
                       F[k] = sum(cis(-2 * PI * sum((k[d] - 1) * (n[d] - 1) / size(A, d) for d ∈ 1:N)::Float64) * A[n] for n ∈ CartesianIndices(A))
               end
               return F
       end
dft (generic function with 2 methods)

julia> function dft_new(A::AbstractArray{T,N}) where {T,N}
           F = zeros(complex(float(T)), size(A)...)
           @inbounds for k ∈ CartesianIndices(F), n ∈ CartesianIndices(A)
               F[k] += cis(-2 * real(eltype(F))(π) * sum(d -> (k[d] - 1) * (n[d] - 1) / real(eltype(F))(size(A, d)), ntuple(identity, Val{N}()))) * A[n]
           end
           return F
       end
dft_new (generic function with 1 method)

julia> using BenchmarkTools

julia> A = rand(Float64, 20, 20);

julia> @btime dft($A);
  4.997 ms (4801 allocations: 218.88 KiB)

julia> @btime dft_new($A);
  3.416 ms (1 allocation: 6.38 KiB)

julia> B = rand(BigFloat, 20, 20);

julia> @btime dft($B, BigFloat);
  1.706 s (6058760 allocations: 240.00 MiB)

julia> @btime dft_new($B);
  1.807 s (7976768 allocations: 342.43 MiB)

Does cispi work for BigFloat? If so, it might be faster.

Unfortunately, we don’t have a cispi function yet: https://github.com/JuliaLang/julia/pull/35792. I hope we can get this into 1.6 though.
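In the meantime, a stand-in can be built from sinpi and cospi, which do accept BigFloat (my own sketch, not the implementation from the linked PR):

# Hypothetical helper: cospi/sinpi compute cos(π*x)/sin(π*x) directly,
# which avoids multiplying by an explicit π in the target precision.
my_cispi(x) = complex(cospi(x), sinpi(x))

# e.g. replace cis(-2 * PI * s) in the loops above with my_cispi(-2 * s)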

BigFloat operations allocate:

julia> x, y = big"1.0", big"2.0"
(1.0, 2.0)

julia> @btime $x + $y;
  53.641 ns (2 allocations: 112 bytes)

which is not surprising since a BigFloat is stored on the heap.

If you care about performance, why aren’t you using an FFT algorithm?


Basically, I would like to test the accuracy of the Float64 FFT results, and I wanted the simplest possible solution. I do not have the time to implement an FFT on my own.

Probably I need DoubleFloat or Float128 instead of BigFloat…
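If you go that route, something like this might work (an untested sketch; it assumes the DoubleFloats.jl and Quadmath.jl packages, whose Double64 and Float128 types are plain bits types, so the inner loop should not allocate per element):

using DoubleFloats   # provides Double64
using Quadmath       # provides Float128

A64  = Double64.(rand(20, 20))    # convert a Float64 matrix elementwise
A128 = Float128.(rand(20, 20))

F64  = dft_new(A64)               # eltype(F64)  == Complex{Double64}
F128 = dft_new(A128)              # eltype(F128) == Complex{Float128}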

You could use this one, for example.

As far as I can see, it works only for 1-d arrays.

The simplest multi-dimensional FFT is just a sequence of 1d FFTs along each direction. So, for a 3d array, you would just need to call this with mapslices 3 times.
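For example, something along these lines (my own sketch; fft1 stands for whatever 1-d transform you end up using, e.g. the generic FFT linked above):

# Apply a 1-d transform along each dimension of a 3-d array in turn.
function fft3(A::AbstractArray{<:Number,3}, fft1)
    B = mapslices(fft1, A; dims=1)     # transform along the first dimension
    B = mapslices(fft1, B; dims=2)     # then along the second
    return mapslices(fft1, B; dims=3)  # and finally the third
end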
