Avoid allocations in the naive discrete Fourier transform

stakaz · September 18, 2020, 2:02pm

Hello, I try to implemente a straight forwared discrete Fourier transform abut I do not see what variables are allocated in the intermediate steps. For my undertanding it is a generator expression and should get the result without allocating anything?

julia> function dft(A::AbstractArray{T,N}, CT::Type = Float64) where {T,N}
           F = similar(A, Complex{CT})
           @inbounds for k ∈ CartesianIndices(F)
               F[k] = sum(exp(- im * 2π * sum((k[d] - 1) * (n[d] - 1) / size(A, d) for d ∈ 1:N)) * A[n] for n ∈ CartesianIndices(A))
           end
           return F
       end
dft (generic function with 3 methods)

julia> A = rand(Float64, 20, 20);

julia> using BenchmarkTools

julia> @btime dft($A);
  5.331 ms (2401 allocations: 100.13 KiB)

stevengj · September 18, 2020, 2:57pm

I didn’t quite track it down, but I suspect that the compiler has a hard time doing type-inference through those nested generator expressions?

Here is a variant that does no allocations (other than allocating F), and uses the element type of A to determine the element type of the output F:

function dft(A::AbstractArray{T,N}) where {T,N}
    F = zeros(complex(float(T)), size(A)...)
    @inbounds for k ∈ CartesianIndices(F), n ∈ CartesianIndices(A)
        F[k] += cis(-2 * real(eltype(F))(π) * sum(d -> (k[d] - 1) * (n[d] - 1) / real(eltype(F))(size(A, d)), ntuple(identity, Val{N}()))) * A[n]
    end
    return F
end

Note that the 2π expression in your exponent will be computed in double precision by default, even if you want some other precision for F. To force it into the desired precision, use real(eltype(F))(π); you have to do the same thing for the division by size(A,n) since otherwise division of two integers will default to double precision. You can also use cis(x) instead of exp(im*x). I also used ntuple(identity, Val{N}()) instead of 1:N, which allows the compiler to unroll the inner loop by supplying the length at compile time.

stakaz · September 18, 2020, 3:07pm

Thanks for the reply, however, copy-paste your function shows a factor of 100 slowdown in speed and factor 1000 in allocations:

125.610 ms (2080001 allocations: 48.83 MiB)

Have you checked it on your machine?

stevengj · September 18, 2020, 3:07pm

Had a typo that is now corrected.

stakaz · September 18, 2020, 3:09pm

Yeah, works now, thanks

stakaz · September 18, 2020, 3:16pm

Ok, but unfortunately it does not come up with no allocations when using BigFloat… Than it is a lot slower then the original code…

julia> function dft(A::AbstractArray{T,N}, CT::Type = Float64) where {T,N}
               F = similar(A, Complex{CT})
               PI = real(eltype(F))(π)
               @inbounds for k ∈ CartesianIndices(F)
                       F[k] = sum(cis(-2 * PI * sum((k[d] - 1) * (n[d] - 1) / size(A, d) for d ∈ 1:N)::Float64) * A[n] for n ∈ CartesianIndices(A))
               end
               return F
       end
dft (generic function with 2 methods)

julia> function dft_new(A::AbstractArray{T,N}) where {T,N}
           F = zeros(complex(float(T)), size(A)...)
           @inbounds for k ∈ CartesianIndices(F), n ∈ CartesianIndices(A)
               F[k] += cis(-2 * real(eltype(F))(π) * sum(d -> (k[d] - 1) * (n[d] - 1) / real(eltype(F))(size(A, d)), ntuple(identity, Val{N}()))) * A[n]
           end
           return F
       end
dft_new (generic function with 1 method)

julia> using BenchmarkTools

julia> A = rand(Float64, 20, 20);

julia> @btime dft($A);
  4.997 ms (4801 allocations: 218.88 KiB)

julia> @btime dft_new($A);
  3.416 ms (1 allocation: 6.38 KiB)

julia> B = rand(BigFloat, 20, 20);

julia> @btime dft($B, BigFloat);
  1.706 s (6058760 allocations: 240.00 MiB)

julia> @btime dft_new($B);
  1.807 s (7976768 allocations: 342.43 MiB)

Oscar_Smith · September 18, 2020, 3:40pm

Does cispi work for bigfloat? If so, it might be faster.

simeonschaub · September 18, 2020, 3:49pm

Unfortunately, we don’t have a cispi function yet: https://github.com/JuliaLang/julia/pull/35792. I hope we can get this into 1.6 though.

stevengj · September 18, 2020, 4:18pm

BigFloat operations allocate:

julia> x, y = big"1.0", big"2.0"
(1.0, 2.0)

julia> @btime $x + $y;
  53.641 ns (2 allocations: 112 bytes)

which is not surprising since a BigFloat is stored on the heap.

If you care about performance, why aren’t you using an FFT algorithm?

stakaz · September 18, 2020, 6:29pm

Basically, I would like to perform a test on accuracy of the Float64 fft results and I wanted to have the most simple solution. I do not have the time to implement a fft on my own.

Probably I need DoubleFloat or Float128 instead of BigFloat…

stevengj · September 18, 2020, 9:58pm

You could use this one, for example.

stakaz · September 18, 2020, 10:20pm

As far as I can see it, it works only for 1d arrays.

stevengj · September 19, 2020, 12:23am

The simplest multi-dimensional FFT is just a sequence of 1d FFTs along each direction. So, for a 3d array, you would just need to call this with mapslices 3 times.

Topic		Replies	Views
Avoiding allocations in a map over a tuple General Usage tuple , memory-allocation	3	376	November 2, 2023
Memory on array element assignment Performance	10	431	August 3, 2022
Where does this allocation come from and how to avoid it? StaticArrays and structures Performance question , function , struct , staticarrays , allocations	3	379	July 19, 2022
Help understanding allocation on scalar function and possible ways to avoid it Performance	3	899	December 7, 2017
Allocations (again...) Performance question , allocations	15	662	November 25, 2022

Avoid allocations in the naive discrete Fourier transform

Related topics