# Avoid allocations in the naive discrete Fourier transform

Hello, I try to implemente a straight forwared discrete Fourier transform abut I do not see what variables are allocated in the intermediate steps. For my undertanding it is a generator expression and should get the result without allocating anything?

``````julia> function dft(A::AbstractArray{T,N}, CT::Type = Float64) where {T,N}
F = similar(A, Complex{CT})
@inbounds for k ∈ CartesianIndices(F)
F[k] = sum(exp(- im * 2π * sum((k[d] - 1) * (n[d] - 1) / size(A, d) for d ∈ 1:N)) * A[n] for n ∈ CartesianIndices(A))
end
return F
end
dft (generic function with 3 methods)

julia> A = rand(Float64, 20, 20);

julia> using BenchmarkTools

julia> @btime dft(\$A);
5.331 ms (2401 allocations: 100.13 KiB)
``````
1 Like

I didn’t quite track it down, but I suspect that the compiler has a hard time doing type-inference through those nested generator expressions?

Here is a variant that does no allocations (other than allocating `F`), and uses the element type of `A` to determine the element type of the output `F`:

``````function dft(A::AbstractArray{T,N}) where {T,N}
F = zeros(complex(float(T)), size(A)...)
@inbounds for k ∈ CartesianIndices(F), n ∈ CartesianIndices(A)
F[k] += cis(-2 * real(eltype(F))(π) * sum(d -> (k[d] - 1) * (n[d] - 1) / real(eltype(F))(size(A, d)), ntuple(identity, Val{N}()))) * A[n]
end
return F
end
``````

Note that the `2π` expression in your exponent will be computed in double precision by default, even if you want some other precision for `F`. To force it into the desired precision, use `real(eltype(F))(π)`; you have to do the same thing for the division by `size(A,n)` since otherwise division of two integers will default to double precision. You can also use `cis(x)` instead of `exp(im*x)`. I also used `ntuple(identity, Val{N}())` instead of `1:N`, which allows the compiler to unroll the inner loop by supplying the length at compile time.

5 Likes

Thanks for the reply, however, copy-paste your function shows a factor of 100 slowdown in speed and factor 1000 in allocations:

``````125.610 ms (2080001 allocations: 48.83 MiB)
``````

Have you checked it on your machine?

Had a typo that is now corrected.

Yeah, works now, thanks Ok, but unfortunately it does not come up with no allocations when using BigFloat… Than it is a lot slower then the original code…

``````julia> function dft(A::AbstractArray{T,N}, CT::Type = Float64) where {T,N}
F = similar(A, Complex{CT})
PI = real(eltype(F))(π)
@inbounds for k ∈ CartesianIndices(F)
F[k] = sum(cis(-2 * PI * sum((k[d] - 1) * (n[d] - 1) / size(A, d) for d ∈ 1:N)::Float64) * A[n] for n ∈ CartesianIndices(A))
end
return F
end
dft (generic function with 2 methods)

julia> function dft_new(A::AbstractArray{T,N}) where {T,N}
F = zeros(complex(float(T)), size(A)...)
@inbounds for k ∈ CartesianIndices(F), n ∈ CartesianIndices(A)
F[k] += cis(-2 * real(eltype(F))(π) * sum(d -> (k[d] - 1) * (n[d] - 1) / real(eltype(F))(size(A, d)), ntuple(identity, Val{N}()))) * A[n]
end
return F
end
dft_new (generic function with 1 method)

julia> using BenchmarkTools

julia> A = rand(Float64, 20, 20);

julia> @btime dft(\$A);
4.997 ms (4801 allocations: 218.88 KiB)

julia> @btime dft_new(\$A);
3.416 ms (1 allocation: 6.38 KiB)

julia> B = rand(BigFloat, 20, 20);

julia> @btime dft(\$B, BigFloat);
1.706 s (6058760 allocations: 240.00 MiB)

julia> @btime dft_new(\$B);
1.807 s (7976768 allocations: 342.43 MiB)
``````

Does `cispi` work for bigfloat? If so, it might be faster.

Unfortunately, we don’t have a `cispi` function yet: https://github.com/JuliaLang/julia/pull/35792. I hope we can get this into 1.6 though.

`BigFloat` operations allocate:

``````julia> x, y = big"1.0", big"2.0"
(1.0, 2.0)

julia> @btime \$x + \$y;
53.641 ns (2 allocations: 112 bytes)
``````

which is not surprising since a `BigFloat` is stored on the heap.

If you care about performance, why aren’t you using an FFT algorithm?

2 Likes

Basically, I would like to perform a test on accuracy of the Float64 fft results and I wanted to have the most simple solution. I do not have the time to implement a fft on my own.

Probably I need DoubleFloat or Float128 instead of BigFloat…

You could use this one, for example.

As far as I can see it, it works only for 1d arrays.

The simplest multi-dimensional FFT is just a sequence of 1d FFTs along each direction. So, for a 3d array, you would just need to call this with `mapslices` 3 times.

5 Likes