Simple FFT project performance

Hello!

I am writing a simple FFT library just for fun, and not really for the highest performance. I am trying to write the code very transparently. If you want to look…

(I can tell you that reverse engineering AbstractFFTs.jl wasn’t so fun, actually. However, I think I have implemented it correctly now.)

I am finding certain critical things are slow:

  1. Allocation of the output buffer, which may be a different size than the input buffer:

# MinimalFFT.jl L228
y = Array{P.D,N}(undef, P.os)

  2. The dispatch routine in execute_plan in planner.jl. I can guarantee that the function references are valid, but apparently there is no way to tell this to the compiler; is there something similar to @inbounds for this?

  3. I would like a way to tell the compiler that certain vectors do not alias in memory, for performance. This would be useful in stockham.jl.

  4. The colon operator is slow, so I implemented my own indexer.jl to do it.
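For the allocation point, one common pattern is a mul!-style in-place execute that writes into a preallocated output buffer, so the allocation happens once instead of on every call. This is a hypothetical sketch (execute! and its body are placeholders, not the library's actual API):

```julia
# Hypothetical sketch: an in-place execute! that reuses a caller-supplied
# output buffer; a real transform would replace the copyto! below.
function execute!(y::AbstractArray, x::AbstractArray)
    copyto!(y, x)   # stand-in for the actual FFT computation
    return y
end

x = rand(ComplexF64, 8)
y = similar(x)      # allocated once, reused across many calls
execute!(y, x)
```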

So I am usually within a factor of 2-10 of FFTW performance, according to my profiling. I think this shows that FFTW is pretty good, actually.

Thanks for any advice.

Edit: I had a bad bug in my earlier commit this evening. Corrected now! Sorry.

Alex G


I think that by creating a dictionary of functions you’re really giving Julia a hard time specializing the function being called. Can you replace it with a function that does a bunch of if/else branches to choose which one to call, and see whether that helps?

function fake_dict(i, args...)
    if i == 1
        return f1(args...)
    elseif i == 2
        return f2(args...)
    # ...and so on for each function
    end
end

Also, for now it’s really hard to help you without a proper function to call and profile. With no tests set up, we need just the computations, and from those we can profile and help. I tried to clone the repo and look around, but it’s very unpleasant for now. Also, you have nothing in your test environment: you need to ] activate ./test and add everything there, so that one can run ] activate . and then ] test.

I can’t test the code for now, so I will just point out some simple things that you can optimize.
First, there is your struct MinimalPlan.
You did this:

mutable struct MinimalPlan{T} <: Plan{T}
    D::DataType # destination type, for real fft     # required by AbstractFFTs
    n::Tuple{Vararg{Int64}} # Size of the FFT input     # required by AbstractFFTs
    region::Union{Int,UnitRange{Int64}}     # required by AbstractFFTs
    flags::Int32 # bit vector of fft type
    os::Tuple{Vararg{Int64}} # output size
    ipd::Dict{Int64,Vector{inner_plan}} # region -> inner_plan

    pinv::ScaledPlan # required by AbstractFFTs
end

First of all, you can make D and other static information (like the output size) into type parameters.
This way you avoid abstract fields, and Julia can optimize access to them.
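A minimal sketch of that idea (TypedPlan, its fields, and alloc_output are illustrative names modeled on MinimalPlan, not the actual code):

```julia
# Hypothetical sketch: the destination type D and dimensionality N become type
# parameters instead of fields, so field access and output allocation are
# fully inferable.
struct TypedPlan{T,D,N}
    n::NTuple{N,Int}    # input size: concrete NTuple instead of Tuple{Vararg{Int64}}
    os::NTuple{N,Int}   # output size
end

# The compiler knows D and N statically here, unlike Array{P.D,N}(undef, P.os):
alloc_output(p::TypedPlan{T,D,N}) where {T,D,N} = Array{D,N}(undef, p.os)

p = TypedPlan{Float64,ComplexF64,1}((8,), (5,))
y = alloc_output(p)   # a Vector{ComplexF64} of length 5, inferred at compile time
```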

Next, you said this:

“If the type of the elements were known at compile time, I think this may be faster.”

You can also make an in-place version to get rid of the allocation cost entirely.

For this you can make functors

(p::inner_plan)(args...) = nothing

They are basically callable objects.
That said, if the functions aren’t too similar, this would take some effort to abstract everything correctly.
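A self-contained sketch of the functor idea (InnerStep and scale_kernel are illustrative names, not the actual inner_plan):

```julia
# The kernel function is baked into the type parameter F, so calling the plan
# specializes on the concrete function instead of dispatching dynamically.
struct InnerStep{F<:Function}
    fun::F
    n::Int
end

# Functor syntax: the struct itself is callable.
(p::InnerStep)(x) = p.fun(x, p.n)

scale_kernel(x, n) = x .* n       # placeholder kernel
step = InnerStep(scale_kernel, 2)
step([1.0, 2.0, 3.0])             # → [2.0, 4.0, 6.0]
```

Because F is part of the type, a call like step(x) compiles down to a direct call to the concrete kernel.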

In your code, more precisely in the function execute_plan in planner.jl, you do this:

        ipv = P.ipd[r]
        lf = length(ipv)
        ipv1 = ipv[1]
        fun1 = ipv1.fun
        ns1 = ipv1.ns
        exp1 = ipv1.exp

Even though you made inner_plan{F<:Function}, Julia can’t deduce the parametric type in that code (so it can’t know what the function is).
Why can’t Julia do that?
It’s simple: it doesn’t know the type of ipv.
Since inner_plan is a parametric type, writing inner_plan without specifying the parameters makes it an abstract type. So when you declare ipd::Dict{Int64,Vector{inner_plan}}, the Vector{inner_plan} can’t be optimized well, because its element type is not concrete.
So when you access an element of such a vector, Julia can’t infer its type, which causes slowdowns and allocations.
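Here is a tiny illustration of that inference problem and the classic function-barrier workaround (MyPlan, run_plan, and run_first are hypothetical, not the library’s code):

```julia
struct MyPlan{F<:Function}   # same shape as inner_plan{F<:Function}
    fun::F
end

# The element type MyPlan (no parameters) is abstract, so v[1] is not
# inferable; each access costs a dynamic lookup.
plans = MyPlan[MyPlan(sin), MyPlan(cos)]

# Function barrier: once an element crosses into run_plan, Julia compiles a
# specialized method for the concrete MyPlan{typeof(sin)}, etc.
run_plan(p::MyPlan, x) = p.fun(x)
run_first(v, x) = run_plan(v[1], x)   # one dynamic dispatch here, then fast
```

Alternatively, if every inner plan in a given vector uses the same kernel, a Vector{MyPlan{typeof(kernel)}} has a concrete element type and avoids the dynamic dispatch entirely.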

If I find some other way to help, I will write it here.