Chebyshev Transform using FFTW and FastTransforms.jl Speed

I’m looking into spectral methods for some PDEs.

For the discrete Chebyshev Transform, it is basically a discrete cosine transform:

I’ve looked at FastTransforms.jl as they have a Chebyshev Transform which can be used with FastQuadrature.jl for many spectral methods. However, the FastQuadrature.jl doesn’t have Gauss-Lobatto points (which includes the end points of the domain). The Guass-lobatto points are also known as Chebyshev points of the Second Kind according to FastTransforms.jl. The regular Gauss points are Chebyshev of the First Kind.

The FastTransforms.jl is extremely slow in comparison to doing one’s own FFTW Chebyshev Transform. However, for the second kind Gauss-Lobatto points, the transforms seem to differ by a little bit and I’m not sure why?

Here’s what I got:

using FFTW
using FastTransforms.jl

#1st kind Gauss Points
#Gauss Points make Chebyshev Transform into a DCT Type-II
function FastChebyshevTransform1(g::Array{Float64}, plan)
    n = length(g)
    ftran = plan*g #FFTW with plan DCT Type-II 
    @. ftran = sqrt(2.0/n)*ftran; #normalization constant
    ftran[1] = ftran[1]/sqrt(2.0) #Normalization from definition for 1st element
    return ftran

end

#2nd kind gauss lobatto points
#Gauss-Lobatto points change Chebyshev Transform to DCT Type-I
function FastChebyshevTransform2(g::Array{Float64}, plan)
    ftran  = plan*g  #FFTW with plan DCT Type-I
    @. ftran = 1.0/n* ftran; #normalization constant
    ftran[1] = ftran[1]/2.0 #Normalization from definition for 1st element
    ftran[end] = ftran[end]/2.0 #Normalization from definition for last element
    return ftran

end

To test use function exp().

Gauss Points 1st Kind

p = FastTransforms.chebyshevpoints(Float64,n, Val(1)) #1st Guass Points
g =  exp.(p)
pcosp = FFTW.plan_dct(g, flags=FFTW.PATIENT); # DCT Type-II
pr0 = FFTW.plan_r2r(f, FFTW.REDFT00,flags=FFTW.MEASURE); #DCT Type-I

For the FastTransform.jl

@btime FastTransforms.chebyshevtransform(g, Val(1));

Output:
9.376 μs (23 allocations: 1.09 KiB)

And FFTW:
@btime FastChebyshevTransform1(g, pcosp);
Output:
605.680 ns (7 allocations: 464 bytes)

For the Gauss-Lobatto 2nd Kind:

p = FastTransforms.chebyshevpoints(Float64,n, Val(2)) #1st Guass Points
g =  exp.(p)

FastTransforms:
@btime chebyshevtransform(g, Val(2));
Output:
125.223 μs (168 allocations: 11.05 KiB)

FFTW:
@btime FastChebyshevTransform2(g, pr0);
Output:
16.896 μs (133 allocations: 9.73 KiB)


using FFTW seems to be much faster.

The 1st kind Gauss Points for both functions are identical. However, the 2nd kind Gauss-Lobatto Points Transforms are not exactly the same as shown below:

FastTransforms:

FastTransforms.chebyshevtransform(g, Val(2))
20-element Vector{Float64}:
  1.2660658777520084
  1.1303182079849698
  0.2714953395340765
  0.044336849848663804
  0.00547424044209372
  0.0005429263119139379
  4.497732295426474e-5
  3.1984364623956356e-6
  1.992124806361816e-7
  1.1036771676242671e-8
  5.505895624045487e-10
  2.4979566338996256e-11
  1.039185688854928e-12
  3.994426203361931e-14
  1.429092589880367e-15
  3.7981314000334304e-17
  1.0225738384705389e-16
  1.4023869784738819e-16
  9.349246523159212e-17
 -9.349246523159212e-17

FFTW function:

FastChebyshevTransform2(g, pr0)
20-element Vector{Float64}:
  1.2027625838644083
  1.0738022975857213
  0.2579205725573727
  0.042120007356230615
  0.005200528419989034
  0.000515779996318241
  4.2728456806551503e-5
  3.038514639275854e-6
  1.8925185660437253e-7
  1.0484933092430537e-8
  5.230600842843212e-10
  2.3730588022046446e-11
  9.872264044121816e-13
  3.794704893193834e-14
  1.3576379603863487e-15
  3.608224830031759e-17
  9.71445146547012e-17
  1.332267629550188e-16
  8.881784197001253e-17
 -8.881784197001253e-17

So not sure where the issue is between the two.

EDIT: Solution for the discrepancies in the Chebyshev Transforms with Gauss-Lobatto 2nd Kind is normalization factor which can be seen in the corrected function below:

#2nd kind gauss lobatto points
#Gauss-Lobatto points change Chebyshev Transform to DCT Type-I
function FastChebyshevTransform2(g::Array{Float64}, plan)
    ftran  = plan*g  #FFTW with plan DCT Type-I
    @. ftran = 1.0/(n-1.0)* ftran; #normalization constant
    ftran[1] = ftran[1]/2.0 #Normalization from definition for 1st element
    ftran[end] = ftran[end]/2.0 #Normalization from definition for last element
    return ftran

end

Usually it’s a matter of looking carefully at FFTW’s definition of the transform and comparing the normalization to what you want (it may differ in the first and last points).

See, for example, how FastChebInterp.jl computes the first-kind Chebyshev coefficients.

@stevengj Ah you’re right!
The normalization for DCT Type-I is:
1/(n-1)

I also didn’t know of FastChebInterp.jl. I see that it’s also using FFTW so I’m guessing it would be just as fast as the functions I wrote?

Tried FastChebInterp.jl

@btime FastChebInterp.chebcoefs(g);
Output:
120.330 μs (131 allocations: 10.02 KiB)

Which is similar to FastTransforms.jl

If you need to compute interpolants over and over, you can do better by precomputing the plan, preallocating, etc. (Also, FastChebInterp is only type I.)

It’s probably just the overhead of not precomputing the plan, unlike your code. I guess FastTransforms could use an API that precomputes the transform data, along with a mul! API to apply it in-place to precomputed output.

FastChebInterp is mainly designed for interpolation, in which case you usually compute the interpolant once and then re-use it many times, so a bit of (constant-factor) overhead in creating the interpolant is not a big concern.

FastTransforms recomputes the plan for each transform if you call chebyshevtransform(g, Val(2)) directly. To compare the runtimes, you should precompute a Chebyshev plan and only measure the time taken by the transform, which should be comparable. Computing the plan can be expensive.

@jishnub I don’t see any documentation in FastTransforms.jl on pre-computing plans for transforms like the Chebyshev. How would one go about doing that?