Julia's FFT interface dispatches on both CPU and GPU arrays (FFTW.jl handles the CPU side, and CUDA.jl routes CuArrays to CUFFT), and there is no problem mixing CuArray-style array programming with kernel programming; both stay backend-agnostic. If you need something beyond AcceleratedKernels.jl, KernelAbstractions.jl lets you write kernels that are agnostic between CPU and GPU (see the sketch after the FFT example below). CxxWrap.jl can also help you port pieces that you don't know how to translate, though those won't be agnostic. For instance, in this code:
julia> using FFTW, CUDA, BenchmarkTools

julia> function try_FFT_on_cpu()
           values = rand(256, 256, 256)
           value_complex = ComplexF32.(values)
           cvalues = similar(value_complex, ComplexF32)
           copyto!(cvalues, value_complex)
           cy = similar(cvalues)  # uninitialized buffer; contents don't matter for timing
           cF = plan_fft!(cvalues, flags=FFTW.MEASURE)  # in-place FFTW plan
           @btime a = ($cF * $cy)
           return nothing
       end
try_FFT_on_cpu (generic function with 1 method)
julia> function try_FFT_on_cuda()
           values = rand(256, 256, 256)
           value_complex = ComplexF32.(values)
           cvalues = similar(cu(value_complex), ComplexF32)  # CuArray{ComplexF32, 3}
           copyto!(cvalues, value_complex)
           cy = similar(cvalues)
           cF = plan_fft!(cvalues)  # CUFFT plan, selected by dispatch on the CuArray
           @btime CUDA.@sync a = ($cF * $cy)  # @sync so the GPU work is actually timed
           return nothing
       end
try_FFT_on_cuda (generic function with 1 method)
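The only CUDA-specific pieces above are the cu(...) conversion and CUDA.@sync; plan_fft! itself dispatches on the array type. To make that concrete, here is a minimal sketch in which the two versions collapse into one generic function (the name try_FFT and the to_device argument are mine, purely for illustration):

using FFTW, CUDA

# `to_device` is a hypothetical argument: pass `identity` for the CPU
# or `cu` for the GPU; everything after it is backend-agnostic.
function try_FFT(to_device)
    values = ComplexF32.(rand(256, 256, 256))
    cvalues = to_device(values)
    cF = plan_fft!(cvalues)  # FFTW plan on an Array, CUFFT plan on a CuArray
    cF * cvalues             # in-place transform
    return nothing
end

try_FFT(identity)  # CPU
try_FFT(cu)        # GPU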
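And for the KernelAbstractions.jl route mentioned above, a minimal sketch of an agnostic kernel (the kernel name scale! and the arrays are illustrative, not from the code above); the identical kernel runs on the CPU or, given CuArrays, on the GPU:

using KernelAbstractions, CUDA

# Trivial elementwise kernel; @index(Global) is this work-item's global index.
@kernel function scale!(y, @Const(x), a)
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

x = rand(Float32, 1024)
y = similar(x)
backend = get_backend(x)  # CPU() here
scale!(backend)(y, x, 2f0; ndrange=length(x))
KernelAbstractions.synchronize(backend)

xd, yd = cu(x), cu(y)
backend_gpu = get_backend(xd)  # CUDABackend()
scale!(backend_gpu)(yd, xd, 2f0; ndrange=length(xd))
KernelAbstractions.synchronize(backend_gpu)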