GPU/CPU Agnostic FFT code


Julia's FFT interface dispatches on the array type: FFTW.jl handles regular CPU arrays, and loading CUDA.jl adds methods for `CuArray` (backed by CUFFT), so the same code runs on both. There is also no problem mixing `CuArray` array programming and kernel programming; both can stay device-agnostic. If you need something more than AcceleratedKernels.jl, KernelAbstractions.jl lets you write agnostic kernels for CPU and GPU. Also, CxxWrap.jl can help you port things that you don't know how to translate, but that won't be agnostic. For instance, in this code

julia> using FFTW, CUDA, BenchmarkTools
julia> function try_FFT_on_cpu()
           values = rand(256, 256, 256)
           value_complex = ComplexF32.(values)
           cvalues = similar(value_complex)
           copyto!(cvalues, values)
           cy = similar(cvalues)
           cF = plan_fft!(cvalues, flags=FFTW.MEASURE)
           @btime a = ($cF*$cy)
           return nothing
       end
try_FFT_on_cpu (generic function with 1 method)

julia> function try_FFT_on_cuda()
           values = rand(256, 256, 256)
           value_complex = ComplexF32.(values)
           cvalues = similar(cu(value_complex), ComplexF32)
           copyto!(cvalues, values)
           cy = similar(cvalues)
           cF = plan_fft!(cvalues)
           @btime CUDA.@sync a = ($cF*$cy)
           return nothing
       end
try_FFT_on_cuda (generic function with 1 method)

from Unreasonably fast FFT on CUDA - #8 by roflmaostc
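To show what "agnostic" means in practice, here is a minimal sketch of a single function that runs on either device because `plan_fft` dispatches on the array type (the helper name `fft_mag` is made up for illustration; assumes FFTW.jl is loaded, plus CUDA.jl for the GPU path):

```julia
using FFTW  # CPU backend; `using CUDA` adds the CuArray methods via CUFFT

# Device-agnostic: `plan_fft` dispatches on the array type,
# so the same function body works for Array and CuArray inputs.
function fft_mag(x::AbstractArray{<:Complex})
    p = plan_fft(x)        # FFTW plan on CPU, CUFFT plan on GPU
    return abs.(p * x)     # broadcast also stays agnostic
end

x = ComplexF32.(rand(Float32, 64, 64))
fft_mag(x)          # CPU via FFTW
# fft_mag(cu(x))    # GPU via CUFFT, same code
```

Nothing in `fft_mag` names a device; moving `x` to the GPU with `cu` is the only change needed.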

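And for the KernelAbstractions.jl route, a minimal sketch of an agnostic kernel (the `scale!` kernel is a made-up example; assumes KernelAbstractions.jl is installed):

```julia
using KernelAbstractions

# The same kernel compiles for CPU and GPU backends.
@kernel function scale!(y, x, a)
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

x = rand(Float32, 1024)
y = similar(x)
backend = KernelAbstractions.get_backend(x)  # CPU here; CUDABackend() for CuArrays
scale!(backend)(y, x, 2.0f0; ndrange = length(x))
KernelAbstractions.synchronize(backend)
```

The backend is inferred from the input array, so swapping `x` for a `CuArray` launches the identical kernel on the GPU.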