Julia's FFT interface dispatches on both CPU and GPU arrays (FFTW.jl handles the CPU side, and CUDA.jl routes CuArrays to CUFFT), and there is no problem mixing CuArray-style array programming with kernel programming; both stay backend-agnostic. If you need something beyond AcceleratedKernels.jl, KernelAbstractions.jl lets you write kernels that are agnostic between CPU and GPU (see the sketch after the FFT example below). CxxWrap.jl can also help you port pieces that you don't know how to translate, though those won't be agnostic. For instance, in this code:
julia> using FFTW, CUDA, BenchmarkTools

julia> function try_FFT_on_cpu()
           values = rand(256, 256, 256)
           value_complex = ComplexF32.(values)
           cvalues = similar(value_complex, ComplexF32)
           copyto!(cvalues, value_complex)
           cy = similar(cvalues)  # uninitialized buffer; contents don't matter for timing
           cF = plan_fft!(cvalues, flags=FFTW.MEASURE)  # in-place FFTW plan
           @btime a = ($cF * $cy)
           return nothing
       end
try_FFT_on_cpu (generic function with 1 method)
julia> function try_FFT_on_cuda()
           values = rand(256, 256, 256)
           value_complex = ComplexF32.(values)
           cvalues = similar(cu(value_complex), ComplexF32)  # CuArray{ComplexF32, 3}
           copyto!(cvalues, value_complex)
           cy = similar(cvalues)
           cF = plan_fft!(cvalues)  # CUFFT plan, selected by dispatch on the CuArray
           @btime CUDA.@sync a = ($cF * $cy)  # @sync so the GPU work is actually timed
           return nothing
       end
try_FFT_on_cuda (generic function with 1 method)
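The only CUDA-specific pieces above are the cu(...) conversion and CUDA.@sync; plan_fft! itself dispatches on the array type. To make that concrete, here is a minimal sketch in which the two versions collapse into one generic function (the name try_FFT and the to_device argument are mine, purely for illustration):

using FFTW, CUDA

# `to_device` is a hypothetical argument: pass `identity` for the CPU
# or `cu` for the GPU; everything after it is backend-agnostic.
function try_FFT(to_device)
    values = ComplexF32.(rand(256, 256, 256))
    cvalues = to_device(values)
    cF = plan_fft!(cvalues)  # FFTW plan on an Array, CUFFT plan on a CuArray
    cF * cvalues             # in-place transform
    return nothing
end

try_FFT(identity)  # CPU
try_FFT(cu)        # GPU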
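And for the KernelAbstractions.jl route mentioned above, a minimal sketch of an agnostic kernel (the kernel name scale! and the arrays are illustrative, not from the code above); the identical kernel runs on the CPU or, given CuArrays, on the GPU:

using KernelAbstractions, CUDA

# Trivial elementwise kernel; @index(Global) is this work-item's global index.
@kernel function scale!(y, @Const(x), a)
    i = @index(Global)
    @inbounds y[i] = a * x[i]
end

x = rand(Float32, 1024)
y = similar(x)
backend = get_backend(x)  # CPU() here
scale!(backend)(y, x, 2f0; ndrange=length(x))
KernelAbstractions.synchronize(backend)

xd, yd = cu(x), cu(y)
backend_gpu = get_backend(xd)  # CUDABackend()
scale!(backend_gpu)(yd, xd, 2f0; ndrange=length(xd))
KernelAbstractions.synchronize(backend_gpu)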