I am writing code in which I want to use a custom structure inside a CUDA kernel. Following the CUDA.jl manual (https://cuda.juliagpu.org/stable/tutorials/custom_structs/), I need to write an Adapt.jl adapter for my structure. However, one of the fields of this structure is an FFTW.jl Fourier transform plan, so I first have to write an adapter for this FFTW plan. An MWE is the following:
using Adapt
using CUDA
using FFTW
abstract type ARCH{T} end
struct CPU{T} <: ARCH{T} end
struct GPU{T} <: ARCH{T} end
CPU() = CPU{Float64}()
GPU() = GPU{Float32}()
function Adapt.adapt_storage(::CPU{T}, p::FFTW.cFFTWPlan) where T
    tmp = zeros(Complex{T}, p.sz)
    return plan_fft!(tmp)
end
function Adapt.adapt_storage(::GPU{T}, p::FFTW.cFFTWPlan) where T
    tmp = CUDA.zeros(Complex{T}, p.sz)
    return plan_fft!(tmp)
end
E = zeros(ComplexF64, 128)
p = plan_fft!(E) # FFTW in-place forward plan for 128-element array of ComplexF64
pa = adapt(GPU(), p) # CUFFT in-place complex forward plan for 128-element CuArray of ComplexF32
This code works perfectly, but each call of adapt_storage allocates the tmp array, which in my case can be very large. Therefore, I am looking for a way to convert the FFTW plan using the low-level definition of the plan structure.
The plan structure in FFTW.jl is defined here. For my adapter I would like to have code similar to this:
p = plan_fft!(E)
(; plan, sz, osz, istride, ostride, ialign, oalign, flags, region) = p
pa = FFTW.cFFTWPlan{ComplexF64, -1, true, 1, UnitRange{Int64}}(plan, sz, osz, istride, ostride, ialign, oalign, flags, region)
# It does not work!
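For reference, the concrete type of the plan, its field names, and the constructors it defines can be listed with plain reflection. This is only a minimal sketch using Base functions (typeof, fieldnames, methods) on the same 128-element plan as above. I am assuming that whatever methods(...) prints is the complete set of constructors, which would show whether a field-by-field constructor like the one I tried exists at all:

```julia
using FFTW

E = zeros(ComplexF64, 128)
p = plan_fft!(E)

# Concrete plan type, including all of its type parameters
println(typeof(p))

# Names of the fields stored inside the plan struct
println(fieldnames(typeof(p)))

# Constructors defined for this concrete plan type
foreach(println, methods(typeof(p)))
```

If the printed constructors take the input and output arrays rather than the individual fields, that would explain why passing the fields directly fails.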
Any ideas how I can do it?