CUDA.jl errors when a 4D FFT is requested

Hi,

It seems that CUDA.jl and FFTW.jl do not work for arrays with more than 3 dimensions:

julia> b = randn(8, 8, 8);
julia> cb = CuArray(b);
julia> fft(b);
julia> fft(cb);
julia> sum(fft(b) .- Array(fft(cb)))
3.319566843629218e-14 - 6.661338147750939e-15im
julia> b = randn(8, 8, 8, 8);
julia> cb = CuArray(b);
julia> fft(b);
julia> fft(cb);
ERROR: CUFFTError: user specified an invalid pointer or parameter (code 4, CUFFT_INVALID_VALUE)
Stacktrace:
  [1] throw_api_error(res::CUDA.CUFFT.cufftResult_t)
    @ CUDA.CUFFT ~/.julia/packages/CUDA/q3GG0/lib/cufft/libcufft.jl:11
  [2] macro expansion
    @ ~/.julia/packages/CUDA/q3GG0/lib/cufft/libcufft.jl:24 [inlined]
  [3] cufftMakePlanMany(plan::Int32, rank::Int64, n::Vector{Int32}, inembed::Ptr{Nothing}, istride::Int64, idist::Int64, onembed::Ptr{Nothing}, ostride::Int64, odist::Int64, type::CUDA.CUFFT.cufftType_t, batch::Int64, workSize::Base.RefValue{UInt64})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/q3GG0/lib/utils/call.jl:26
  [4] cufftMakePlan(xtype::CUDA.CUFFT.cufftType_t, xdims::NTuple{4, Int64}, region::UnitRange{Int64})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/q3GG0/lib/cufft/wrappers.jl:37
  [5] #133
    @ ~/.julia/packages/CUDA/q3GG0/lib/cufft/wrappers.jl:145 [inlined]
  [6] (::CUDA.APIUtils.var"#8#11"{CUDA.CUFFT.var"#133#134"{Tuple{CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}}}, CUDA.APIUtils.HandleCache{Tuple{CuContext, CUDA.CUFFT.cufftType_t, Tuple{Vararg{Int64, N}} where N, Any}, Int32}, Tuple{CuContext, CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}}})()
    @ CUDA.APIUtils ~/.julia/packages/CUDA/q3GG0/lib/utils/cache.jl:28
  [7] lock(f::CUDA.APIUtils.var"#8#11"{CUDA.CUFFT.var"#133#134"{Tuple{CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}}}, CUDA.APIUtils.HandleCache{Tuple{CuContext, CUDA.CUFFT.cufftType_t, Tuple{Vararg{Int64, N}} where N, Any}, Int32}, Tuple{CuContext, CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}}}, l::ReentrantLock)
    @ Base ./lock.jl:185
  [8] (::CUDA.APIUtils.var"#check_cache#9"{CUDA.APIUtils.HandleCache{Tuple{CuContext, CUDA.CUFFT.cufftType_t, Tuple{Vararg{Int64, N}} where N, Any}, Int32}, Tuple{CuContext, CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}}})(f::CUDA.CUFFT.var"#133#134"{Tuple{CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}}})
    @ CUDA.APIUtils ~/.julia/packages/CUDA/q3GG0/lib/utils/cache.jl:26
  [9] pop!(f::Function, cache::CUDA.APIUtils.HandleCache{Tuple{CuContext, CUDA.CUFFT.cufftType_t, Tuple{Vararg{Int64, N}} where N, Any}, Int32}, key::Tuple{CuContext, CUDA.CUFFT.cufftType_t, NTuple{4, Int64}, UnitRange{Int64}})
    @ CUDA.APIUtils ~/.julia/packages/CUDA/q3GG0/lib/utils/cache.jl:47
 [10] cufftGetPlan(::CUDA.CUFFT.cufftType_t, ::Vararg{Any})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/q3GG0/lib/cufft/wrappers.jl:143
 [11] plan_fft(X::CuArray{ComplexF64, 4, CUDA.Mem.DeviceBuffer}, region::UnitRange{Int64})
    @ CUDA.CUFFT ~/.julia/packages/CUDA/q3GG0/lib/cufft/fft.jl:163
 [12] fft
    @ ~/.julia/packages/AbstractFFTs/0uOAT/src/definitions.jl:63 [inlined]
 [13] fft (repeats 2 times)
    @ ~/.julia/packages/CUDA/q3GG0/lib/cufft/fft.jl:124 [inlined]
 [14] top-level scope
    @ REPL[32]:1
 [15] top-level scope
    @ ~/.julia/packages/CUDA/q3GG0/src/initialization.jl:162

julia> 

Does anyone know if there is a way to do 4D FFTs on the GPU?

Thanks

Note that this has nothing to do with FFTW — CUDA.jl has its own FFT implementation.

Ok, I see that CUDA.jl just calls NVIDIA’s cuFFT, which only performs FFTs in 1, 2 and 3 dimensions. I guess the easiest workaround is to first FFT dimensions 1:2 and then dimensions 3:4. Maybe something similar should be done by default in CUDA.jl?
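For reference, a minimal sketch of that workaround. It relies on the separability of the multidimensional DFT (transforming dims 1:2 and then 3:4 equals one 4D transform), and it assumes `fft(::CuArray, dims)` accepts both of these dimension ranges, which I have not verified on every CUDA.jl version:

```julia
using CUDA, FFTW

b  = randn(ComplexF64, 8, 8, 8, 8)
cb = CuArray(b)

# cuFFT only supports 1D/2D/3D transforms, so split the 4D FFT into two
# batched transforms over complementary groups of dimensions:
cb4 = fft(fft(cb, 1:2), 3:4)

# compare against the CPU result from FFTW; the difference should be
# at the level of floating-point roundoff
@show maximum(abs.(Array(cb4) .- fft(b)))
```

The same splitting trick works for other dimension groupings (e.g. `1:3` then `4:4`), as long as each group is within cuFFT's 3-dimensional limit.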

Thanks,

A