CUDA FFT wrapper problem: 1D FFT along dim 2 of a 3D CuArray fails

A 1D FFT along the 2nd dimension of a 3-dimensional CuArray is not supported by the wrapper. Creating the plan fails with: ERROR: ArgumentError: batching dims must be sequential

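To reproduce:
using CuArrays, FFTW  # assumed setup: FFTW re-exports plan_fft! from AbstractFFTs
dim = 2
data = CuArrays.rand(ComplexF32, 512, 512, 512);
myFft = plan_fft!(data, dim);  # throws the ArgumentError quoted above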

The wrapper for CPU arrays allows this, and with dim = 1 or dim = 3 it also works as expected for CuArrays.
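
Since dim = 1 works, a possible workaround until the wrapper handles this case is to permute the array so the 2nd dimension comes first, transform along dim 1, and permute back. The following is only a sketch: fft_along_dim2 is a hypothetical helper name, not part of CuArrays or CUDA.jl, and I have not run it against the package versions in this report. It is out-of-place, so unlike plan_fft! it allocates extra copies (about 1 GiB each for a 512x512x512 ComplexF32 array).

```julia
using FFTW  # provides fft via AbstractFFTs; the GPU package adds the CuArray methods

# Hypothetical helper: 1D FFT along the 2nd dimension of a 3D complex array.
# Written against AbstractArray so the same code can be checked on CPU arrays.
function fft_along_dim2(x::AbstractArray{<:Complex,3})
    xp = permutedims(x, (2, 1, 3))      # move dim 2 to the front (allocates a copy)
    yp = fft(xp, 1)                     # transform along dim 1, which is supported
    return permutedims(yp, (2, 1, 3))   # restore the original dimension order
end
```

On a small CPU array the result can be compared with fft(x, 2) using ≈ before trying it on a CuArray.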

See this discussion: https://github.com/JuliaGPU/CUDA.jl/issues/119