CUDNN operators not supporting Views/SubArrays

Is there a reason for Convolution (and other CUDNN related functions) to be restricted to DenseCuArray as found in the NNlibCUDA.jl repo?

For example, taking a view here results in an error, since the Conv function doesn't support a view of a CuArray and CUDA falls back to (disallowed) scalar indexing:

```julia
using CUDA
using Flux

m = Conv((1,), 3 => 4) |> gpu

x1 = CUDA.rand(1, 3, 8);
m(x1); # works fine

x2 = CUDA.rand(1, 4, 8);
x3 = view(x2, :, 1:3, :);
m(x3); # errors
```

```
ERROR: TaskFailedException

    nested task error: Scalar indexing is disallowed.
```

Performing a Dense operator on a view works fine, however, since Dense accepts AbstractArray input and doesn't rely on CUDNN.

The actual use case is that I'm using a custom dataloader in which the full dataset is stored as a CuArray, and I'd like it to provide the data batches as views to avoid allocations. Do CUDNN operators inherently forbid such an optimization?
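For context, this is the kind of dataloader I have in mind, sketched here on the CPU (the `ViewBatches` name and its fields are invented for illustration; on the GPU `data` would be a CuArray and each batch a view into it):

```julia
# Minimal sketch of a view-based batch iterator.
# Assumes observations are laid out along the last dimension of `data`.
struct ViewBatches{A}
    data::A
    batchsize::Int
end

Base.length(b::ViewBatches) = cld(size(b.data, ndims(b.data)), b.batchsize)

function Base.iterate(b::ViewBatches, start=1)
    nobs = size(b.data, ndims(b.data))
    start > nobs && return nothing
    stop = min(start + b.batchsize - 1, nobs)
    # A view shares memory with `data`: no per-batch allocation of a copy.
    batch = view(b.data, ntuple(_ -> Colon(), ndims(b.data) - 1)..., start:stop)
    return batch, stop + 1
end

x = rand(Float32, 1, 3, 8)
batches = collect(ViewBatches(x, 4))
```

Each `batch` is a SubArray backed by the original storage, which is exactly what the CUDNN code paths currently reject.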

Does it work relaxing to AnyCuArray?

Yes: switching the cudnn/conv.jl functions from DenseCuArray to AnyCuArray in CUDNN.jl produces the same output.
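For reference, the relaxation being described is roughly the following change of method signatures (a sketch, not the actual CUDA.jl source):

```julia
# Before (restricted): only plain, densely stored CuArrays are accepted
# conv!(y::DenseCuArray{T}, x::DenseCuArray{T}, w::DenseCuArray{T}, cdims) where {T}

# After (relaxed): also accepts views/reshapes wrapping a CuArray
# conv!(y::AnyCuArray{T}, x::AnyCuArray{T}, w::AnyCuArray{T}, cdims) where {T}
```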

I’m however quite unsure whether there’s some CUDNN trap I’m overlooking by doing so.
Would it make sense for me to open a PR with the conversion to AnyCuArray? I guess @denizyuret would know if there are contraindications to such a relaxation?

One potential problem could be views that are not contiguous: I am fairly sure the low-level CUDNN functions would not like these. If I remember correctly, CUDNN tensors can have strides, but not all functions support them. We could test with a PR that uses AnyCuArray and see where the limits are.