KernelAbstractions ndrange using UnitRange

I have an array with size(A)=(N,M) and I would like to run a kernel over ndrange=(2:N-1,3:M-1). Is there a way to do this other than defining an OffsetArray wrapper?

Could you use I = CartesianIndices(ndrange) and then write for i in I, or A[I] = ..., @view A[I], and so on? I'm not sure about CUDA support, though.
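A minimal CPU-only sketch of that suggestion in plain Julia (no KernelAbstractions involved). The sizes N, M and the values written into A are made up for illustration:

```julia
N, M = 6, 7
A = zeros(N, M)

# Interior sub-range from the question: rows 2:N-1, columns 3:M-1
R = CartesianIndices((2:N-1, 3:M-1))

# Option 1: loop over the Cartesian indices directly
for I in R
    A[I] = I[1] + 10 * I[2]   # stand-in for a real per-index computation
end

# Option 2: take a view of the sub-block and broadcast into it
B = zeros(N, M)
Bv = @view B[R]               # SubArray covering only rows 2:N-1, cols 3:M-1
Bv .= 1.0
```

Both options touch only the interior entries; everything outside R stays zero.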

For future reference, what we ended up doing was launching the kernel over ndrange = size(R) and adding an offset (the first index of R minus one) to the global index inside the kernel. So, to iterate over R::CartesianIndices:

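using KernelAbstractions

@kernel function kern(A, I0)
  I = @index(Global, Cartesian)  # runs over 1:size(R) in each dimension
  I += I0                        # shift it into R
  A[I] = func(I)                 # func: your own per-index function
end

R = CartesianIndices((2:N-1, 3:M-1))
backend = get_backend(A)
kern(backend, 64)(A, R[1] - oneunit(R[1]), ndrange=size(R))
KernelAbstractions.synchronize(backend)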

This seems to work on any backend, without a performance penalty.
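To see why R[1] - oneunit(R[1]) is the right offset, here is a serial, plain-Julia emulation of the index mapping (not a real kernel launch; N and M are illustrative). @index(Global, Cartesian) starts at (1, 1), so adding the offset lands the first work item on R[1] and the last on R[end]:

```julia
N, M = 6, 7
R = CartesianIndices((2:N-1, 3:M-1))

# Offset passed to the kernel: R's first index minus one in every dimension
I0 = R[1] - oneunit(R[1])          # CartesianIndex(1, 2)

# The kernel's Global Cartesian index runs over 1:size(R) in each dimension;
# this serial loop stands in for the parallel launch with ndrange = size(R)
visited = CartesianIndex{2}[]
for J in CartesianIndices(size(R))
    push!(visited, J + I0)          # same shift as `I += I0` in the kernel
end
```

Because CartesianIndex addition is elementwise, the same offset carries over unchanged to 3-D or higher-dimensional ranges.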