How to copy a view of a CuArray to an Array efficiently?

If your view is non-contiguous, it is first materialized into a contiguous temporary CuArray, which is then copied to the host Array with a single API call. Alternatives are possible, such as performing one API call per contiguous slice, or wrapping the host array in a CuArray (e.g., using unsafe_wrap(CuArray, ::Array)) and performing a broadcast assignment into it. None of these is guaranteed to improve performance in all cases though, so we default to the simplest solution, which is to allocate a temporary CuArray.
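To make the three options concrete, here is a minimal sketch, not a benchmark or a recommendation. The helper names (`copy_default`, `copy_by_columns`, `copy_by_broadcast`) and the 1024×1024 column-strided example are made up for illustration. The broadcast variant assumes your CUDA.jl version and system support wrapping host memory with `unsafe_wrap(CuArray, ::Array)`, so it is defined but not called here.

```julia
using CUDA

# Default path: the non-contiguous view is materialized into a temporary
# contiguous CuArray, then copied to the host in one transfer.
copy_default(v) = Array(v)

# Alternative 1: one device-to-host copy per contiguous slice.
# Julia arrays are column-major, so each selected column is contiguous.
function copy_by_columns(d_a::CuArray{T,2}, cols) where {T}
    m = size(d_a, 1)
    h = Array{T}(undef, m, length(cols))
    for (j, c) in enumerate(cols)
        # linear offsets of column j in `h` and column c in `d_a`
        copyto!(h, (j - 1) * m + 1, d_a, (c - 1) * m + 1, m)
    end
    return h
end

# Alternative 2 (assumption: host memory can be wrapped on this setup):
# expose the host array as a CuArray and broadcast the view into it.
function copy_by_broadcast(v)
    h = Array{eltype(v)}(undef, size(v))
    GC.@preserve h begin
        d_h = unsafe_wrap(CuArray, h)
        d_h .= v
        synchronize()   # make sure the kernel finished before reading `h`
    end
    return h
end

d_a  = CUDA.rand(Float32, 1024, 1024)
cols = 1:2:size(d_a, 2)
v    = view(d_a, :, cols)          # every other column: non-contiguous

h1 = copy_default(v)
h2 = copy_by_columns(d_a, cols)
```

Which variant is fastest depends on the shape of the view: many small slices mean many API calls for the per-slice approach, while the default pays for an extra device allocation and kernel. Timing each with `CUDA.@time` or BenchmarkTools on your actual sizes is the only reliable way to choose.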