My attempt at speeding up such operations with unsafe views foundered on the cost of pointer dereferences for non-bitstypes; see https://discourse.julialang.org/t/speed-and-type-stability-of-unsafe-pointer-to-objref/6478.
The code there gets rid of the spurious allocations for the views, but pays too much for the pointer dereferences.
This is somewhat similar to the other approach to unsafe views, except that I changed the array type instead of mucking with SubArray. The code by @rdeits and @tkoolen is faster because you don’t pay the overhead of unsafe_load / unsafe_store!.
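For readers who have not seen the pattern, here is a rough sketch of what a pointer-backed view looks like when the element type is a bitstype. It is not the code from the linked thread, nor the code by @rdeits and @tkoolen; the names `PtrVectorView` and `unsafe_view` are made up for illustration, and it assumes Julia 1.x (`isbitstype`, `GC.@preserve`). It deliberately refuses non-bitstypes, since those are stored as references and are where the dereference cost discussed above comes in.

```julia
# Hypothetical pointer-backed view over a contiguous range of a Vector.
# Only valid for isbits element types, and only while the parent is kept alive.
struct PtrVectorView{T} <: AbstractVector{T}
    ptr::Ptr{T}
    len::Int
end

Base.size(v::PtrVectorView) = (v.len,)
Base.IndexStyle(::Type{<:PtrVectorView}) = IndexLinear()

Base.@propagate_inbounds function Base.getindex(v::PtrVectorView, i::Int)
    @boundscheck checkbounds(v, i)
    unsafe_load(v.ptr, i)
end

Base.@propagate_inbounds function Base.setindex!(v::PtrVectorView{T}, x, i::Int) where {T}
    @boundscheck checkbounds(v, i)
    unsafe_store!(v.ptr, convert(T, x), i)
    v
end

# Hypothetical constructor: check the range once, then hand out a raw pointer.
function unsafe_view(parent::Vector{T}, range::UnitRange{Int}) where {T}
    isbitstype(T) || throw(ArgumentError("element type must be a bitstype"))
    checkbounds(parent, range)
    PtrVectorView{T}(pointer(parent, first(range)), length(range))
end

a = collect(1.0:10.0)

# The view does not root `a`, so the parent must be preserved explicitly.
total = GC.@preserve a begin
    v = unsafe_view(a, 3:5)
    v[1] = 30.0   # writes through to a[3]
    sum(v)
end
```

The struct holds only a pointer and a length, so it is itself a bitstype and constructing it does not need a heap allocation; the price is that nothing keeps the parent array alive, hence the `GC.@preserve`.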