CUDA.randn! allocation

CURAND requires the array passed to randn! to have a power-of-2 length, so for arrays of any other length we need to allocate a temporary buffer.
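
A minimal sketch of the padding pattern this implies, written against plain CPU arrays so it runs without a GPU. The helper name `with_pow2_buffer!` is hypothetical and is not part of CUDA.jl or CURAND. Rounding up to the next power of 2 and copying back the leading elements are assumptions about how the temporary buffer would be used.

```julia
using Random

# Hypothetical helper: call the in-place generator `f` on an array whose
# length is a power of 2, then keep only the first length(A) values.
function with_pow2_buffer!(f, A::AbstractVector)
    len = length(A)
    len == 0 && return A              # nothing to fill
    if ispow2(len)
        f(A)                          # length is already acceptable: no temporary
    else
        padlen = nextpow(2, len)      # assumption: round up to the next power of 2
        B = similar(A, padlen)        # the temporary allocation
        f(B)
        copyto!(A, 1, B, 1, len)      # copy the first `len` values back into A
    end
    return A
end

A = zeros(Float32, 5)                 # 5 is not a power of 2, so an 8-element buffer is used
with_pow2_buffer!(randn!, A)
```

The cost is one extra allocation and one copy whenever the length is not a power of 2; arrays that already have a power-of-2 length are filled in place.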