I’m trying to build a tensor `A` of size `3 x ℓ x m x n`, where `A[:, i, j, k]` is the 3D coordinate at index (i, j, k) of a uniform grid over [0, 1]^3.

On the CPU, the code looks like this:

```
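function mesh(n)
    # n evenly spaced Float32 samples covering [0, 1]
    r = range(0.0f0, 1.0f0, length=n)
    # n x n x n array of (x, y, z) tuples; x varies fastest
    grid_tuples = [(x, y, z) for x in r, y in r, z in r]
    # view the tuples as raw Float32 values and expose the component axis first
    grid = reshape(reinterpret(Float32, grid_tuples), 3, n, n, n)
    return collect(grid)
end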
```

However, I want to construct it directly on the GPU to avoid copying gigabytes of data from the CPU to the GPU. Is there a simple way to do this?

Side note: apparently the slowness does not come from copying data to the GPU, but from reinterpreted arrays being very slow.
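
For reference, here is a rough sketch of the kind of thing I have in mind: build the grid with a single fused broadcast over reshaped copies of the 1D range, so there are no tuples and no `reinterpret`. The name `mesh_broadcast` and the `to_device` keyword are made up for illustration. I'm assuming that passing something like `CUDA.cu` (from CUDA.jl) as `to_device` would move the small inputs to the GPU so the broadcast runs there, but I have only checked the CPU path with the default `identity`.

```julia
# Sketch: same layout as mesh(n), built by broadcasting instead of reinterpret.
# `to_device` is a hypothetical hook: `identity` keeps everything on the CPU;
# a GPU conversion function (assumed: `CUDA.cu`) would place the inputs on the device.
function mesh_broadcast(n; to_device=identity)
    # Only these n values (plus 3 selector values) would ever be transferred.
    r = to_device(collect(range(0.0f0, 1.0f0, length=n)))
    x = reshape(r, 1, n, 1, 1)   # varies along dimension 2 (index i)
    y = reshape(r, 1, 1, n, 1)   # varies along dimension 3 (index j)
    z = reshape(r, 1, 1, 1, n)   # varies along dimension 4 (index k)
    # Selector along the component dimension: 1 -> x, 2 -> y, 3 -> z.
    c = to_device(reshape(Float32[1, 2, 3], 3, 1, 1, 1))
    # One fused broadcast producing a 3 x n x n x n Float32 array.
    return (c .== 1) .* x .+ (c .== 2) .* y .+ (c .== 3) .* z
end

A = mesh_broadcast(4)
println(size(A))
```

With this layout `A[:, i, j, k]` should again be `(r[i], r[j], r[k])`, matching the CPU version above.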