# Most efficient way to stack vectors on the fourth dimension

let `x=[1 2; 3 4]` and `y=copy(x)` I want to stack the two arrays together on the 4th dimension.

E.g. `xy = cat(x, y, dims=4)` will get what I wanted

However the above is fine but when you apply the same technique to a large data such as the MNIST dataset the speed is really slow e.g.

``````using Flux, Flux.Data.MNIST
imgs = MNIST.images()
X = cat(float.(imgs)..., dims = 4) # SLOW!!!
``````

What’s the most efficient code to create `X`? (which is done by stacking 60,000 28x28 matrices together on the 4th dimension.)

I assume you want to stack 3-dimensional arrays (and not matrices) along `dims=4`.

I would permute dimensions (`permutedims`), flatten (`vec`), then use `reduce(hcat, ...)` which is optimized, and finally `reshape` and `permutedims` back.

``````function my_cat(X)
k = length(X)
k == 0 && return nothing

s = size(X[1])
T = eltype(X[1])
@assert length(s) == 2
@assert all(x -> s == size(x), X)
@assert all(x -> T == eltype(x), X)

n = prod(s)
Y_vec = Vector{T}(undef, k*n)
offset = 0
for x in X
@inbounds Y_vec[offset + 1: offset + length(x)] .= vec(x)
offset += length(x)
end
Y = reshape(Y_vec, size(X[1])..., 1, k)
return Y
end

x = [1 2; 3 4];
y = copy(x);
X = [x, y];
my_cat(X)
``````
1 Like

Correct. Because it’s image data with height width and colour channels; that makes 3 dims. But the data is just grey images so the colourchannel dim disappears

I made a package for this:

``````@cast stacked[i,j,_,n] := xlist[n][i,j] |> float
``````

Assuming `xlist = [x,y]` are your images, and you really did mean 4th dimension not 3rd.

It’s just doing something like `reduce(hcat,…)`, but the point is not to have to add a comment `# axes are row,col,colour,number` or a docstring to the function you write.

Edit: actually that’s quite slow on all 60000. But this is fast: `lazy` uses RecursiveArrayTools to make a lazy array, instead of `reduce(cat,...)`.

``````@cast stacked[i,j,_,n] := float( imgs[n][i,j] )  lazy;
``````

(You could write `|=` rather than `:=` to indicate that you want a copy at the end, but in fact broadcasting `float` has this effect here anyway.)

2 Likes

Looks really good! This is such a great idea! Manipulating high dimensional arrays (tensors) need really good intuitive abstractions!

It would be good to include installation instructions using `Pkg.add` method as I am using juliabox.com and so it’s not as convenient to assess the REPL.

A solution inspired by your description

``````using Flux, Flux.Data.MNIST
imgs = MNIST.images()
# X = cat(float.(imgs)..., dims = 4) # SLOW!!!

x = float.(imgs)
x_stacked = reshape(reduce(hcat, vec.(x)), 28, 28, 1, 60_000)

# all checks out!!!
all([all(x[k] .== x_stacked[:,:,:,k]) for k = 1:60_000]) # true
``````

@mohamed82008 I appreciate your efforts but I like my one-liner better already

1 Like

Ah will try to figure this out, hadn’t looked in ages. I can clone but have difficulty installing. The package should be registered in a few days, then perhaps easier.

This indeed seems good. Interesting that `mapreduce(vec, hcat, x)` does not hit the same optimisation, and is very slow.

No worries. I like your solution too. Just notice that the above line allocates unnecessarily. You can use a lazy map here from https://github.com/JuliaArrays/MappedArrays.jl.

Edit: `mapreduce` will do the same but it seems that @mcabbott already commented on that.

1 Like