How can I convert a matrix of vectors to a 3d array?

Hello there,

I have a matrix of vectors. The element type of the vectors is something non-default. I want to convert it to a 3D array. I can see that similar questions have been asked for vectors of matrices, but those answers did not work for me.

Could you please show how I can do it, and explain the logic behind the solution, so I can generalize it to other dimensions in the future?

Thanks for your time.

One way would be:

julia> M = [rand(3) for i=1:2,j=1:4]
2×4 Matrix{Vector{Float64}}:
 [0.456187, 0.302626, 0.00769471]  [0.217349, 0.599273, 0.922622]  …  [0.335926, 0.450889, 0.760988]
 [0.337041, 0.520466, 0.108746]    [0.244348, 0.393246, 0.185435]     [0.77141, 0.882466, 0.125375]

julia> [M[i,j][k] for i=1:2,j=1:4,k=1:3]
2×4×3 Array{Float64, 3}:
[:, :, 1] =
 0.456187  0.217349  0.738685  0.335926
 0.337041  0.244348  0.921115  0.77141

[:, :, 2] =
 0.302626  0.599273  0.633224  0.450889
 0.520466  0.393246  0.445012  0.882466

[:, :, 3] =
 0.00769471  0.922622  0.87023   0.760988
 0.108746    0.185435  0.498452  0.125375

Note that since the comprehension value is M[i,j][k], this expression can easily be changed to get the dimensions in a different order.
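For example, swapping the order of the index ranges puts the inner index first (a quick sketch with the same M):

julia> size([M[i,j][k] for k=1:3, i=1:2, j=1:4])
(3, 2, 4)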

Also, in terms of sizes in different dimensions, a more generic comprehension could be:

[M[i,j][k] for i=axes(M,1),j=axes(M,2),k=eachindex(M[1,1])]

or even:

[M[I][k] for I=CartesianIndices(M),k=eachindex(M[1,1])]

In terms of speed, putting the inner index as the first dimension tends to be faster because of memory layout (Julia arrays are column-major, so each inner vector is then read and written contiguously):

[M[I][k] for k=eachindex(M[1,1]),I=CartesianIndices(M)]
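A rough way to check that claim, assuming BenchmarkTools is installed (timings vary by machine, so none are shown here):

using BenchmarkTools

M = [rand(3) for i in 1:200, j in 1:200];

outer_first(M) = [M[I][k] for I in CartesianIndices(M), k in eachindex(M[1, 1])]
inner_first(M) = [M[I][k] for k in eachindex(M[1, 1]), I in CartesianIndices(M)]

@btime outer_first($M);  # inner index last
@btime inner_first($M);  # inner index first, expected to be faster due to contiguous access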

Julia 1.9 will have a function for this, stack, which works for any dimensionality. On earlier versions you can get it from Compat.jl:

julia> mv = [[a,b] for a in 1:3, b in 11:14]
3×4 Matrix{Vector{Int64}}:
 [1, 11]  [1, 12]  [1, 13]  [1, 14]
 [2, 11]  [2, 12]  [2, 13]  [2, 14]
 [3, 11]  [3, 12]  [3, 13]  [3, 14]

julia> using Compat

julia> stack(mv)
2×3×4 Array{Int64, 3}:
[:, :, 1] =
  1   2   3
 11  11  11

[:, :, 2] =
  1   2   3
 12  12  12

[:, :, 3] =
  1   2   3
 13  13  13

[:, :, 4] =
  1   2   3
 14  14  14

julia> size(ans) == (size(first(mv))..., size(mv)...) 
true

julia> mt = [(a,b) for a in 1:3, b in 11:14]
3×4 Matrix{Tuple{Int64, Int64}}:
 (1, 11)  (1, 12)  (1, 13)  (1, 14)
 (2, 11)  (2, 12)  (2, 13)  (2, 14)
 (3, 11)  (3, 12)  (3, 13)  (3, 14)

julia> stack(mv) == stack(mt)
true

Thanks a lot boys!

Another pre-stack option is to use something like:

julia> z = reshape([1:3, 4:6, 7:9, 10:12], 2, 2)
2×2 Matrix{UnitRange{Int64}}:
 1:3  7:9
 4:6  10:12

julia> reshape(reduce(hcat,z), 3, 2, 2)
3×2×2 Array{Int64, 3}:
[:, :, 1] =
 1  4
 2  5
 3  6

[:, :, 2] =
 7  10
 8  11
 9  12

But this is less flexible in the dimension ordering of the output, compared to a comprehension-based method.
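To make that concrete (a sketch with the z above): getting the inner index last from the reshape approach needs an extra permutedims, whereas the comprehension just reorders its index ranges:

julia> A = reshape(reduce(hcat, z), 3, 2, 2);

julia> permutedims(A, (2, 3, 1)) == [z[i, j][k] for i in 1:2, j in 1:2, k in 1:3]
true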

Note that this reduce(hcat,z) does not hit the fast path, and hence calls hcat many times, which is O(N^2). You can do better by moving the reshapes around… but this illustrates why the magic fast path is a bit of a footgun:

julia> mv2 = [[a,b] for a in 1:30, b in 11:104];

julia> @time reduce(hcat, mv2);  # notice MB allocation
  0.002909 seconds (2.82 k allocations: 15.663 MiB)

julia> @time reduce(hcat, vec(mv2));  # uses the fast path
  0.000093 seconds (4 allocations: 44.188 KiB)

julia> @time stack(mv2);
  0.000064 seconds (2 allocations: 44.125 KiB)

julia> stack(mv2) == reshape(reduce(hcat, vec(mv2)), 2, 30, 94)
true

julia> @which reduce(hcat, mv2)  # slow pairwise hcat
reduce(op, A::AbstractArray; kw...)
     @ Base reducedim.jl:406

julia> @which reduce(hcat, vec(mv2))  # magic fast path
reduce(::typeof(hcat), A::AbstractVector{<:AbstractVecOrMat})
     @ Base abstractarray.jl:1709

Good catch. I hate those specializations. They have issues like this and are inconsistent with other reduction functions in Base. We have maximum for collections rather than reduce(max,...), so why not concat or concatenate or some other verb for concatenating elements of a collection?

That’s what stack is.

There might be room for verbs besides stack. IMO the logical meaning of concatenate is to keep every dimension in place, i.e. for a matrix of vectors, concatenate the vectors along the 1st dim of the container:

julia> concatenate(mv)
6×4 Matrix{Int64}:
  1   1   1   1
 11  12  13  14
  2   2   2   2
 11  12  13  14
...
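One way to get that particular result today, if I'm reading the semantics right, is to stack and then merge the inner first axis with the outer first axis (a sketch, reusing mv from above):

julia> reshape(stack(mv), :, size(mv, 2))
6×4 Matrix{Int64}:
  1   1   1   1
 11  12  13  14
  2   2   2   2
 11  12  13  14
  3   3   3   3
 11  12  13  14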

Another candidate is an eager cousin of Iterators.flatten:

julia> flatten(mv)
24-element Vector{Int64}:
  1
 11
  2
 11
  3
...
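For what it's worth, that eager behaviour is already spellable today, modulo the name (a quick check):

julia> collect(Iterators.flatten(mv)) == reduce(vcat, vec(mv)) == vec(stack(mv))
true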

But stack is less flexible and does all sorts of reshaping along the way.

julia> z = [1:3 for _ in 1:10];

julia> stack(z) # fine, probably a reasonable thing to do
3×10 Matrix{Int64}:
 1  1  1  1  1  1  1  1  1  1
 2  2  2  2  2  2  2  2  2  2
 3  3  3  3  3  3  3  3  3  3

julia> stack(z; dims=1) # transposes each element?
10×3 Matrix{Int64}:
 1  2  3
 1  2  3
 1  2  3
 1  2  3
 1  2  3
 1  2  3
 1  2  3
 1  2  3
 1  2  3
 1  2  3

julia> stack(z; dims=3)
ERROR: ArgumentError: cannot stack slices ndims(x) = 1 along dims = 3

julia> @btime cat(($z)...; dims=3);
  8.467 μs (266 allocations: 10.59 KiB)

How do I concatenate the vectors of z along the third dimension to produce a 3×1×10 array? My best builtin option is cat, but it has horrible performance. My actual best option in this particular case is a reduce(hcat,z), reduce(vcat,z), or stack(z) followed by a reshape, but reshape games are tedious and annoying to write and awful to read.
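Spelled out, the reshape version I'm grumbling about is something like this (a sketch; it does give the right values):

julia> reshape(stack(z), 3, 1, 10) == cat(z...; dims=3)
true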

I suppose this really shows that stack should (maybe?) be modified to work along arbitrary dimensions, even those beyond the trailing dimension.

Also, stack always inserts a dimension whereas cat does not. You can do reshape gymnastics to make all this work, but an iterable-based cat wouldn’t require those things.

[z... ;;;]

Which lowers to

hvncat(3, z...)

julia> @btime [z... ;;;]
  3.250 μs (55 allocations: 2.81 KiB)

EDIT: Curiously, this is a bit worse than if z were an array of arrays instead of ranges; probably some performance gains to be made there: 2.489 μs (13 allocations: 1.11 KiB)

The reason this is forbidden is type-stability. Unlike cat(...; dims) the return type of stack does not depend on the keyword.

Allowing stack(...; dims=Val(3)) would be the smallest extension. Allowing stack(...; dims=(3, 2)) might be another way, matching eachslice(ones(3,1,10); dims=(3,2)). The issue tracking this is #47794.
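To see the type-stability point concretely: with cat the dimensionality of the result depends on the runtime value of dims, so the return type cannot be inferred from the argument types alone, whereas stack's output dimensionality is fixed by the input types. For example:

julia> cat([1, 2], [3, 4]; dims=1)
4-element Vector{Int64}:
 1
 2
 3
 4

julia> cat([1, 2], [3, 4]; dims=2)
2×2 Matrix{Int64}:
 1  3
 2  4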

Well, stack usually preserves all dimensions – axes(result) is the inner axes followed by the outer ones. Both the inner elements and the outer container have a 1st axis, and they cannot both end up first. This rule has very simple indexing, A[o...][i...] == stack(A)[i..., o...]. (I say “usually” because with a keyword, A[o][i...] == stack(A; dims=1)[o, i...], which with multi-dimensional A has axes(vec(A)) instead.)
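That rule is easy to check on the mv from earlier (a quick sketch):

julia> mv = [[a, b] for a in 1:3, b in 11:14];

julia> all(mv[i, j][k] == stack(mv)[k, i, j] for k in 1:2, i in 1:3, j in 1:4)
true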

How do I concatenate the vectors of z along the third dimension to produce a 3x1x10 array? … reshape gymnastics to make all this work, but an iterable-based cat wouldn’t require those things.

There are quite a few possible rules for functions combining arrays of arrays into one. Different people seem to think different ways are the only natural extension, and are surprised when the computer does not read their mind. What rules do you see this function obeying, on an M-array of N-arrays? And does it fit with cat([1 2], [3 4], dims=(1,2))?

I think the most obvious concatenate function would never move an axis, and hence would combine the two 1st axes here. That’s one way to generalise the other existing magic fast path, reduce(vcat, z), but won’t do what you ask.


Are there any subtle differences in behavior between Base.stack and SplitApplyCombine.combinedims to be mindful of? At first glance they seem to produce the same results. The former seems to benchmark slightly faster, and since it doesn’t require an additional import, I will probably start using it instead of the latter. (Though the reverse function SplitApplyCombine.splitdims is often very handy, together with the “view” versions of the two aforementioned functions.)

julia> using SplitApplyCombine

julia> mv = [[a,b] for a in 1:3, b in 11:14];

julia> z = [1:3 for _ in 1:10];

julia> stack(mv) == combinedims(mv)
true

julia> stack(z) == combinedims(z)
true

julia> stack(z; dims=1) == combinedims(z, 1)
true

I’d settle for a clone of cat that operates on an iterable rather than varargs. Although there’s also room for a “blockconcat” function that concatenates the inner objects along the outer dimensions (i.e., an A×B array of C×D×E arrays becomes an AC×BD×E array, or otherwise works so long as the inner objects form a valid grid).
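For the 2-D case that block layout can be spelled with nested reductions today, though it's neither pretty nor generic; a sketch (blocks and blockmat are just illustrative names):

julia> blocks = [fill(10i + j, 2, 3) for i in 1:2, j in 1:2];  # 2×2 array of 2×3 blocks

julia> blockmat = reduce(hcat, [reduce(vcat, col) for col in eachcol(blocks)]);

julia> size(blockmat)  # merges matching outer and inner dimensions: (2*2, 2*3)
(4, 6)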

We certainly won’t cover every concatenation paradigm an adversary could conceive, but currently we’re missing (what I would consider) some of the most common ones. Rather, in many cases we have them but only in vararg form.

reduce(hcat,...) and friends are bad idioms because 1) without knowledge of the specializations, they look like the worst way to perform this operation, and 2) the specializations miss many cases (like my earlier recommendation, which somebody pointed out missed the fast path), so you go from “best” to “worst” performance unexpectedly.