# Transforming an Array{Array{T, N}, M} into an Array{T, N+M}

Hi,

Usually I have a lot of success discovering the various ways to transform the Array datatypes into each other, but this one gives me some headaches: from a parallelized operation I end up with an array (with an arbitrary number of dimensions of arbitrary lengths) of matrices (which could be arbitrarily large, but will have the same size for each entry).

I now wish to “flatten”(?) said `Array{Matrix{ComplexF64}}` into an `Array{ComplexF64}` with the dimensions of the matrix added. This will be used to store the array in an HDF5 file.

Toy example:

``````
dimensions = (2,3,4) # could have an arbitrary number of dimensions
rand_matrix() = rand(ComplexF64, 3,3) # dimension of each matrix is identical
# for each entry

# the result would actually be an Array{T, N} with N dimensions and an
# arbitrary type T, which I mapped to the result like
# result = map(x -> x.data, result)
# for the MWE I use
result = map(x->rand_matrix(), Array{Int}(undef,dimensions))

# Now I wish to transform this 2x3x4 Array{Matrix{ComplexF64}, 3}
# to an Array{ComplexF64, 5}

# The following works but seems to be not very idiomatic ;)
transformed_result = Array{ComplexF64}(undef, size(result)..., size(first(result))...)
for i in Iterators.product(range.(1, size(result), step=1)...)
    transformed_result[i..., :, :] = result[i...]
end
``````

What I don’t like about the solution:

• It assumes that there are two dimensions in each “entry”, i.e. a matrix, but what if I eventually started with an `Array{Array{T,3},4}`?
• It seems to be not very idiomatic. I especially feel that the use of `Iterators.product` is hiding the actually intended operation.

I feel that there could be an easy one-liner involving `hvcat`, but I can’t come up with one.

Thanks for any ideas!

Sidenote: I could directly accumulate my results in the final data structure, but in my actual use case I’m storing elements of a certain type `T` on which I also perform other operations later, e.g. by calling `f.(result)`.


The easiest I can think of is to use the package EllipsisNotation.jl; then you could do:

``````
julia> using EllipsisNotation

julia> for i in CartesianIndices(result)
           transformed_result[i, ..] .= result[i]
       end
``````

This replaces the loop based on `Iterators.product` and would work no matter the number of dimensions. I think this is the cleaner solution, and it is also very readable. One thing to consider is that no matter which method you choose, you will have to copy all the data to the new structure.
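For completeness, the same pattern can also be written in plain Base by splatting the index tuple plus a matching number of `Colon()`s; a minimal sketch, using random data in the shapes from the original post:

```julia
# Base-only sketch: same CartesianIndices loop, but splatting the outer
# index tuple and one Colon() per inner dimension instead of using `..`.
result = [rand(ComplexF64, 3, 3) for _ in CartesianIndices((2, 3, 4))]

inner = ntuple(_ -> Colon(), ndims(first(result)))  # (:, :) for matrices
transformed_result = Array{ComplexF64}(undef, size(result)..., size(first(result))...)
for i in CartesianIndices(result)
    transformed_result[Tuple(i)..., inner...] = result[i]
end
```

Like the `..` version, this works for inner arrays of any dimensionality, not just matrices.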


Here’s one idea, using an example where N=3, M=2:

``````
julia> B = [1 2;;; 3 4;;; 5 6]   # This syntax requires Julia 1.7
1×2×3 Array{Int64, 3}:
[:, :, 1] =
1  2

[:, :, 2] =
3  4

[:, :, 3] =
5  6

julia> A = [B for _ in 1:2, _ in 1:3]
2×3 Matrix{Array{Int64, 3}}:
[1 2;;; 3 4;;; 5 6]  [1 2;;; 3 4;;; 5 6]  [1 2;;; 3 4;;; 5 6]
[1 2;;; 3 4;;; 5 6]  [1 2;;; 3 4;;; 5 6]  [1 2;;; 3 4;;; 5 6]

julia> reshape(cat(A..., dims=4), 1, 2, 3, 2, 3)
1×2×3×2×3 Array{Int64, 5}:
[:, :, 1, 1, 1] =
1  2

[:, :, 2, 1, 1] =
3  4
...
``````

Does this do what you want?

Edit: generic version of the last line:

``````
reshape(cat(A..., dims=ndims(B)+1), size(B)..., size(A)...)
``````

Another way is

``````
transformed_result = reshape(reduce(vcat, (transpose(vec(r)) for r in vec(result))), size(result)..., size(first(result))...)
``````
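This works because each `transpose(vec(r))` contributes one row of a flat matrix, and the column-major `reshape` then places the outer dimensions first. A quick sanity check, using random data in the shapes from the original post:

```julia
# Each matrix becomes one row; vcat stacks them; reshape restores the
# (outer..., inner...) shape thanks to column-major ordering.
result = [rand(ComplexF64, 3, 3) for _ in CartesianIndices((2, 3, 4))]

transformed_result = reshape(
    reduce(vcat, (transpose(vec(r)) for r in vec(result))),
    size(result)..., size(first(result))...,
)
```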

Unless `A` is quite small, it’s probably better to do `reduce((a,b)->cat(a,b;dims), A)` than `cat(A...;dims)`.

I think that will allocate and copy the data again and again for each element in `A`, so it depends:

`reduce` is better for large `A` and small `B` (elements of `A`):

``````
julia> B = [1 2;;; 3 4;;; 5 6];

julia> A = [B for _ in 1:2000, _ in 1:3];

julia> @time cat(A..., dims=ndims(B)+1); # 2nd run
1.114491 seconds (71.02 k allocations: 552.791 MiB, 13.32% gc time)

julia> @time reduce((a,b)->cat(a,b;dims=ndims(B)+1), A); # 2nd run
0.295002 seconds (249.34 k allocations: 113.688 MiB)
``````

but it’s worse for small `A` and large `B`:

``````
julia> B = repeat([1 2;;; 3 4;;; 5 6], outer=(1, 1, 100_000));

julia> A = [B for _ in 1:20, _ in 1:3];

julia> @time cat(A..., dims=ndims(B)+1); # 2nd run
0.603539 seconds (682 allocations: 274.713 MiB, 0.70% gc time)

julia> @time reduce((a,b)->cat(a,b;dims=ndims(B)+1), A); # 2nd run
15.546297 seconds (57.78 k allocations: 8.179 GiB, 1.48% gc time)
``````

I think the compiler could be improved to make `cat(A..., dims=ndims(B)+1)` efficient (in principle it should just need 1 allocation and 1 copy?)

For `reduce` I expect the inefficiency: there are specialized implementations of `reduce` for `hcat` and `vcat` but it would be difficult to do the same for a lambda…
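Indeed, the single-allocation version is easy to write by hand: preallocate the target once and copy each element into its slab. A sketch (`combine_preallocated` is a made-up name; putting the inner dimensions first keeps each copy contiguous in memory):

```julia
# One allocation, one pass: preallocate and copy each inner array into
# the corresponding contiguous slab (inner dimensions first).
function combine_preallocated(A::AbstractArray{<:AbstractArray})
    B1 = first(A)
    out = Array{eltype(B1)}(undef, size(B1)..., size(A)...)
    inner = ntuple(_ -> Colon(), ndims(B1))
    for i in CartesianIndices(A)
        out[inner..., Tuple(i)...] = A[i]
    end
    return out
end
```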


That’s exactly what a function in `SplitApplyCombine.jl` does:

``````
using SplitApplyCombine
combinedims(result)
# returns 3×3×2×3×4 Array{ComplexF64, 5}
``````

Don’t want to allocate another array and prefer a view instead? It has `combinedimsview()` for this purpose.


Thanks everyone for the plenty of input!

I’ll be marking @aplavin’s response as the solution, mainly for these reasons:

• `combinedimsview()` is really what I’ve been looking for without knowing it, as I don’t need a full copy
• I think the `SplitApplyCombine` package is probably what someone else with a similar problem would like to find
• I also have to confess that I had larger arrays in mind than I posted in my MWE (say `200x40x200` for the outer one). This renders the `cat` solutions more or less useless, as I show below, even though I like them for their brevity and for being in `Base`

The only thing I really dislike about this solution is that it is not in `Base` and therefore pulls in another dependency. For the sake of completeness I wrote a small benchmark script that tests the various approaches (note that I switched everything to column-major-friendly order, i.e. the matrix dimensions come first).

Summary
``````
#%%
using Pkg
Pkg.activate(temp=true)
Pkg.add(["BenchmarkTools", "SplitApplyCombine", "EllipsisNotation"])

#%%
using BenchmarkTools
using SplitApplyCombine
using EllipsisNotation

#%%
function setup_data(outer_dims = (2,3,4), inner_dims = (3,3))
    rand_matrix() = rand(ComplexF64, inner_dims...)
    map(x -> rand_matrix(), Array{Int}(undef, outer_dims))
end

#%% Proposed ways
function combine_loop(A)
    transformed_A = Array{ComplexF64}(undef, size(first(A))..., size(A)...)
    for i in Iterators.product(range.(1, size(A), step=1)...)
        transformed_A[:, :, i...] = A[i...]
    end
    transformed_A
end

function combine_loop_CartesianIndices(A)
    transformed_A = Array{ComplexF64}(undef, size(first(A))..., size(A)...)
    for i in CartesianIndices(A)
        transformed_A[:, :, i] = A[i]
    end
    transformed_A
end

function combine_loop_EllipsisNotation(A)
    transformed_A = Array{ComplexF64}(undef, size(first(A))..., size(A)...)
    for i in CartesianIndices(A)
        transformed_A[.., Tuple(i)...] = A[i]
    end
    transformed_A
end

function combine_cat(A)
    reshape(cat(A..., dims=ndims(first(A))+1), size(first(A))..., size(A)...)
end

function combine_cat_reduce(A)
    reshape(reduce((a,b)->cat(a,b; dims=ndims(first(A))+1), A), size(first(A))..., size(A)...)
end

function combine_SplitApplyCombine(A)
    combinedims(A)
end

function combine_SplitApplyCombine_view(A)
    combinedimsview(A)
end

#%% Verify that they are all identical
test_data = setup_data((2,3,4), (3,3))

test_functions = [combine_loop,
combine_loop_CartesianIndices,
combine_loop_EllipsisNotation,
combine_cat,
combine_cat_reduce,
combine_SplitApplyCombine,
combine_SplitApplyCombine_view]

individual_results = collect(f(test_data) for f in test_functions)

# check that all individual results are equal
@assert all(x -> x == first(individual_results), individual_results)

#%% Benchmark them
benchmark_small_only = [combine_cat] # <-- this one needs too much memory on my system for the large case

small_dims = ((2,3,4), (3,3))
large_dims = ((200,40,200), (3,3))

benchmarks_small = []
benchmarks_large = []

for f in test_functions
    @show f
    push!(benchmarks_small, @benchmark $f(data) setup=(data=setup_data(small_dims...)))
    if !(f in benchmark_small_only)
        push!(benchmarks_large, @benchmark $f(data) setup=(data=setup_data(large_dims...)))
    else
        push!(benchmarks_large, missing)
    end
end

#%% Print some info

function stringmean(trial)
    t = mean(trial).time
    allocs = mean(trial).allocs
    memory = mean(trial).memory
    "$(t) ns, $allocs allocs, $memory bytes"
end

println("Small Dimensions ($small_dims)")
for (i, f) in enumerate(test_functions)
    println("\n$f:")
    display(mean(benchmarks_small[i]))
end

println("\n\nLarge Dimensions ($large_dims)")
for (i, f) in enumerate(test_functions)
    println("\n$f:")
    if !ismissing(benchmarks_large[i])
        display(mean(benchmarks_large[i]))
    else
        println("Too much memory / too slow.")
    end
end
``````

which gives the following output on my moderately powered laptop

Summary
``````
Small Dimensions (((2, 3, 4), (3, 3)))

combine_loop:
BenchmarkTools.TrialEstimate:
time:             5.074 μs
gctime:           321.659 ns (6.34%)
memory:           4.67 KiB
allocs:           26

combine_loop_CartesianIndices:
BenchmarkTools.TrialEstimate:
time:             5.779 μs
gctime:           377.396 ns (6.53%)
memory:           4.67 KiB
allocs:           26

combine_loop_EllipsisNotation:
BenchmarkTools.TrialEstimate:
time:             5.685 μs
gctime:           341.360 ns (6.00%)
memory:           4.67 KiB
allocs:           26

combine_cat:
BenchmarkTools.TrialEstimate:
time:             37.058 μs
gctime:           398.570 ns (1.08%)
memory:           16.19 KiB
allocs:           236

combine_cat_reduce:
BenchmarkTools.TrialEstimate:
time:             66.158 μs
gctime:           4.579 μs (6.92%)
memory:           57.84 KiB
allocs:           448

combine_SplitApplyCombine:
BenchmarkTools.TrialEstimate:
time:             2.176 μs
gctime:           189.162 ns (8.69%)
memory:           3.55 KiB
allocs:           2

combine_SplitApplyCombine_view:
BenchmarkTools.TrialEstimate:
time:             1.718 ns
gctime:           0.000 ns (0.00%)
memory:           0 bytes
allocs:           0

Large Dimensions (((200, 40, 200), (3, 3)))

combine_loop:
BenchmarkTools.TrialEstimate:
time:             559.914 ms
gctime:           185.278 ms (33.09%)
memory:           292.97 MiB
allocs:           1600003

combine_loop_CartesianIndices:
BenchmarkTools.TrialEstimate:
time:             587.806 ms
gctime:           201.019 ms (34.20%)
memory:           292.97 MiB
allocs:           1600003

combine_loop_EllipsisNotation:
BenchmarkTools.TrialEstimate:
time:             547.953 ms
gctime:           185.343 ms (33.82%)
memory:           292.97 MiB
allocs:           1600003

combine_cat:
Too much memory / too slow.

combine_cat_reduce:
BenchmarkTools.TrialEstimate:
time:             59.245 s
gctime:           11.443 s (19.32%)
memory:           87.26 GiB
allocs:           31786991

combine_SplitApplyCombine:
BenchmarkTools.TrialEstimate:
time:             550.919 ms
gctime:           36.190 ms (6.57%)
memory:           219.73 MiB
allocs:           3

combine_SplitApplyCombine_view:
BenchmarkTools.TrialEstimate:
time:             1.869 ns
gctime:           0.000 ns (0.00%)
memory:           0 bytes
allocs:           0
``````

My interpretation is:

• Use views via `combinedimsview()` wherever possible, as it is simply the fastest solution.
• Avoid `cat` and `reduce(cat)` in this case, except for the smallest of arrays.
• If you really need a copy and don’t want to pull in a package, there is nothing wrong with the looping version.

`SplitApplyCombine` is a pretty stable, reliable and performant package; it shouldn’t be an issue to depend on it.

I would even suggest that everyone working with arrays/tables/data in general become familiar with it. There are very useful functions complementing `Base` in these areas: `combinedims`/`splitdims`, different `group`s and `join`s, etc. These functions are lightweight and general, and work with different kinds of arrays and with many `Table` implementations. [ I’m not involved in SplitApplyCombine development at all, just a happy user (: ]


Agreed, SplitApplyCombine is the best solution.

Still, concatenating arrays is basic functionality; it feels like something that should be covered in `Base`. It’s a bit of a shame that `cat` doesn’t work well with a large number of arrays.
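As an aside for later readers: Julia 1.9 added `Base.stack`, which covers exactly this combination (inner dimensions first, like `combinedims`), so on recent versions no package is needed:

```julia
# Julia 1.9+: Base.stack combines an array of same-size arrays,
# placing the inner dimensions first and the outer dimensions last.
result = [rand(ComplexF64, 3, 3) for _ in CartesianIndices((2, 3, 4))]
stacked = stack(result)  # 3×3×2×3×4 Array{ComplexF64, 5}
```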


This is my first time trying the TensorCast.jl package from @mcabbott, and I wonder if it could be an appropriate alternative:

``````
using TensorCast

# returns a view, 2×3×4×3×3 Array{ComplexF64, 5}
@cast transf_result[i,j,k,l,m] := result[i,j,k][l,m];

# returns a copy, 2×3×4×3×3 Array{ComplexF64, 5}
@cast transf_result[i,j,k,l,m] |= result[i,j,k][l,m];
``````