Thanks everyone for the plenty of input!
I’ll be marking @aplavin’s response as solution for mainly these reasons:
- The
combinedimsview()
is really what I’ve been looking for without really knowing, as I don’t need a full copy
- I think the
SplitApplyCombine
package is probably what someone else would like to find who’s got a similar problem
- I also have to confess that I had larger arrays in mind than I posted in my MWE (say
200x40x200
for the outer one). This renders the cat
solutions more or less useless as I show below, even though I like them for their brevity and being in Base
The only thing I really dislike about this solution is that it is not in Base
and therefore pulls in another dependency
For the sake of completeness I wrote a small benchmark script that tests the various approaches (note that I switched everything to being CM order).
Summary
#%%
using Pkg
Pkg.activate("--temp")
Pkg.add("BenchmarkTools")
Pkg.add("SplitApplyCombine")
Pkg.add("EllipsisNotation")
#%%
using BenchmarkTools
using SplitApplyCombine
using EllipsisNotation
#%%
function setup_data(outer_dims = (2,3,4), inner_dims = (3,3))
rand_matrix() = rand(ComplexF64, inner_dims...)
map(x->rand_matrix(), Array{Int}(undef, outer_dims))
end
#%% Proposed ways
function combine_loop(A)
transformed_A = Array{ComplexF64}(undef, size(first(A))..., size(A)...)
for i in Iterators.product(range.(1,size(A),step=1)...)
transformed_A[:,:,i...] = A[i...]
end
transformed_A
end
function combine_loop_CartesianIndices(A)
transformed_A = Array{ComplexF64}(undef, size(first(A))..., size(A)...)
for i in CartesianIndices(A)
transformed_A[:,:,i] = A[i]
end
transformed_A
end
function combine_loop_EllipsisNotation(A)
transformed_A = Array{ComplexF64}(undef, size(first(A))...,size(A)...)
for i in CartesianIndices(A)
transformed_A[..,Tuple(i)...] = A[i]
end
transformed_A
end
function combine_cat(A)
reshape(cat(A..., dims=ndims(first(A))+1), size(first(A))..., size(A)...)
end
function combine_cat_reduce(A)
reshape(reduce((a,b)->cat(a,b;dims=ndims(first(A))+1), A), size(first(A))..., size(A)...)
end
function combine_SplitApplyCombine(A)
combinedims(A)
end
function combine_SplitApplyCombine_view(A)
combinedimsview(A)
end
#%% Verify that the are all idenctical
test_data = setup_data((2,3,4), (3,3))
test_functions = [combine_loop,
combine_loop_CartesianIndices,
combine_loop_EllipsisNotation,
combine_cat,
combine_cat_reduce,
combine_SplitApplyCombine,
combine_SplitApplyCombine_view]
individual_results = collect(f(test_data) for f in test_functions)
# check that all individual results are equal
@assert all(x -> x == individual_results[1], individual_results)
#%% Benchmark them
benchmark_small_only = [combine_cat] # <-- this needs to much memory on my system
small_dims = ((2,3,4), (3,3))
large_dims = ((200,40,200), (3,3))
benchmarks_small = []
benchmarks_large = []
for f in test_functions
@show f
push!(benchmarks_small, @benchmark $f(data) setup=(data=setup_data(small_dims...)))
if !(f in benchmark_small_only)
push!(benchmarks_large, @benchmark $f(data) setup=(data=setup_data(large_dims...)))
else
push!(benchmarks_large, missing)
end
end
#%% Print some info
function stringmean(trial)
t = mean(trial).time
allocs = mean(trial).allocs
memory = mean(trial).allocs
"$(t) $allocs"
end
println("Small Dimensions ($small_dims)")
for (i, f) in enumerate(test_functions)
println("\n$f:")
display(mean(benchmarks_small[i]))
end
println("\n\nLarge Dimensions ($large_dims)")
for (i, f) in enumerate(test_functions)
println("\n$f:")
if !ismissing(benchmarks_large[i])
display(mean(benchmarks_large[i]))
else
println("Too much memory / too slow.")
end
end
which gives the following output on my moderately powered laptop
Summary
Small Dimensions (((2, 3, 4), (3, 3)))
combine_loop:
BenchmarkTools.TrialEstimate:
time: 5.074 μs
gctime: 321.659 ns (6.34%)
memory: 4.67 KiB
allocs: 26
combine_loop_CartesianIndices:
BenchmarkTools.TrialEstimate:
time: 5.779 μs
gctime: 377.396 ns (6.53%)
memory: 4.67 KiB
allocs: 26
combine_loop_EllipsisNotation:
BenchmarkTools.TrialEstimate:
time: 5.685 μs
gctime: 341.360 ns (6.00%)
memory: 4.67 KiB
allocs: 26
combine_cat:
BenchmarkTools.TrialEstimate:
time: 37.058 μs
gctime: 398.570 ns (1.08%)
memory: 16.19 KiB
allocs: 236
combine_cat_reduce:
BenchmarkTools.TrialEstimate:
time: 66.158 μs
gctime: 4.579 μs (6.92%)
memory: 57.84 KiB
allocs: 448
combine_SplitApplyCombine:
BenchmarkTools.TrialEstimate:
time: 2.176 μs
gctime: 189.162 ns (8.69%)
memory: 3.55 KiB
allocs: 2
combine_SplitApplyCombine_view:
BenchmarkTools.TrialEstimate:
time: 1.718 ns
gctime: 0.000 ns (0.00%)
memory: 0 bytes
allocs: 0
Large Dimensions (((200, 40, 200), (3, 3)))
combine_loop:
BenchmarkTools.TrialEstimate:
time: 559.914 ms
gctime: 185.278 ms (33.09%)
memory: 292.97 MiB
allocs: 1600003
combine_loop_CartesianIndices:
BenchmarkTools.TrialEstimate:
time: 587.806 ms
gctime: 201.019 ms (34.20%)
memory: 292.97 MiB
allocs: 1600003
combine_loop_EllipsisNotation:
BenchmarkTools.TrialEstimate:
time: 547.953 ms
gctime: 185.343 ms (33.82%)
memory: 292.97 MiB
allocs: 1600003
combine_cat:
Too much memory / too slow.
combine_cat_reduce:
BenchmarkTools.TrialEstimate:
time: 59.245 s
gctime: 11.443 s (19.32%)
memory: 87.26 GiB
allocs: 31786991
combine_SplitApplyCombine:
BenchmarkTools.TrialEstimate:
time: 550.919 ms
gctime: 36.190 ms (6.57%)
memory: 219.73 MiB
allocs: 3
combine_SplitApplyCombine_view:
BenchmarkTools.TrialEstimate:
time: 1.869 ns
gctime: 0.000 ns (0.00%)
memory: 0 bytes
allocs: 0
My interpretation is:
- Use views with the
combinedimsview()
wherever possible as it is simply the fastest solution.
- Avoid
cat
and reduce(cat)
in this case except for the smallest of arrays
- If you really need a copy and don’t want to pull in a package, there is nothing wrong with the looping version