A more performant and cleaner way is available in a package:
```julia
julia> using SplitApplyCombine, BenchmarkTools

# if you need an actual materialized matrix:
julia> @btime combinedims($A, 1)
  257.150 ns (2 allocations: 192 bytes)

# if a view of the original vector is fine:
julia> @btime combinedimsview($A, 1)
  3.165 ns (0 allocations: 0 bytes)

# for comparison:
julia> @btime mapreduce(permutedims, vcat, $A)
  390.875 ns (14 allocations: 1.03 KiB)

julia> @btime vcat($A'...)
  885.152 ns (24 allocations: 768 bytes)
```
Very interesting. I have always liked the SplitApplyCombine (SAC) package for the functions it provides and for the functional-programming style it encourages, which appeals to me a lot.

Could you explain specifically which functions or algorithms SAC uses internally that make this (conceptually rather simple) transformation so much faster than the other proposals?
I don’t think SAC.jl does anything special here. If you write a simple for-loop and allocate the whole resulting array upfront, performance will be the same. It’s `mapreduce(vcat, ...)` and `vcat(A...)` that are inefficient; the latter is fundamentally type-unstable, because the number of splatted arguments isn’t known at compile time.
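To illustrate the claim above, here is a minimal sketch of the "allocate upfront, then fill" loop (the function name `tomatrix` and the sample data are mine, not from the thread; `A` is assumed to be a `Vector` of equal-length `Vector`s):

```julia
# Allocate the full result once, then copy each inner vector into a row.
# This is the straightforward loop the answer says matches combinedims.
function tomatrix(A::Vector{<:AbstractVector})
    n, m = length(A), length(first(A))
    M = Matrix{eltype(first(A))}(undef, n, m)  # single upfront allocation
    for i in 1:n, j in 1:m
        M[i, j] = A[i][j]                      # row i holds A[i]
    end
    return M
end

A = [[1, 2, 3], [4, 5, 6]]
tomatrix(A)  # 2×3 Matrix{Int64}: [1 2 3; 4 5 6]
```

Because the element type and output size are known before the loop runs, the compiler generates a type-stable fill loop with exactly one allocation, unlike the splatting approaches.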
As for `combinedimsview`, which is two orders of magnitude faster still: it simply doesn’t materialize the resulting array at all, returning a lazy view of the original data instead.
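Conceptually, such a view can be sketched with a tiny hand-rolled `AbstractMatrix` whose `getindex` forwards to the original vectors. This `RowsView` type is hypothetical, purely for illustration, and is not SAC's actual implementation:

```julia
# A lazy "matrix of rows" over a vector of vectors: constructing it does
# no copying, so its cost is O(1) regardless of the data size.
struct RowsView{T} <: AbstractMatrix{T}
    A::Vector{Vector{T}}
end
Base.size(v::RowsView) = (length(v.A), length(first(v.A)))
Base.getindex(v::RowsView, i::Int, j::Int) = v.A[i][j]  # read-through, no copy

A = [[1, 2, 3], [4, 5, 6]]
V = RowsView(A)
V[2, 3]        # 6 — reads straight from the original data
A[2][3] = 60
V[2, 3]        # 60 — the view reflects mutation of A
```

Since nothing is copied, constructing the view takes nanoseconds and zero allocations, which matches the `combinedimsview` timing in the benchmark; the trade-off is that every element access pays one extra indirection.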