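This reply to another topic mentions that `reduce(hcat, A)` and `reduce(vcat, A)` are optimized to perform better than `hcat(A...)` and `vcat(A...)`. Is there something similar for `cat()`, e.g. for stacking a vector of matrices along a third dimension with `cat(A...; dims=3)`?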
You’ll probably get the best possible performance by doing this in the simplest way: construct a new array of the appropriate size, then write a loop that fills in each slice of the stacked array. One-liner solutions are convenient, but a less clever approach often performs best.
Yeah, the loop is more than 100X faster than most of the options above and allocates vastly less memory:
```julia
julia> function assemble(A)
           # allocate the whole stacked array once
           stacked = Array{Int, 3}(undef, size(first(A))..., length(A))
           for i in 1:length(A)
               # copy the i-th matrix into the i-th slice
               stacked[:, :, i] = A[i]
           end
           stacked
       end
assemble (generic function with 1 method)

julia> using BenchmarkTools

julia> A = [rand(Int, (28,28)) for _ ∈ 1:10000];

julia> @btime assemble($A);
  24.793 ms (2 allocations: 59.81 MiB)
```
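The `assemble` function above hard-codes `Int` and two-dimensional slices. A minimal sketch of a more general version, assuming all slices share the same size and element type, could look like the following. `assemble_general` is a hypothetical name, not a Base or package function; it keeps the same "allocate once, then fill each slice" structure and uses `selectdim` to get a view of the k-th slice.

```julia
# Hypothetical generalisation of `assemble`: any element type T and any
# N-dimensional slices of equal size, stacked along dimension N + 1.
function assemble_general(A::AbstractVector{<:AbstractArray{T,N}}) where {T,N}
    isempty(A) && throw(ArgumentError("need at least one slice"))
    sz = size(first(A))
    # Single allocation for the whole result.
    stacked = Array{T,N + 1}(undef, sz..., length(A))
    for (k, a) in enumerate(A)
        size(a) == sz || throw(DimensionMismatch("slice $k has size $(size(a)), expected $sz"))
        # selectdim returns a view of the k-th slice; broadcast-assign into it.
        selectdim(stacked, N + 1, k) .= a
    end
    return stacked
end

slices = [rand(Float32, 28, 28) for _ in 1:100]
S = assemble_general(slices)   # 28×28×100 Array{Float32, 3}
```

On Julia 1.9 and later, Base also provides `stack(slices)`, which returns the same dense array.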
`reduce(vcat, ...)` and `reduce(hcat, ...)` are fast because they have specialized implementations: https://github.com/JuliaLang/julia/blob/e402cf47dd8e3c509969c90c38ad5d57c746eccf/base/abstractarray.jl#L1564-L1568. In other words, they produce the same result as naively concatenating the elements one `vcat` (or `hcat`) call at a time, but they do so much more efficiently: essentially, they perform the same single allocation followed by a copy loop that I wrote out by hand.
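To make the "single allocation, then copy loop" idea concrete for the vector case, here is a minimal sketch. The name `vcat_all` is hypothetical and this is not the code behind the link above; it sizes the output once and copies each piece into its slot, whereas `foldl(vcat, vs)` concatenates pairwise and allocates a new, ever-larger array for every `vcat` call.

```julia
# Hypothetical sketch of the "allocate once, then copy" strategy; this is
# not the actual Base implementation behind reduce(vcat, ...).
function vcat_all(vs::AbstractVector{<:AbstractVector{T}}) where {T}
    # One allocation for the whole result (init=0 handles an empty input).
    out = Vector{T}(undef, sum(length, vs; init=0))
    offset = 0
    for v in vs
        # Copy v into out[offset+1 : offset+length(v)].
        copyto!(out, offset + 1, v, firstindex(v), length(v))
        offset += length(v)
    end
    return out
end

vs = [rand(1:10, 3) for _ in 1:4]
vcat_all(vs) == reduce(vcat, vs)   # true
foldl(vcat, vs)                    # same result, but one new array per vcat call
```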
Alternatively, if you don't mind using a package, you can make a lazy view of the slices instead of a new dense array. Such views should be fast to construct, but possibly slower to use in whatever the next step is.
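```julia
julia> using JuliennedArrays, LazyStack, RecursiveArrayTools

julia> B = cat(A...; dims=3);  # dense reference result, same as assemble(A)

julia> B ≈ JuliennedArrays.Align(A, 1, 2)
true

julia> B ≈ LazyStack.stack(A)
true

julia> B ≈ RecursiveArrayTools.VectorOfArray(A)
true
```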
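For intuition about the "cheap to build, possibly slower to use" trade-off, here is a minimal hand-rolled lazy view. The type `StackedView` is hypothetical and is not taken from JuliennedArrays, LazyStack or RecursiveArrayTools; it assumes a 1-based vector of equally sized slices. Construction only stores a reference to the input, while every element access first looks up `slices[k]` and then indexes into it.

```julia
# Hypothetical minimal lazy "stacked" view: no copying at construction,
# but every getindex goes through an extra lookup into slices[k].
struct StackedView{T,N,M,V<:AbstractVector{<:AbstractArray{T,N}}} <: AbstractArray{T,M}
    slices::V
end

function StackedView(slices::AbstractVector{<:AbstractArray{T,N}}) where {T,N}
    isempty(slices) && throw(ArgumentError("need at least one slice"))
    sz = size(first(slices))
    all(s -> size(s) == sz, slices) || throw(DimensionMismatch("slices differ in size"))
    return StackedView{T,N,N + 1,typeof(slices)}(slices)
end

# The stacked dimension is appended after the slice dimensions.
Base.size(S::StackedView) = (size(first(S.slices))..., length(S.slices))

Base.@propagate_inbounds function Base.getindex(S::StackedView{T,N,M}, I::Vararg{Int,M}) where {T,N,M}
    @boundscheck checkbounds(S, I...)
    # Last index picks the slice; the remaining indices index into it.
    return S.slices[last(I)][Base.front(I)...]
end

slices = [rand(Int, 28, 28) for _ in 1:10]
V = StackedView(slices)        # no copy: V.slices === slices
size(V)                        # (28, 28, 10)
V[3, 4, 5] == slices[5][3, 4]  # true
dense = Array(V)               # materialise into an ordinary Array{Int, 3}
```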