 # `cat` allocates too much memory

Hi. I’m new to the Julia language.

I’m trying to convert a big vector of small arrays into a tensor, but `cat()` is consuming too many resources.

```julia
julia> A = [rand(Int, (28,28)) for _ ∈ 1:10000];

julia> @time cat(A..., dims=3);
4.226158 seconds (50.11 M allocations: 3.044 GiB, 30.98% gc time)
```

Oddly, making the operation more explicit dramatically improves this:

```julia
julia> stacked = Array{Int, 3}(undef, 28, 28, 10000);

julia> @time map!.(x->x, eachslice(stacked, dims=3), A);
0.133267 seconds (222.95 k allocations: 11.972 MiB)
```

This reply to another topic mentions that `reduce(hcat, A)` and `reduce(vcat, A)` are optimized to perform better than `hcat(A...)` and `vcat(A...)`. Is there such a thing for `cat()`?

The following naive attempt made things worse:

```julia
julia> @time reduce((x...)->cat(x..., dims=3), A);
4.431142 seconds (616.15 k allocations: 18.539 GiB, 1.53% gc time)
```

It’s not `cat` that is allocating, it’s the splatting.


You’ll probably get the best possible performance by doing this in the simplest way: Construct a new array of the appropriate size and then write a loop that fills in each slice of the stacked array. One-liner solutions are convenient, but a less-clever approach often offers the best performance.


Is there a way to bypass that?

This solution doesn’t use splatting and performs even worse:

```julia
julia> @time reduce((x,y)->cat(x, y, dims=3), A);
6.262270 seconds (526.79 k allocations: 18.537 GiB, 3.85% gc time)
```

This allocates a new intermediate array for every element, so there is even more allocation. Do what @rdeits suggested, I think.

Yeah, the loop is more than 100X faster than most of the options above and allocates vastly less memory:

```julia
julia> function assemble(A)
           stacked = Array{Int, 3}(undef, size(first(A))..., length(A))
           for i in 1:length(A)
               stacked[:, :, i] = A[i]
           end
           stacked
       end
assemble (generic function with 1 method)

julia> using BenchmarkTools

julia> A = [rand(Int, (28,28)) for _ ∈ 1:10000];

julia> @btime assemble($A);
24.793 ms (2 allocations: 59.81 MiB)
```

Here’s a version using an optimized reduction and a reshape:

```julia
julia> reshape(reduce(hcat, A), size(first(A))..., :) == cat(A..., dims=3)
true

julia> @time reshape(reduce(hcat, A), size(first(A))..., :);
0.028446 seconds (7 allocations: 59.815 MiB)
```

So why is `cat` so much slower than `vcat`:

```julia
julia> @time reduce((x,y)->cat(x, y; dims=3), A);
12.606505 seconds (526.91 k allocations: 18.537 GiB, 16.69% gc time)

julia> @time reduce(vcat, A);
0.038854 seconds (2 allocations: 59.815 MiB)
```

?


`reduce(vcat, ...)` and `reduce(hcat, ...)` are fast because they have specialized implementations: https://github.com/JuliaLang/julia/blob/e402cf47dd8e3c509969c90c38ad5d57c746eccf/base/abstractarray.jl#L1564-L1568 . In other words, they produce the same result as naively calling `vcat` on each element, but they do so in a way which is much more efficient than the naive approach (essentially by performing the same allocation and then loop operation that I wrote out by hand).
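To make the idea concrete, here is a simplified sketch of what such a specialization does conceptually — this is illustrative only, not Base’s actual code, and `fast_reduce_hcat` is an invented name:

```julia
# Simplified sketch of the idea behind the specialized reduce(hcat, A):
# allocate the full result once, then copy each matrix into its column block,
# instead of building a new array at every pairwise reduction step.
function fast_reduce_hcat(A::Vector{<:AbstractMatrix{T}}) where {T}
    m, n = size(first(A))
    out = Matrix{T}(undef, m, n * length(A))
    for (i, a) in enumerate(A)
        out[:, (i - 1) * n + 1 : i * n] = a
    end
    return out
end
```

For equally sized matrices this returns the same result as `reduce(hcat, A)`, but with a single large allocation.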


It seems like a special implementation for `cat` would be natural. Is that perhaps more difficult to do, since `(x, y)->cat(x, y;dims=n)` is anonymous?

Yeah, I think the `dims` info is gone once it’s captured in the anonymous function, so `reduce` can’t dispatch on it.

This PR https://github.com/JuliaLang/julia/pull/37196 wants to make `cat(dims=3)` return a `Base.Fix1`-style callable to enable precisely this, and then make `reduce(cat(dims=3), A)` efficient. For now, @baggepinnen’s answer is probably the way to go.
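To illustrate the currying idea with invented names (this is not the PR’s actual code — `CatAlong`/`catalong` are made up here): once `dims` lives in the type of a callable, a specialized `reduce` method can see it and dispatch to an efficient path.

```julia
# Invented sketch: `catalong(d)` returns a callable carrying `d` in its type.
struct CatAlong{D} end
catalong(d::Integer) = CatAlong{d}()

# Generic fallback: same slow pairwise behaviour as `(x, y) -> cat(x, y; dims=d)`.
(::CatAlong{D})(x, y) where {D} = cat(x, y; dims=D)

# With `dims` visible to dispatch, the allocate-and-fill method becomes possible:
function Base.reduce(::CatAlong{3}, A::Vector{<:AbstractMatrix{T}}) where {T}
    stacked = Array{T, 3}(undef, size(first(A))..., length(A))
    for (i, a) in enumerate(A)
        stacked[:, :, i] = a
    end
    return stacked
end
```

With this, `reduce(catalong(3), A)` produces the same tensor as `cat(A..., dims=3)` but with a single allocation.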

Unless you want to use a package, in which case you can make a view of them, instead of a new dense array. These should be fast to construct but possibly slower to use, in whatever the next step is.

```julia
julia> using JuliennedArrays, LazyStack, RecursiveArrayTools

julia> B = cat(A..., dims=3);

julia> B ≈ JuliennedArrays.Align(A, 1, 2)
true

julia> B ≈ LazyStack.stack(A)
true

julia> B ≈ RecursiveArrayTools.VectorOfArray(A)
true
```
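As an aside for readers on newer versions of Julia: since Julia 1.9, Base itself ships a `stack` function that does exactly this, stacking equally sized arrays along a new trailing dimension with a single allocation.

```julia
# Base.stack (Julia 1.9+) stacks equal-size arrays along a new trailing dimension.
A = [rand(Int, (28, 28)) for _ in 1:10000]
B = stack(A)    # 28×28×10000 Array{Int, 3}, same result as cat(A..., dims=3)
```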