`cat` allocates too much memory

ArthurW · November 19, 2020, 3:03pm

Hi. I’m new to Julia language.

I’m trying to convert a big vector of small arrays into a tensor, but cat() is drawing too much resources.

julia> A = [rand(Int, (28,28)) for _ ∈ 1:10000];

julia> @time cat(A..., dims=3);
  4.226158 seconds (50.11 M allocations: 3.044 GiB, 30.98% gc time)

Oddly, making the operation more explicit dramatically improves this:

julia> stacked = Array{Int, 3}(undef, 28, 28, 10000);

julia> @time map!.(x->x, eachslice(stacked, dims=3), A);
  0.133267 seconds (222.95 k allocations: 11.972 MiB)

This reply to another topic mentions that reduce(hcat, A) and reduce(vcat, A) are optimized to perform better than hcat(A...) and vcat(A...). Is there such thing for cat()?

The following naive attempt made things worse:

julia> @time reduce((x...)->cat(x..., dims=3), A);
  4.431142 seconds (616.15 k allocations: 18.539 GiB, 1.53% gc time)

jling · November 19, 2020, 3:25pm

it’s not cat that is allocating, it’s the splatting

rdeits · November 19, 2020, 3:28pm

You’ll probably get the best possible performance by doing this in the simplest way: Construct a new array of the appropriate size and then write a loop that fills in each slice of the stacked array. One-liner solutions are convenient, but a less-clever approach often offers the best performance.

ArthurW · November 19, 2020, 3:32pm

Is there a way to bypass that?

This solution doesn’t use splatting and perform even worse:

julia> @time reduce((x,y)->cat(x, y, dims=3), A);
  6.262270 seconds (526.79 k allocations: 18.537 GiB, 3.85% gc time)

jling · November 19, 2020, 3:38pm

this allocate once for ever element so even more allocation. Do what @rdeits suggested I think.

rdeits · November 19, 2020, 3:40pm

Yeah, the loop is more than 100X faster than most of the options above and allocates vastly less memory:

julia> function assemble(A)
         stacked = Array{Int, 3}(undef, size(first(A))..., length(A))
         for i in 1:length(A)
           stacked[:, :, i] = A[i]
         end
         stacked
       end
assemble (generic function with 1 method)

julia> using BenchmarkTools

julia> A = [rand(Int, (28,28)) for _ ∈ 1:10000];

julia> @btime assemble($A);
  24.793 ms (2 allocations: 59.81 MiB)

baggepinnen · November 19, 2020, 3:41pm

here’s a version using an optimized reduction and a reshape

julia> reshape(reduce(hcat, A), size(A[1])..., :) == cat(A..., dims=3)
true

julia> @time reshape(reduce(hcat, A), size(A[1])..., :);
  0.028446 seconds (7 allocations: 59.815 MiB)

DNF · November 19, 2020, 3:49pm

So why is cat so much slower than vcat:

julia> @time reduce((x,y)->cat(x, y;dims=3), A);
 12.606505 seconds (526.91 k allocations: 18.537 GiB, 16.69% gc time)

julia> @time reduce(vcat, A);
  0.038854 seconds (2 allocations: 59.815 MiB)

?

rdeits · November 19, 2020, 3:56pm

reduce(vcat, ...) and reduce(hcat, ...) are fast because they have specialized implementations: https://github.com/JuliaLang/julia/blob/e402cf47dd8e3c509969c90c38ad5d57c746eccf/base/abstractarray.jl#L1564-L1568 . In other words, they produce the same result as naively calling vcat on each element, but they do so in a way which is much more efficient than the naive approach (essentially by performing the same allocation and then loop operation that I wrote out by hand).

DNF · November 19, 2020, 4:07pm

It seems like a special implementation for cat would be natural. Is that perhaps more difficult to do, since (x, y)->cat(x, y;dims=n) is anonymous?

jling · November 19, 2020, 4:08pm

yeah I think the dims info is gone.

mcabbott · November 19, 2020, 4:50pm

This PR Allow `reduce(cat(dims=4), A)`, with efficient method for simple cases by mcabbott · Pull Request #37196 · JuliaLang/julia · GitHub wants to make cat(dims=3)::Base.Fix1 to enable precisely this, and then make reduce(cat(dims=3), A) efficient. For now @baggepinnen’s answer is probably the way to go.

Unless you want to use a package, in which case you can make a view of them, instead of a new dense array. These should be fast to construct but possibly slower to use, in whatever the next step is.

julia> B = reduce(cat(dims=3), A);

julia> B ≈ JuliennedArrays.Align(A, 1,2)
true

julia> B ≈ LazyStack.stack(A)
true

julia> B ≈ RecursiveArrayTools.VectorOfArray(A)
true

Topic		Replies	Views
"Cat" very slow for relativelly small tasks Performance	2	444	February 9, 2022
Problem with cat() General Usage	9	2663	October 2, 2018
Slow `reduce(vcat, itr)` Performance cat	26	1396	September 13, 2022
Very best way to concatenate an array of arrays General Usage arrays	47	28331	July 24, 2021
Using hcat with splatting on arrays of different lengths kills Julia General Usage compilation , memory , memory-allocation	2	1398	October 2, 2018

`cat` allocates too much memory

Related topics