Vcat slow?

Ronneesley · June 1, 2020, 8:19pm

Hello,

I’m using vcat() to concatenate many vectors, but I observed that @code_warntype it’s returning a red mark. I marked with *** in the code below:

julia> a = collect(1:10);

julia> b = collect(20:30);

julia> @code_warntype vcat(a, b)
Variables
  #self#::Core.Compiler.Const(vcat, false)
  arrays::Tuple{Array{Int64,1},Array{Int64,1}}
  n::Int64
  @_4::Union{Nothing, Tuple{Array{Int64,1},Int64}}
  arr::Array{Int64,1}
  nd::Int64
  @_7::Union{Nothing, Tuple{Array{Int64,1},Int64}}
  a@_8::Array{Int64,1}
  na::Int64
  msg::Expr
  a@_11::Array{Int64,1}

**********************
  @_12::**Any** HERE

Body::Array{Int64,1}
1 ──       Core.NewvarNode(:(arr))
│          Core.NewvarNode(:(nd))
│          Core.NewvarNode(:(@_7))
│          (n = 0)
│    %5  = arrays::Tuple{Array{Int64,1},Array{Int64,1}}
│          (@_4 = Base.iterate(%5))
│    %7  = (@_4 === nothing)::Bool
│    %8  = Base.not_int(%7)::Bool
└───       goto #4 if not %8
2 ┄─ %10 = @_4::Tuple{Array{Int64,1},Int64}::Tuple{Array{Int64,1},Int64}
│          (a@_8 = Core.getfield(%10, 1))
│    %12 = Core.getfield(%10, 2)::Int64
│    %13 = n::Int64
│    %14 = Base.length(a@_8)::Int64
│          (n = %13 + %14)
│          (@_4 = Base.iterate(%5, %12))
│    %17 = (@_4 === nothing)::Bool
│    %18 = Base.not_int(%17)::Bool
└───       goto #4 if not %18
3 ──       goto #2
4 ┄─ %21 = Core.apply_type(Base.Vector, $(Expr(:static_parameter, 1)))::Core.Compiler.Const(Array{Int64,1}, false)
│          (arr = (%21)(Base.undef, n))
│          (nd = 1)
│    %24 = arrays::Tuple{Array{Int64,1},Array{Int64,1}}
│          (@_7 = Base.iterate(%24))
│    %26 = (@_7 === nothing)::Bool
│    %27 = Base.not_int(%26)::Bool
└───       goto #13 if not %27
5 ┄─       Core.NewvarNode(:(msg))
│    %30 = @_7::Tuple{Array{Int64,1},Int64}::Tuple{Array{Int64,1},Int64}
│          (a@_11 = Core.getfield(%30, 1))
│    %32 = Core.getfield(%30, 2)::Int64
│          (na = Base.length(a@_11))
│    %34 = (nd + na)::Int64
│    %35 = Base.length(arr)::Int64
│    %36 = (1 + %35)::Int64
│    %37 = (%34 <= %36)::Bool
└───       goto #7 if not %37
6 ──       goto #11
7 ──       (msg = $(Expr(:copyast, :($(QuoteNode(:(nd + na <= 1 + length(arr))))))))
│    %41 = Base.isdefined(Base.Main, :Base)::Bool
└───       goto #9 if not %41
*********************** HERE *********
8 ── %43 = Base.getproperty(Base.Main, :Base)::**Any**
│    %44 = Base.getproperty(%43, :string)::**Any**
│          (@_12 = (%44)(msg))
└───       goto #10
*********************** HERE *********
9 ── %47 = Core.println::Core.Compiler.Const(Core.println, false)
│          (%47)(msg)
└───       (@_12 = "Error during bootstrap. See stdout.")
10 ┄ %50 = @_12::Any
│    %51 = Base.AssertionError(%50)::AssertionError
└───       Base.throw(%51)
11 ┄       Base.unsafe_copyto!(arr, nd, a@_11, 1, na)
│          (nd = nd + na)
│          (@_7 = Base.iterate(%24, %32))
│    %56 = (@_7 === nothing)::Bool
│    %57 = Base.not_int(%56)::Bool
└───       goto #13 if not %57
12 ─       goto #5
13 ┄       return arr

rdeits · June 1, 2020, 8:26pm

You’re looking pretty deep into the guts of that function, at what appears to be some code devoted to handling error cases. The actual return type is inferred correctly (see the Body::Array{Int64,1} before the code), so I wouldn’t worry about this unless you have a particular reason to.

mbauman · June 1, 2020, 8:35pm

Yes, this only happens in an error branch and is highly unlikely to affect the runtime — especially as it only runs once per argument (and not once per element). Just for fun, I completely removed the @assert macro call that is generating it (thus removing both the instability and the error-checking itself!) and compared the timings:

julia> @benchmark vcat($a, $b)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     44.549 ns (0.00% GC)
  median time:      46.781 ns (0.00% GC)
  mean time:        51.098 ns (2.51% GC)
  maximum time:     418.716 ns (84.10% GC)
  --------------
  samples:          10000
  evals/sample:     981

julia> @benchmark vcat_noassert($a, $b)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     45.264 ns (0.00% GC)
  median time:      47.103 ns (0.00% GC)
  mean time:        51.896 ns (2.47% GC)
  maximum time:     409.609 ns (83.03% GC)
  --------------
  samples:          10000
  evals/sample:     983

@code_warntype vcat_noassert(a, b) then shows no instabilities, but it doesn’t really change anything.

mbauman · June 1, 2020, 8:52pm

It’s perhaps worth noting that while vcat itself shouldn’t be slow, splatting lots of arrays into it can be! Instead of splatting them, use reduce(vcat, array_of_arrays).

bafonso · June 14, 2020, 6:30pm

I am finding that vcat is the slowest part when I try to go from a vector of length l containing n,m array into a an array of l, (n*m). Right now I am looping and dumping it into my new array using

M[:, i] .= reduce(vcat,v[i])

Is there a faster way? My array is not that big and it takes much longer than I anticipated for my 16k vector of 65x64 arrays.

kristoffer.carlsson · June 14, 2020, 6:39pm

You could just write the loop and see if that is significantly faster. If it is, then there is probably some improvements to be made. If it isn’t, maybe you miss-estimated the cost of the operation.

DNF · June 14, 2020, 6:49pm

If I read this correctly, you are calling reduce(vcat, matrix). Instead, can you try

M[:, i] .= vec(v[i])

?
If you’re creating a vector from a matrix, then calling reduce(vcat,) on it would be super inefficient.

Henrique_Becker · June 14, 2020, 6:50pm

I am not sure if you are right @kristoffer.carlsson, the fastest is probably not using vcat or a loop but instead allocating a matrix of the final size and copying each cell of the small matrices to the right position in the final matrix. This would avoid multiple copies and intermediary allocations, no?

bafonso · June 14, 2020, 6:56pm

This is my function

function to_2d_array(v::Vector{Matrix{T}}) where {T}
   n,m = size(v[1])
   M = Array{T}(undef, length(v),n*m) # preallocate a 3D array of the right size
   @show size(M)
   for i in 1:size(v,1)
      M[i,:] = reduce(hcat,v[i])
   end
   M
end

DNF · June 14, 2020, 7:03pm

Can you try vec(v[i]) instead? I think it should work, but I’m not at my computer. If it doesn’t work, then use transpose before assignment.

Also, you probably shouldn’t have @show in there, it’s slow.

DNF · June 14, 2020, 7:18pm

If you have a matrix, than repeatedly calling vcat or hcat will be incredibly slow. In contrast, vec is a nearly cost-free operation that just reshapes the container without changing the underlying data.

Even for small matrices the contrast is stark:

julia> M = rand(10, 10);

julia> @btime reduce(hcat, $M);
  507.861 μs (3823 allocations: 186.78 KiB)

julia> @btime vec($M);
  38.858 ns (2 allocations: 80 bytes)

julia> @btime copy(vec($M));
  97.025 ns (3 allocations: 976 bytes)

The copy version is there in case you don’t want to alias the underlying data.

It gets much worse for bigger data:

julia> M = rand(100, 100);

julia> @btime reduce(hcat, $M);
  62.174 ms (399669 allocations: 39.42 MiB)

julia> @btime vec($M);
  38.854 ns (2 allocations: 80 bytes)

julia> @btime copy(vec($M));
  4.305 μs (4 allocations: 78.28 KiB)

Note ms/ns/μs.

DNF · June 14, 2020, 8:00pm

OK, I just did a test:

function tomatrix(V)
    L = length(first(V))
    M = Matrix{eltype(first(V))}(undef, length(V), L)
    for (i, v) in enumerate(V)
        M[i, :] .= vec(v)
    end
    return M
end

Benchmark:

julia> using BenchmarkTools

julia> v = [rand(5, 6) for _ in 1:50];

julia> tomatrix(v) == to_2d_array(v)
true

julia> @btime to_2d_array($v);
  7.662 ms (54651 allocations: 2.23 MiB)

julia> @btime tomatrix($v);
  4.293 μs (151 allocations: 18.13 KiB)

Almost 2000x times speedup.

It would be even faster if you assigned columnwise:

function tomatrix(V)
    L = length(first(V))
    M = Matrix{eltype(first(V))}(undef, L, length(V))
    for (i, v) in enumerate(V)
        M[:, i] .= vec(v)
    end
    return M
end

Edit:

I tried with your size of data. tomatrix ran in 0.25 seconds, to_2d_array is still chugging away, don’t know when it will finish. (Edit2: It took 9 minutes.)

bafonso · June 15, 2020, 12:47am

Thanks so much @DNF, this does indeed work much faster! Now I know I should use vec

Topic		Replies	Views
Vcat gives Any General Usage	8	182	July 24, 2023
Slow `reduce(vcat, itr)` Performance cat	26	1391	September 13, 2022
Very best way to concatenate an array of arrays General Usage arrays	47	28300	July 24, 2021
`cat` allocates too much memory Performance question	11	1335	November 19, 2020
Type inference for vcat General Usage	7	1363	April 6, 2018

Vcat slow?

Related topics