Vcat slow?

Hello,

I’m using vcat() to concatenate many vectors, but I observed that @code_warntype it’s returning a red mark. I marked with *** in the code below:

julia> a = collect(1:10);

julia> b = collect(20:30);

julia> @code_warntype vcat(a, b)
Variables
  #self#::Core.Compiler.Const(vcat, false)
  arrays::Tuple{Array{Int64,1},Array{Int64,1}}
  n::Int64
  @_4::Union{Nothing, Tuple{Array{Int64,1},Int64}}
  arr::Array{Int64,1}
  nd::Int64
  @_7::Union{Nothing, Tuple{Array{Int64,1},Int64}}
  a@_8::Array{Int64,1}
  na::Int64
  msg::Expr
  a@_11::Array{Int64,1}

**********************
  @_12::**Any** HERE

Body::Array{Int64,1}
1 ──       Core.NewvarNode(:(arr))
│          Core.NewvarNode(:(nd))
│          Core.NewvarNode(:(@_7))
│          (n = 0)
│    %5  = arrays::Tuple{Array{Int64,1},Array{Int64,1}}
│          (@_4 = Base.iterate(%5))
│    %7  = (@_4 === nothing)::Bool
│    %8  = Base.not_int(%7)::Bool
└───       goto #4 if not %8
2 ┄─ %10 = @_4::Tuple{Array{Int64,1},Int64}::Tuple{Array{Int64,1},Int64}
│          (a@_8 = Core.getfield(%10, 1))
│    %12 = Core.getfield(%10, 2)::Int64
│    %13 = n::Int64
│    %14 = Base.length(a@_8)::Int64
│          (n = %13 + %14)
│          (@_4 = Base.iterate(%5, %12))
│    %17 = (@_4 === nothing)::Bool
│    %18 = Base.not_int(%17)::Bool
└───       goto #4 if not %18
3 ──       goto #2
4 ┄─ %21 = Core.apply_type(Base.Vector, $(Expr(:static_parameter, 1)))::Core.Compiler.Const(Array{Int64,1}, false)
│          (arr = (%21)(Base.undef, n))
│          (nd = 1)
│    %24 = arrays::Tuple{Array{Int64,1},Array{Int64,1}}
│          (@_7 = Base.iterate(%24))
│    %26 = (@_7 === nothing)::Bool
│    %27 = Base.not_int(%26)::Bool
└───       goto #13 if not %27
5 ┄─       Core.NewvarNode(:(msg))
│    %30 = @_7::Tuple{Array{Int64,1},Int64}::Tuple{Array{Int64,1},Int64}
│          (a@_11 = Core.getfield(%30, 1))
│    %32 = Core.getfield(%30, 2)::Int64
│          (na = Base.length(a@_11))
│    %34 = (nd + na)::Int64
│    %35 = Base.length(arr)::Int64
│    %36 = (1 + %35)::Int64
│    %37 = (%34 <= %36)::Bool
└───       goto #7 if not %37
6 ──       goto #11
7 ──       (msg = $(Expr(:copyast, :($(QuoteNode(:(nd + na <= 1 + length(arr))))))))
│    %41 = Base.isdefined(Base.Main, :Base)::Bool
└───       goto #9 if not %41
*********************** HERE *********
8 ── %43 = Base.getproperty(Base.Main, :Base)::**Any**
│    %44 = Base.getproperty(%43, :string)::**Any**
│          (@_12 = (%44)(msg))
└───       goto #10
*********************** HERE *********
9 ── %47 = Core.println::Core.Compiler.Const(Core.println, false)
│          (%47)(msg)
└───       (@_12 = "Error during bootstrap. See stdout.")
10 ┄ %50 = @_12::Any
│    %51 = Base.AssertionError(%50)::AssertionError
└───       Base.throw(%51)
11 ┄       Base.unsafe_copyto!(arr, nd, a@_11, 1, na)
│          (nd = nd + na)
│          (@_7 = Base.iterate(%24, %32))
│    %56 = (@_7 === nothing)::Bool
│    %57 = Base.not_int(%56)::Bool
└───       goto #13 if not %57
12 ─       goto #5
13 ┄       return arr

You’re looking pretty deep into the guts of that function, at what appears to be some code devoted to handling error cases. The actual return type is inferred correctly (see the Body::Array{Int64,1} before the code), so I wouldn’t worry about this unless you have a particular reason to.

1 Like

Yes, this only happens in an error branch and is highly unlikely to affect the runtime — especially as it only runs once per argument (and not once per element). Just for fun, I completely removed the @assert macro call that is generating it (thus removing both the instability and the error-checking itself!) and compared the timings:

julia> @benchmark vcat($a, $b)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     44.549 ns (0.00% GC)
  median time:      46.781 ns (0.00% GC)
  mean time:        51.098 ns (2.51% GC)
  maximum time:     418.716 ns (84.10% GC)
  --------------
  samples:          10000
  evals/sample:     981

julia> @benchmark vcat_noassert($a, $b)
BenchmarkTools.Trial:
  memory estimate:  256 bytes
  allocs estimate:  1
  --------------
  minimum time:     45.264 ns (0.00% GC)
  median time:      47.103 ns (0.00% GC)
  mean time:        51.896 ns (2.47% GC)
  maximum time:     409.609 ns (83.03% GC)
  --------------
  samples:          10000
  evals/sample:     983

@code_warntype vcat_noassert(a, b) then shows no instabilities, but it doesn’t really change anything.

6 Likes

It’s perhaps worth noting that while vcat itself shouldn’t be slow, splatting lots of arrays into it can be! Instead of splatting them, use reduce(vcat, array_of_arrays).

7 Likes

I am finding that vcat is the slowest part when I try to go from a vector of length l containing n,m array into a an array of l, (n*m). Right now I am looping and dumping it into my new array using

M[:, i] .= reduce(vcat,v[i])

Is there a faster way? My array is not that big and it takes much longer than I anticipated for my 16k vector of 65x64 arrays.

You could just write the loop and see if that is significantly faster. If it is, then there is probably some improvements to be made. If it isn’t, maybe you miss-estimated the cost of the operation.

1 Like

If I read this correctly, you are calling reduce(vcat, matrix). Instead, can you try

M[:, i] .= vec(v[i]) 

?
If you’re creating a vector from a matrix, then calling reduce(vcat,) on it would be super inefficient.

I am not sure if you are right @kristoffer.carlsson, the fastest is probably not using vcat or a loop but instead allocating a matrix of the final size and copying each cell of the small matrices to the right position in the final matrix. This would avoid multiple copies and intermediary allocations, no?

This is my function

function to_2d_array(v::Vector{Matrix{T}}) where {T}
   n,m = size(v[1])
   M = Array{T}(undef, length(v),n*m) # preallocate a 3D array of the right size
   @show size(M)
   for i in 1:size(v,1)
      M[i,:] = reduce(hcat,v[i])
   end
   M
end

Can you try vec(v[i]) instead? I think it should work, but I’m not at my computer. If it doesn’t work, then use transpose before assignment.

Also, you probably shouldn’t have @show in there, it’s slow.

1 Like

If you have a matrix, than repeatedly calling vcat or hcat will be incredibly slow. In contrast, vec is a nearly cost-free operation that just reshapes the container without changing the underlying data.

Even for small matrices the contrast is stark:

julia> M = rand(10, 10);

julia> @btime reduce(hcat, $M);
  507.861 μs (3823 allocations: 186.78 KiB)

julia> @btime vec($M);
  38.858 ns (2 allocations: 80 bytes)

julia> @btime copy(vec($M));
  97.025 ns (3 allocations: 976 bytes)

The copy version is there in case you don’t want to alias the underlying data.

It gets much worse for bigger data:

julia> M = rand(100, 100);

julia> @btime reduce(hcat, $M);
  62.174 ms (399669 allocations: 39.42 MiB)

julia> @btime vec($M);
  38.854 ns (2 allocations: 80 bytes)

julia> @btime copy(vec($M));
  4.305 μs (4 allocations: 78.28 KiB)

Note ms/ns/μs.

2 Likes

OK, I just did a test:

function tomatrix(V)
    L = length(first(V))
    M = Matrix{eltype(first(V))}(undef, length(V), L)
    for (i, v) in enumerate(V)
        M[i, :] .= vec(v)
    end
    return M
end

Benchmark:

julia> using BenchmarkTools

julia> v = [rand(5, 6) for _ in 1:50];

julia> tomatrix(v) == to_2d_array(v)
true

julia> @btime to_2d_array($v);
  7.662 ms (54651 allocations: 2.23 MiB)

julia> @btime tomatrix($v);
  4.293 μs (151 allocations: 18.13 KiB)

Almost 2000x times speedup.

It would be even faster if you assigned columnwise:

function tomatrix(V)
    L = length(first(V))
    M = Matrix{eltype(first(V))}(undef, L, length(V))
    for (i, v) in enumerate(V)
        M[:, i] .= vec(v)
    end
    return M
end

Edit:

I tried with your size of data. tomatrix ran in 0.25 seconds, to_2d_array is still chugging away, don’t know when it will finish. (Edit2: It took 9 minutes.)

6 Likes

Thanks so much @DNF, this does indeed work much faster! Now I know I should use vec :slight_smile: