Very best way to concatenate an array of arrays

I’ve been wondering about this for a while. Suppose I want to concatenate arrays of arrays. Currently the only way I know of doing that is vcat(A...) (or could be hcat, whatever). My understanding is that this is doubly inefficient, first because it has to allocate a new array (unavoidable), but second because it has to recompile vcat for every number of arguments and every element type of every input array A. Therefore, if you want to do this lots of times on lots of different arrays you are really screwed. I suppose I could just write a function that takes a Vector{<:AbstractVector} and allocates the new array properly, but that seems silly.

What is the very best way of doing this?

using RecursiveArrayTools
VA = VectorOfArray(A)
arr = convert(Array,VA)

VectorOfArray is a lazy representation of the array of arrays as a higher order tensor (/matrix if A is a vector of vectors). It uses the indexing fallback to do the conversion. Honestly, I tried writing my own loop for the conversion but the fallback was faster (Tim Holy Cartesian magic?) so this is what I’ve found to be the fastest, and I don’t know how to recreate it without using the lazy matrix indexing.


Thanks, RecursiveArrayTools looks useful.

I was doing this with a little function in Flux.jl, and a quick test shows it’s sometimes quicker:

A100 = [rand(100) for i=1:100];
using BenchmarkTools

using RecursiveArrayTools
@btime convert(Array,VectorOfArray(A100));  # 25.571 μs

using Flux: batch
@btime batch(A100); # 10.686 μs

maximum( convert(Array,VectorOfArray(A100)) - batch(A100) ) # zero

But for [rand(10) for i=1:1000] this is reversed.


I was stupid and thought vcat with single arg would do the job. It doesn’t.

Whether to be lazy or not kinda depends on how often you plan to access the new array, how much memory you have available and how cache-friendly your planned access is.

vcat(V) just returns V.

Wow, ok I was kind of expecting that by now there’d be a really straightforward answer to this. It’s probably too late now, but shouldn’t there be something in Base that does this? I’m thinking of a function that simply allocates an array with elements type the union of the arguments. It would have to do inefficient allocation, but at least it wouldn’t have the compile penalties of vcat.

See discussion here:

I think the consensus was to implement an efficient reduce(vcat, v) but I think it hasn’t been done yet. It is a performance improvement so should be non-breaking.


There seems to be a fairly common misunderstanding about how compatibility works and what is “too late” after 1.0 is released: you can add new features, including new exported names in 1.x since that does not break older code. What you cannot do is remove or change existing features in a 1.x version.


Note also that a faster reduce(vcat, arrays) method can be added at any time, regardless of feature freezes, since it is just an optimization.


In a slightly more general setting, I have used the following to map and concat an array of array.

mapconcat1(f, v) = [z for x in v for z in f(x)]
mapconcat2(f, v) = vcat(f.(v)...)
mapconcat3(f,v) = mapfoldl(f, append!, [], v)

The first one allocates more but in some cases it was faster than the other two. I guess it depends on the shape of the data.

1 Like

Maybe I’m missing the intent of the question, but I normally write an ugly function similar to this:

function vec2array(v)
    r = length(v)
    c = length(v[1])
    a = Array{Int64}(r,c)
    for i in 1:r, j in 1:c
        a[i,j] = v[i][j]

In Common Lisp, the equivalent of

foo(args...) = args
a = [1,2,3]
foo(a...) === a

is true. It’s a very good optimization. I wonder how breaking it would be to sometimes pass varargs as arrays instead of tuples (and how complicated it would be to optimize in a language with multiple dispatch).

1 Like

I don’t think we’d even need anything like that would we? All I had in mind when posting this thread was something roughly like @sdmcallister suggested being in Base. The only complication I can think of is determining the appropriate Union type for the resulting array. I suppose we’d have to be consistent with the typing behavior of vcat.

By the way, as far as I know even StaticArrays are basically just tuples under the hood, so that would seem to suggest that yes, arguments really do need to be passed as tuples.

The lazy Hcat and Vcat functions in:

do exactly what you want, as far as I understand: they are a lazy / no-copy wrapper around the arrays that behave like the concatenation, including non-scalar indexing and broadcasting.

using LazyArrays
a = Vcat(1:5, 1:6)

# non-scalar

# broadcast
a .* rand(11,2)
a .- Vcat(1:7, 1:4)

RecursiveArrayTools’ ArrayPartitions has similar functionality, but does not currently support non-scalar indexing a[[2,4,5]], and only broadcasts with other ArrayPartitions that have the same underlying sub-vector sizes, whereas LazyArraysHcat and Vcat will broadcast with just about anything afaict.

Haven’t compared performance though.

1 Like

No, it does multiple index indexing like a[3,5], which means the 3rd value in the 5th array (considering it like a matrix), and then Array(a) will “de-lazy” it into a real dense matrix.

Ah right, “multiple indexing” is not the term I should have used, a[3, 5] does indeed have the semantics you mention.

I meant non-scalar indexing as in a[2:5] or a[[2,5,6]], non-scalar indexing currently errors with ArrayPartition. Fixed my post…

ArrayPartition and Hcat/Vcat implementations of getindex look similar under the hood, for both, a[i] loops over the sub-arrays, accumulating their lengths until you reach the sub-array containing index i. ArrayPartition implements multiple indexing with the semantics you mentioned, whereas Vcat only implements a[i,j] for 2 dim arrays, but not for 1d arrays.

Not sure why non-scalar indexing works for Hcat/Vcat and not for ArrayPartition upon cursory glance at the impls, or what it would take to add support for non-scalar indexing of ArrayPartitions.

1 Like

Interesting. I just never thought of doing that one. Could you open an issue?

I would always just write reduce(hcat, v) and expect the code to be optimised in the future unless the performance was critical right now. That’s IMHO the semantically correct way of expressing this operation.


reduce(hcat, v) is optimized AFAIU.


Just saw that
So isn’t that the answer to the question?

I guess it may be possible to optimize Base._typed_vcat (which implements it) further for some special cases (eg sparse arrays), but yes, #27188 takes care of this in a very nice way.

This also suggests the following general strategy: whenever one feels like coding a special-cased implementation for something like this, they might consider making the extra marginal effort of wrapping it up as a PR.

1 Like