How to "collapse" a Vector of Vector?

julia> vv = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0] ]
3-element Array{Array{Float64,1},1}:
 [1.0, 2.0]     
 [3.0, 4.0, 5.0]
 [6.0, 7.0]     

is there an efficient way to transform it into:

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

?

thanks.

reduce(vcat, vv) is a simple way.

A lazy (and hence efficient) option would be Iterators.flatten(vv).

julia> @btime reduce($vcat, $vv);
  53.095 ns (1 allocation: 144 bytes)

julia> @btime Iterators.flatten($vv);
  5.800 ns (1 allocation: 16 bytes)

julia> @btime collect(Iterators.flatten($vv));
  102.331 ns (4 allocations: 224 bytes)

There also is RecursiveArrayTools.jl which might be useful.

9 Likes

Thanks. reduce() seems to be quite efficient (and general).

Just curious if it is possible to be even more efficient than the vvcat() function defined below?

julia> function vvcat(vv)
           n = sum(length, vv)
           y = Vector{Float64}(undef, n)
           position = 1
           for i in Base.OneTo(length(vv) )
               curr = vv[i]
               l = length(curr)
               y[position:(position + l - 1) ] = curr
               position += l
           end
           return y
       end
vvcat (generic function with 1 method)

julia> @btime vvcat($vv)
  49.657 ns (1 allocation: 144 bytes)
7-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0

julia> @btime reduce(vcat, $vv)
  52.273 ns (1 allocation: 144 bytes)
7-element Array{Float64,1}:
 1.0
 2.0
 3.0
 4.0
 5.0
 6.0
 7.0

You can stick an @inbounds on the for loop to save some time for bounds checking.

Just for fun, I took a crack at further optimization and expanded generality (as I happen to need a function like this at the moment).

julia> function vvcat(vv::Vector{Vector{T}})::Vector{T} where T
           isempty(vv) && return []
           y = Vector{T}(undef, sum(length, vv))
           position = 1
           for curr in vv
               l = length(curr)
               @inbounds y[position:(position + l - 1)] = curr
               position += l
           end
           return y
       end
vvcat (generic function with 1 method)

julia> vv = [[1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0] ]
3-element Array{Array{Float64,1},1}:
 [1.0, 2.0]
 [3.0, 4.0, 5.0]
 [6.0, 7.0]

julia> @btime vvcat($vv);  @btime reduce(vcat, $vv);
  39.858 ns (1 allocation: 144 bytes)
  47.065 ns (1 allocation: 144 bytes)

You probably want isempty(vv) && return T[] or you might see strange bugs from mixing Any arrays with T arrays in your program.

Thanks for the suggestion. I was assuming the final convert due to the output typing would take care of that, no? But still, I agree - better to remove the ambiguity.

1 Like

You’re right of course. My bad. :slight_smile: I don’t use function result type-annotations very often and glossed right over that.

1 Like

This one is 30% faster on my computer:

function vvcat(vv::Vector{Vector{T}}) where {T}
    out = Vector{T}(undef, sum(length, vv))
    i = 0
    for v in vv, x in v
       @inbounds out[i+=1] = x
    end
    return out
end
4 Likes

Oh, very nice! I see the same. I am a Julia newbie, and that double-loop syntax does not yet spring to mind. Now I have some back-porting to do. :slight_smile: Thanks!

@DNF, do you agree that the isempty(vv) && return T[] is still required? Without it, I get ERROR: ArgumentError: reducing over an empty collection is not allowed from the sum reduction.

Yeah, I didn’t bother with the empty case, I wanted to focus on the point of handling one sample at the time.

You can either do the test, like you suggest, or you can replace

sum(length, vv)

with

mapreduce(length, +, vv; init=0)

which initializes the empty case to zero length.

2 Likes

Note that reduce(vcat ,vv::AbstractVector{<:AbstractVector}) uses an optimized implementation essentially equivalent to vcat(vv...) (but which avoids the overhead of recompiling for each new number of entries in vv).

3 Likes