Create matrix by combining vectors element-wise

Hello everybody,

The following code does what I want: create a 2 columns matrix containing in rows all the element-wise combination of 2 vectors.

function combination_vectors(v1, v2)
    c1 = reduce(vcat, [fill(v1[i], length(v2)) for i in 1:length(v1)])
    c2 = repeat(v2, length(v1))
    return(hcat(c1, c2))
end
a = [1,2,3]
b = [4,5]

julia> combination_vectors(a, b)
6×2 Matrix{Int64}:
 1  4
 1  5
 2  4
 2  5
 3  4
 3  5

Is there a more efficient/shorter way to do that?

Thank you for any suggestion.

EDIT: I changed the mapreduce(permutdims(.)) to reduce(vcat(.))

You’re generating the cartesian product of the two vectors which we can do with Iterators.product. Depending on what you want to do with this object afterwards, you may not need to call collect afterwards, but if you’re dead set on having it be a two column matrix then we can use reinterpret to minimise allocations and be ultra-performant.

The following code does what you want and is very fast:

reshape(reinterpret(Int, collect(Iterators.product(a, b))), (2, :))'

using BenchmarkTools
a = [1, 2, 3]
b = [4, 5]

@btime reshape(reinterpret(Int, collect(Iterators.product($a, $b))), (2, :))'
> 52.077 ns (1 allocation: 160 bytes)

Note the return type is not a Matrix, but it can be used as one (or you can make it one by calling stuff |> Matrix but you really shouldn’t need to).

Edit: cleaner solution and benchmark

3 Likes

The solution by @jacobusmmsmit is probably the way to go. For completeness, here’s a way to rewrite your code to be shorter and arguably easier to read:

julia> combination_vectors(a, b) = vcat(([f s] for f in a for s in b)...)
combination_vectors (generic function with 1 method)

(@v1.7) julia> combination_vectors(a, b)
6×2 Matrix{Int64}:
 1  4
 1  5
 2  4
 2  5
 3  4
 3  5
3 Likes

If you are willing to consider another data structure, you can just make a matrix of tuples with:

julia> tuple.(a',b)
2×3 Matrix{Tuple{Int64, Int64}}:
 (1, 4)  (2, 4)  (3, 4)
 (1, 5)  (2, 5)  (3, 5)

or, if you want them in a vector:

julia> vec(tuple.(a',b))
6-element Vector{Tuple{Int64, Int64}}:
 (1, 4)
 (1, 5)
 (2, 4)
 (2, 5)
 (3, 4)
 (3, 5)

Tuples (and their relatives StaticArrays) are often much faster more convenient to work with than the matlab-style 2-column matrices. Julia has more data structures than just matrices of numbers, and it often pays to exploit them!

In terms of speed, if I define f(a,b) = vec(tuple.(a',b)), then compared to the combination_vectors code above it is more than 100x faster. It is about the same speed as the code using reinterpret by @jacobusmmsmit above, but is simpler and the results will likely be easier to use too (especially if you use StaticArrays instead of tuples).

julia> using BenchmarkTools

julia> f(a,b) = vec(tuple.(a',b));

julia> combination_vectors(a, b) = vcat(([f s] for f in a for s in b)...);

julia> jacobus(a,b) = reshape(reinterpret(Int, collect(Iterators.product(a, b))), (2, :))';

julia> @btime f($a,$b);
  19.421 μs (4 allocations: 312.62 KiB)

julia> @btime combination_vectors($a,$b);
  3.029 ms (60023 allocations: 4.81 MiB)

julia> @btime jacobus($a,$b);
  18.712 μs (2 allocations: 312.55 KiB)
7 Likes

Yes, the fact that the syntax I proposed is very slow is a longstanding issue, see `stack(vec_of_vecs)` for `vcat(vec_of_vecs...)` · Issue #21672 · JuliaLang/julia · GitHub

In general, it’d be awesome to have generators/comprehensions to build matrices, not just vectors.

That already exists:

julia> x = [(i,j) for i=1:2, j=3:6]
2×4 Matrix{Tuple{Int64, Int64}}:
 (1, 3)  (1, 4)  (1, 5)  (1, 6)
 (2, 3)  (2, 4)  (2, 5)  (2, 6)

julia> y = [a*b for (a,b) in x]
2×4 Matrix{Int64}:
 3  4   5   6
 6  8  10  12

or a contrived solution to the OP problem:

function combination_vectors(a, b)
    na, nb = length(a), length(b)
    [x[x==a ? fld1(i,nb) : mod1(i,nb)] for i in 1:na*nb, x in (a,b)]
end
3 Likes

Yes, of course one can build a matrix with a comprehension; what I wish we had is more flexible, non-contrived and fast syntax that addresses the OP’s case. I’ve often wanted to do define matrices by defining each column, something like [x x.^2 x.^3] with a generator/comprehension.

Less elegant, but 40% faster than @jacobusmmsmit’s solution:

function comb(a, b)
    N = length(a) * length(b)
    out = Matrix{eltype(a)}(undef, N, 2)
    i = 0
    for val1 in a, val2 in b
        i += 1
        out[i, 1] = val1
        out[i, 2] = val2
    end
    return out
end
2 Likes

I’m actually not able to recreate this 40% boost, here are my results:


They seem to be performing about the same, if anything using reinterpret/reshape is slightly faster.

Code (do not run it takes ages)

using BenchmarkTools
using Plots

function DNF(a::T, b::T) where T
    N = length(a) * length(b)
    out = Matrix{eltype(a)}(undef, N, 2)
    i = 0
    for val1 in a, val2 in b
        i += 1
        out[i, 1] = val1
        out[i, 2] = val2
    end
    return out
end

function jacobus(a::T, b::T) where T
    reshape(reinterpret(eltype(a), collect(Iterators.product(a, b))), (2, :))'
end

begin
    Ns = 100:100:1000
    times_DNF = Float64[]
    times_jacobus = Float64[]
    for N in Ns
        a = rand(N)
        b = rand(N)
        push!(times_DNF, @belapsed(DNF($a, $b)))
        push!(times_jacobus, @belapsed(jacobus($a, $b)))
    end
end
begin
    plot(Ns, times_DNF, label = "DNF", legend = :topleft)
    plot!(Ns, times_jacobus, label = "Jacobus")
    plot!(xlabel = "Size of array", ylabel = "Elapsed time")
end

I just tested it with the tiny vectors in the OP.

1 Like

Ah, sorry I went a bit overboard, but I was really curious that something was outspeeding reinterpret/reshape :slight_smile:

On OP’s vectors I do indeed see the speedup.

Yeah, I wouldn’t expect to outrun reinterpret/reshape, but maybe collect(product) :slight_smile:

There’s probably some way to make the loop better, though.