Concatenating in comprehension

HenriDeh · October 22, 2020, 1:22pm

Hello,

[rand(10,10) rand(10,10)
 rand(10,10) rand(10,10)]

Makes a 20 x 20 Matrix. But

[rand(10,10) for i in 1:2, j in 1:2]

makes a 2×2 Array{Array{Float64,2},2}.

I would like to know if there exists a comprehension syntax that would output the same as the first example. That way I can cat as many matrices as I want. Of course I use random matrices for the MWE but using rand(20,20) is not what I’m looking for.

Many thanks.

mcabbott · October 22, 2020, 1:43pm

You can write hvcat(ntuple(_->2, 2), (rand(10,10) for i in 1:2, j in 1:2)...) but that’s not so pretty. reduce(hcat, [rand(10,10) for i in 1:2, j in 1:2]) makes a 10×40 Matrix. I’d like reduce(cat, [rand(10,10) for i in 1:2, j in 1:2]) to make a 10×10×2×2 Array, but it doesn’t yet.

yha · October 22, 2020, 2:00pm

You can do

reduce(vcat, reduce(hcat, rand(10,10) for i=1:2) for j=1:2)

This allocates intermediate results, which you may be able to avoid with lazy hcat from LazyArrays.jl if it matters.

lmiq · October 22, 2020, 5:51pm

Since this is why I am adopting Julia, and we are never sure about how clear is that for people coming from other backgrounds (particularly I have that experience with students coming from python), I would like to point that you could also write your own function. For example, this would do the trick:

julia> function m(gen,n,m,N)
         M = Matrix{Float64}(undef,N*n,N*m)
         for i in 1:N
           kn = (i-1)*n+1
           for j in 1:N
             km = (j-1)*m+1
             M[kn:kn+n-1,km:km+m-1] .= gen(m,n)
           end
         end
         M
       end
m (generic function with 1 method)

where gen is the function that generates each sub-matrix, (in your mwe rand), n and m are the dimensions of that submatrix, and N is the total number of repetitions. For N relatively larger than 2 (I did not benchmark every case), this function is faster than the compact alternatives:

julia> @btime m(rand,10,10,100);
  4.568 ms (10002 allocations: 16.17 MiB)

julia> @btime reduce(vcat, reduce(hcat, rand(10,10) for i=1:100) for j=1:100);
  108.475 ms (28098 allocations: 779.97 MiB)

and, in addition, it can be easily parallelized:

julia> function m(gen,n,m,N)
         M = Matrix{Float64}(undef,N*n,N*m)
         Threads.@threads for i in 1:N
           kn = (i-1)*n+1
           for j in 1:N
             km = (j-1)*m+1
             M[kn:kn+n-1,km:km+m-1] .= gen(m,n)
           end
         end
         M
       end

julia> @btime m(rand,10,10,100);
  1.500 ms (10023 allocations: 16.18 MiB)

julia> Threads.nthreads()
4

mcabbott · October 22, 2020, 7:22pm

Good point. You can also use reshape to avoid having to do index calculations:

julia> function m3(gen,n,m,N)
         M = Matrix{Float64}(undef,N*n,N*m)
         M4 = reshape(M, (n,N,m,N))
         for j in 1:N
           for i in 1:N
             M4[:,i,:,j] .= gen(n,m)   # n,m reversed from m()
           end
         end
         M
       end
m3 (generic function with 1 method)

julia> m3(ones,2,3,4) == m(ones,2,3,4) # not a strong test!
true

rafael.guerra · October 23, 2020, 7:07am

Why do we need comprehension if the hvcat syntax is already so compact?

M1 = rand(3,5); M2 = rand(3,5); M3 = rand(2,5); M4 = rand(2,5);
MM = hvcat((2),M1,M2,M3,M4)  # concatenate two in each block row

makes a 5×10 Array{Float64,2}:

ctkelley · October 23, 2020, 10:12am

This version is also easier for a novice, like a student coming from Matlab, Pythod, or C to understand. That’s a big deal in the education business.

Skoffer · October 23, 2020, 10:24am

Using @rafael.guerra suggestion an answer to the original question can be

hvcat((2), [rand(5, 5) for _ in 1:4]...)

lmiq · October 23, 2020, 10:51am

And it is as fast and allocates the same thing as the manual version:

julia> @btime hvcat((100), [rand(10, 10) for _ in 1:10000]...)
  4.290 ms (10020 allocations: 16.79 MiB)

I guess this is the best answer (is it possible to parallelize this)?

Juan · October 23, 2020, 11:00am

hvcat((2), [rand(5, 5) for _ in 1:4]…)

Could you parse it and explain each term, please?

HenriDeh · October 23, 2020, 11:17am

Thank you all for your suggestions. I did not know hvcat but it seems to be the cleanest syntax while being efficient. Not that it mattered a lot but it’s good to know how to do things efficiently.

Skoffer · October 23, 2020, 11:54am

@lmiq
I doubt that it can be parallelized, since it’s internal Base function. Yet on the other hand, I also doubt that it would be useful, since this function is mostly doing memory operations and memory operations rarely profit from parallelization (memory is much slower than CPU). As an example compare these two implementations (they are not very efficient, but it is not important, since we use the same operations in both versions)

using BenchmarkTools

x = [rand(10000) for _ in 1:100]
y = Vector{Float64}(undef, 1000000)

function f1!(y, x)
    for i in axes(x, 1)
        for j in axes(x[i], 1)
            y[(i - 1)*length(x[1]) + j] = x[i][j]
        end
    end

    return y
end

function f2!(y, x)
    Threads.@threads for i in axes(x, 1)
        for j in axes(x[i], 1)
            y[(i - 1)*length(x[1]) + j] = x[i][j]
        end
    end

    return y
end

and result is

julia> Threads.nthreads()
8

julia> @btime f1!($y, $x);
@btime f2!($y, $x);
  1.305 ms (0 allocations: 0 bytes)

julia> @btime f2!($y, $x);
  1.201 ms (41 allocations: 5.89 KiB)

The speedup that you observe was mainly related to the parallelization of the generation of the tables, which uses rand and it is a rather costly operation.

@Juan
From hvcat documentation:

Horizontal and vertical concatenation in one call. This function is called for block matrix
syntax. The first argument specifies the number of arguments to concatenate in each block
row.
If the first argument is a single integer n, then all block rows are assumed to have n block
columns.

So, the first argument to hvcat in our example is (2) and it means, that we only want two columns. Consider for example

julia> hvcat((2), 1, 2, 3, 4, 5, 6)
3×2 Array{Int64,2}:
 1  2
 3  4
 5  6

julia> hvcat((3), 1, 2, 3, 4, 5, 6)
2×3 Array{Int64,2}:
 1  2  3
 4  5  6

Now, hvcat do not accept vectors of matrices as an input, so I should have written

hvcat(2, rand(5, 5), rand(5, 5), rand(5, 5), rand(5, 5))

which is of course not scalable and inconvenient if we have a large number of matrices. But hvcat is so called Vararg function and I’ve used splatting syntax, when you add ellipsis to the last argument and julia split it into proper number of positional arguments. For easier understanding it can be written in two lines

rand_matrices = [rand(5, 5) for _ in 1:4]
hvcat(2, rand_matrices...)  # which during parsing turns internally to hvcat(2, rand_matrices[1],  rand_matrices[2], rand_matrices[3], rand_matrices[4])

So as a conclusion, I generated vector of 4 matrices 5x5 and said to hvcat to reorganize them in two block columns.

@HenriDeh
All kudos should go to @rafael.guerra

lmiq · October 23, 2020, 1:12pm

Actually in this case it is generating the sub-matrices of dimensions (m,n) multiple times, and that can be parallelized (this is what happens in the custom function I provided before, which parallelizes very well, because it is embarrassingly parallel).

In your example you moved the generation of the random matrices out of the benchmark, so that does not correspond exactly to the hvcat solution.

rafael.guerra · October 23, 2020, 1:25pm

More variations on the same theme: is the following a proper way of collecting all the matrices Mi before hvcat?

M1 = rand(3,5); M2 = rand(3,5); M3 = rand(2,5); M4 = rand(2,5);
mlist = [M1,M2,M3,M4]
MM = hvcat((2), mlist...) # concatenates two in each block row

lmiq · October 23, 2020, 1:36pm

It is, because mlist is only addressing the positions, i. e.:

julia> x = rand(2) ; y = rand(2);

julia> m = [ x, y ]
2-element Array{Array{Float64,1},1}:
 [0.2383336264485605, 0.8321815241645099]
 [0.5923411990097951, 0.9487571368376246]

julia> m[1]
2-element Array{Float64,1}:
 0.2383336264485605
 0.8321815241645099

julia> m[1][1] = 0.
0.0

julia> x
2-element Array{Float64,1}:
 0.0
 0.8321815241645099

Topic		Replies	Views
Non-catting version of matrix construction General Usage	12	1154	June 13, 2017
Concatening matrix in a loop General Usage	5	1851	July 14, 2019
How not to concatenate in hvcat General Usage question	7	405	September 29, 2023
Get the transpose of Vector{Vector{Float64}} New to Julia arrays , vector , comprehension	9	241	April 17, 2025
Array flattening function General Usage arrays	8	215	November 2, 2024

Concatenating in comprehension

Related topics