Concatenating in comprehension

Hello,

[rand(10,10) rand(10,10)
 rand(10,10) rand(10,10)]

makes a 20×20 Matrix. But

[rand(10,10) for i in 1:2, j in 1:2]

makes a 2×2 Array{Array{Float64,2},2}.

I would like to know if there exists a comprehension syntax that would output the same as the first example. That way I can cat as many matrices as I want. Of course I use random matrices for the MWE but using rand(20,20) is not what I’m looking for.

Many thanks.


You can write hvcat(ntuple(_->2, 2), (rand(10,10) for i in 1:2, j in 1:2)...) but that’s not so pretty. reduce(hcat, [rand(10,10) for i in 1:2, j in 1:2]) makes a 10×40 Matrix. I’d like reduce(cat, [rand(10,10) for i in 1:2, j in 1:2]) to make a 10×10×2×2 Array, but it doesn’t yet.
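Until then, one way to get that 10×10×2×2 layout is to reshape the hcat-reduction of the flattened comprehension (a sketch, not the only way):

```julia
A = [rand(10, 10) for i in 1:2, j in 1:2]   # 2×2 Matrix of 10×10 Matrices

# vec(A) flattens column-major, reduce(hcat, ...) makes a 10×40 Matrix,
# and reshape recovers a 10×10×2×2 Array with B[:, :, i, j] == A[i, j]
B = reshape(reduce(hcat, vec(A)), 10, 10, 2, 2)

B[:, :, 2, 1] == A[2, 1]   # true
```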


You can do

reduce(vcat, reduce(hcat, rand(10,10) for i=1:2) for j=1:2)

This allocates intermediate results, which you may be able to avoid with lazy hcat from LazyArrays.jl if it matters.


Since this is why I am adopting Julia, and we are never sure how clear this is for people coming from other backgrounds (I have that experience particularly with students coming from Python), I would like to point out that you can also write your own function. For example, this would do the trick:

julia> function m(gen,n,m,N)
         M = Matrix{Float64}(undef,N*n,N*m)
         for i in 1:N
           kn = (i-1)*n+1
           for j in 1:N
             km = (j-1)*m+1
             M[kn:kn+n-1,km:km+m-1] .= gen(n,m)
           end
         end
         M
       end
m (generic function with 1 method)

where gen is the function that generates each sub-matrix (rand in your MWE), n and m are the dimensions of that sub-matrix, and N is the number of blocks along each dimension. For N somewhat larger than 2 (I did not benchmark every case), this function is faster than the compact alternatives:
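For instance, checking the block-filling idea on a tiny self-contained case (fillblocks is just a throwaway name for the same double loop as m above):

```julia
# fillblocks: fill an N×N grid of n×m blocks, each produced by gen(n, m)
function fillblocks(gen, n, m, N)
    M = Matrix{Float64}(undef, N*n, N*m)
    for i in 1:N, j in 1:N
        M[(i-1)*n+1:i*n, (j-1)*m+1:j*m] .= gen(n, m)
    end
    return M
end

fillblocks(ones, 2, 3, 2) == ones(4, 6)   # true: a 2×2 grid of 2×3 blocks of ones
```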

julia> @btime m(rand,10,10,100);
  4.568 ms (10002 allocations: 16.17 MiB)

julia> @btime reduce(vcat, reduce(hcat, rand(10,10) for i=1:100) for j=1:100);
  108.475 ms (28098 allocations: 779.97 MiB)

and, in addition, it can be easily parallelized:

julia> function m(gen,n,m,N)
         M = Matrix{Float64}(undef,N*n,N*m)
         Threads.@threads for i in 1:N
           kn = (i-1)*n+1
           for j in 1:N
             km = (j-1)*m+1
             M[kn:kn+n-1,km:km+m-1] .= gen(n,m)
           end
         end
         M
       end

julia> @btime m(rand,10,10,100);
  1.500 ms (10023 allocations: 16.18 MiB)

julia> Threads.nthreads()
4



Good point. You can also use reshape to avoid having to do index calculations:

julia> function m3(gen,n,m,N)
         M = Matrix{Float64}(undef,N*n,N*m)
         M4 = reshape(M, (n,N,m,N))
         for j in 1:N
           for i in 1:N
             M4[:,i,:,j] .= gen(n,m)   # fills block (i,j)
           end
         end
         M
       end
m3 (generic function with 1 method)

julia> m3(ones,2,3,4) == m(ones,2,3,4) # not a strong test!
true

Why do we need a comprehension if the hvcat syntax is already so compact?

M1 = rand(3,5); M2 = rand(3,5); M3 = rand(2,5); M4 = rand(2,5);
MM = hvcat((2),M1,M2,M3,M4)  # concatenate two in each block row

makes a 5×10 Array{Float64,2}.


This version is also easier to understand for a novice, such as a student coming from Matlab, Python, or C. That’s a big deal in the education business.

Using @rafael.guerra’s suggestion, an answer to the original question can be

hvcat((2), [rand(5, 5) for _ in 1:4]...)

And it is as fast as, and allocates the same as, the manual version:

julia> @btime hvcat((100), [rand(10, 10) for _ in 1:10000]...)
  4.290 ms (10020 allocations: 16.79 MiB)

I guess this is the best answer (is it possible to parallelize this?).

hvcat((2), [rand(5, 5) for _ in 1:4]...)

Could you parse it and explain each term, please?

Thank you all for your suggestions. I did not know hvcat but it seems to be the cleanest syntax while being efficient. Not that it mattered a lot but it’s good to know how to do things efficiently.

@lmiq
I doubt that it can be parallelized, since it’s an internal Base function. On the other hand, I also doubt that it would be useful: this function mostly does memory operations, and memory operations rarely profit from parallelization (memory is much slower than the CPU). As an example, compare these two implementations (they are not very efficient, but that is not important, since both versions use the same operations):

using BenchmarkTools

x = [rand(10000) for _ in 1:100]
y = Vector{Float64}(undef, 1000000)

function f1!(y, x)
    for i in axes(x, 1)
        for j in axes(x[i], 1)
            y[(i - 1)*length(x[1]) + j] = x[i][j]
        end
    end

    return y
end

function f2!(y, x)
    Threads.@threads for i in axes(x, 1)
        for j in axes(x[i], 1)
            y[(i - 1)*length(x[1]) + j] = x[i][j]
        end
    end

    return y
end

and the result is

julia> Threads.nthreads()
8

julia> @btime f1!($y, $x);
  1.305 ms (0 allocations: 0 bytes)

julia> @btime f2!($y, $x);
  1.201 ms (41 allocations: 5.89 KiB)

The speedup that you observed was mainly related to parallelizing the generation of the matrices, which uses rand, a rather costly operation.

@Juan
From hvcat documentation:

Horizontal and vertical concatenation in one call. This function is called for block matrix syntax. The first argument specifies the number of arguments to concatenate in each block row.
If the first argument is a single integer n, then all block rows are assumed to have n block columns.

So, the first argument to hvcat in our example is (2), which is just the integer 2 (a trailing comma, as in (2,), would be needed to make it a tuple), and it means that each block row has two blocks. Consider for example

julia> hvcat((2), 1, 2, 3, 4, 5, 6)
3×2 Array{Int64,2}:
 1  2
 3  4
 5  6

julia> hvcat((3), 1, 2, 3, 4, 5, 6)
2×3 Array{Int64,2}:
 1  2  3
 4  5  6
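And if the block rows should contain different numbers of blocks, the first argument can be a tuple, with one entry per block row (the block widths just have to add up to the same total):

```julia
# row 1: a single 2×6 block; row 2: a 2×2 block next to a 2×4 block
M = hvcat((1, 2), ones(2, 6), ones(2, 2), ones(2, 4))
size(M)   # (4, 6)
```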

Now, hvcat does not accept a vector of matrices as input, so I would have to write

hvcat(2, rand(5, 5), rand(5, 5), rand(5, 5), rand(5, 5))

which is of course not scalable and inconvenient if we have a large number of matrices. But hvcat is a so-called varargs function, so I used the splatting syntax: appending an ellipsis (...) to the last argument makes Julia split it into the proper number of positional arguments. For easier understanding, it can be written in two lines:

rand_matrices = [rand(5, 5) for _ in 1:4]
hvcat(2, rand_matrices...)  # equivalent to hvcat(2, rand_matrices[1], rand_matrices[2], rand_matrices[3], rand_matrices[4])

So, in conclusion, I generated a vector of four 5×5 matrices and told hvcat to arrange them with two blocks in each row.
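One caveat, which does not matter for interchangeable random blocks but does in general: splatting iterates a 2×2 array of blocks in column-major order, while hvcat lays the blocks out row by row, so the result is the transpose of the block layout of the comprehension. permutedims on the array of blocks restores the expected order:

```julia
A = [fill(10i + j, 2, 2) for i in 1:2, j in 1:2]   # block (i, j) is filled with 10i + j

hvcat(2, A...)[1:2, 3:4] == A[2, 1]                # true: column-major splat transposes the layout
hvcat(2, permutedims(A)...)[1:2, 3:4] == A[1, 2]   # true: row-major, as laid out in A
```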

@HenriDeh
All kudos should go to @rafael.guerra :slight_smile:


Actually, in this case it is generating the sub-matrices multiple times that can be parallelized (this is what happens in the custom function I provided before, which parallelizes very well because it is embarrassingly parallel).

In your example you moved the generation of the random matrices out of the benchmark, so that does not correspond exactly to the hvcat solution.
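A variant of f2! that generates the data inside the threaded loop illustrates the point (a sketch; f3! is a made-up name, and the sizes mirror the earlier example):

```julia
# f3!: like f2! above, but each task generates its own block with rand
# before copying it, so the costly generation is parallelized too
function f3!(y, n, nblocks)
    Threads.@threads for i in 1:nblocks
        x = rand(n)                      # generation now inside the parallel loop
        for j in 1:n
            y[(i - 1)*n + j] = x[j]
        end
    end
    return y
end

y = Vector{Float64}(undef, 1_000_000)
f3!(y, 10_000, 100);
```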

More variations on the same theme: is the following a proper way of collecting all the matrices Mi before hvcat?

M1 = rand(3,5); M2 = rand(3,5); M3 = rand(2,5); M4 = rand(2,5);
mlist = [M1,M2,M3,M4]
MM = hvcat((2), mlist...) # concatenates two in each block row

It is, because mlist only stores references to the matrices, not copies, i.e.:

julia> x = rand(2) ; y = rand(2);

julia> m = [ x, y ]
2-element Array{Array{Float64,1},1}:
 [0.2383336264485605, 0.8321815241645099]
 [0.5923411990097951, 0.9487571368376246]

julia> m[1]
2-element Array{Float64,1}:
 0.2383336264485605
 0.8321815241645099

julia> m[1][1] = 0.
0.0

julia> x
2-element Array{Float64,1}:
 0.0
 0.8321815241645099
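Which also means that mutating an entry of the list mutates the original. If you want the list to be independent of the originals, collect copies instead (a small deterministic example):

```julia
x = [1.0, 2.0]
m = [copy(x)]      # copy, not a reference
m[1][1] = 0.0
x                  # still [1.0, 2.0]
```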

