Tools for partitioning matrix columns?


I am looking for a convenient package to partition matrices into smaller matrices (i.e. cut along the 2nd dimension). However, I specifically need to be able to make partitions of unequal sizes.

Let me illustrate with an example. I have a collection of matrices whose 2nd dimensions differ in size:
inputs = [rand(4,i) for i in 1:10] # 10 matrices of sizes (4,1) to (4,10), 55 columns in total
Each of these is an input to pass to a neural network, so to avoid 10 separate calls, I first lazily hcat them using LazyArrays:

input = ApplyArray(hcat, inputs...)
output = nn(input) # array of size, say, (2,55)

What I need is a way to partition output along its columns to recover a vector of matrices of sizes [(2,1), (2,2), ..., (2,10)]. This can be done lazily or eagerly; I’m not sure yet which is more efficient.

Thank you for any recommendation.

Lazy partitioning is probably the more efficient option here, since it’s columnwise (Julia arrays are stored column-major, so a block of adjacent columns is contiguous in memory). I’m not aware of any package that provides this, but you can write something like:

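julia> function colpartitions(M, nparts)
         @assert nparts*(nparts + 1)/2 == size(M, 2)
         c = 1  # first column of the current partition
         # an empty range is always in bounds, so this yields the view type safely
         out = Vector{typeof(@view(M[:, 1:0]))}(undef, nparts)
         for i in 1:nparts
           out[i] = @view(M[:, c:c+i-1])  # partition i has i columns
           c += i
         end
         return out
       end
colpartitions (generic function with 1 method)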

julia> outputs = colpartitions(output, 10);

julia> size.(outputs)
10-element Vector{Tuple{Int64, Int64}}:
 (2, 1)
 (2, 2)
 (2, 3)
 (2, 4)
 (2, 5)
 (2, 6)
 (2, 7)
 (2, 8)
 (2, 9)
 (2, 10)
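
For comparison, an eager variant would copy each block instead of taking a view. The following is only a minimal sketch: `colpartitions_eager` is a hypothetical name, the partition sizes are passed explicitly, and `rand(2, 55)` stands in for the real `nn(input)`.

```julia
# Hypothetical helper, for illustration only.
function colpartitions_eager(M::AbstractMatrix, cols_per_partition)
    stops = cumsum(cols_per_partition)            # last column of each partition
    starts = stops .- cols_per_partition .+ 1     # first column of each partition
    # Indexing with a range (no view) allocates a fresh Matrix per partition.
    return [M[:, a:b] for (a, b) in zip(starts, stops)]
end

output = rand(2, 55)                              # stand-in for nn(input)
outputs = colpartitions_eager(output, 1:10)
```

Each element of `outputs` is then an independent copy, which costs allocations up front but does not keep `output` alive.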

Note that in the above suggestion, Vector{SubArray{eltype(M), 2}} is a container with incompletely typed elements (see the type in my example below: SubArray has 5 type parameters), so it will cause dynamic dispatch. This may result in slower performance, depending on where your bottlenecks are. (The above post has since been adjusted to determine the type programmatically, removing the issue I raised.)
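
A quick way to check this yourself, as a minimal sketch using only Base (the matrix `M` is an arbitrary stand-in, and the names `loose` and `tight` are just for illustration):

```julia
M = rand(2, 6)               # arbitrary stand-in matrix
v = view(M, :, 1:2)

# Fixing only the first two SubArray parameters leaves an abstract type.
loose = SubArray{eltype(M), 2}
# The type of an actual view has every parameter filled in.
tight = typeof(v)

isconcretetype(loose)        # false
isconcretetype(tight)        # true
v isa loose                  # true: the view still fits in the loosely typed container
```

A Vector with element type `loose` can hold the views, but the compiler cannot know their exact type when they are read back out.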

For this reason, I go out of my way to use map or generators when the element type might be complicated. That way the compiler does the work for me. Here’s my version:

julia> function colpartitions(M::AbstractMatrix, cols_per_partition)
               partition_stop = cumsum(cols_per_partition)  # last column of each partition
               all(>=(0), cols_per_partition) || error("partitions must have nonnegative size")
               axes(M,2) == 1:last(partition_stop) || error("columns of M must be 1:sum(cols_per_partition)")
               partitions = map(eachindex(partition_stop)) do i
                       # each partition starts one past the previous stop (0 for the first)
                       colrange = get(partition_stop,i-1,0)+1:partition_stop[i]
                       return view(M, :, colrange)
               end
               return partitions
       end
colpartitions (generic function with 2 methods)

julia> colpartitions(rand(0:9,2,6), 1:3)
3-element Vector{SubArray{Int64, 2, Matrix{Int64}, Tuple{Base.Slice{Base.OneTo{Int64}}, UnitRange{Int64}}, true}}:
 [9; 8;;]
 [9 4; 4 3]
 [4 0 5; 3 8 4]

With some extra effort, one could make this return a Tuple (rather than Vector) when provided a Tuple of cols_per_partition, but I didn’t go that far here.
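
One possible shape for that Tuple variant, as a rough sketch rather than a tested implementation: `colpartitions_tuple` is a hypothetical name, and it assumes Julia 1.5 or later, where `cumsum` accepts a Tuple.

```julia
# Hypothetical helper, for illustration only.
function colpartitions_tuple(M::AbstractMatrix, cols_per_partition::NTuple{N,Integer}) where {N}
    all(>=(0), cols_per_partition) || error("partitions must have nonnegative size")
    partition_stop = cumsum(cols_per_partition)   # cumsum of a Tuple returns a Tuple
    axes(M, 2) == 1:last(partition_stop) || error("columns of M must be 1:sum(cols_per_partition)")
    # Val(N) puts the number of partitions in the type domain, so ntuple builds an N-tuple.
    return ntuple(Val(N)) do i
        start = i == 1 ? 1 : partition_stop[i-1] + 1
        view(M, :, start:partition_stop[i])
    end
end

parts = colpartitions_tuple(rand(0:9, 2, 6), (1, 2, 3))
size.(parts)   # ((2, 1), (2, 2), (2, 3))
```

Using a separate name keeps the sketch independent of the Vector method above; it could equally be added as another method of colpartitions.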


It is not designed for efficiency and works on relative shares rather than absolute values, but BetaML.partition allows partitioning a collection of N-dimensional arrays along any of the N dimensions, where any of the Nc dimensions can be of a different size…