Is there a simple/intuitive way to partition a matrix by arbitrary strides? Like i

Is there a simple/intuitive way to partition a matrix by arbitrary strides? Like if I had a 20 x 5 matrix and wanted the output to be a vector/iterable of 10x5, 3x5, 4x5, 3x5 ? Essentially, the “split” part of split-apply-combine, but for a matrix, and based on partition size.
Theoretically:

foo(rand(20,5), [10,3,4,3])

Note that the original poster on Slack cannot see your response here on Discourse. Consider transcribing the appropriate answer back to Slack, or pinging the poster here on Discourse so they can follow this thread.
(Original message :slack:) (More Info)

@views function splititerator(M::AbstractMatrix, rowlens::AbstractVector{<:Integer})
    sum(rowlens) <= size(M,1) || throw(DimensionMismatch())
    let s = cumsum([first(axes(M,1)); rowlens])
        return (M[s[i]:s[i+1]-1,:] for i = 1:length(s)-1)
    end
end
2 Likes

Answer on Slack provided by @mcabbott

julia> function foo(mat, sizes)
       axes(mat,1) == 1:sum(sizes) || error("bad sizes!")
       z = 0
       map(sizes) do s
         r = z+1:z+s
         z = s
         mat[r,:]
       end
       end;
julia> foo(rand(20,5), [10,3,4,3]) .|> size
4-element Vector{Tuple{Int64, Int64}}:
 (10, 5)
 (3, 5)
 (4, 5)
 (3, 5)
1 Like

Second, faster answer provided by @mcabbott on Slack:

foo(mat, sizes::Tuple{}) = ()

@views function foo(mat, sizes::Tuple) 
    (mat[1:first(sizes),:], foo(mat[first(sizes)+1:end,:], Base.tail(sizes))...);
end

julia> foo(rand(20,5), (10,3,4,3)) .|> size
((10, 5), (3, 5), (4, 5), (3, 5))
2 Likes

Small, fixed-size tuples are fast (along with code like that will unroll at compile-time). On the other hand, if you have a list of hundreds of sizes, whose length is determined at runtime, using tuples is likely to be slower.

In this case, because you are talking about an list or generator of submatrices, where the limiting factor in performance is likely to be the subsequent matrix operations rather than generating the list, I wouldn’t tend to recommend tuple tricks — go for code simplicity instead.

4 Likes

As I mentioned in the slack thread, I’d split the problem. First create a function that splits a vector (or likely any iterator) into parts the way you’d like. Then use that to create your more complicated row splitting. For example, we have a builtin iterator that gives you splits of a constant size:

(@views(A[I, :]) for I in Iterators.partition(axes(A, 1), 4))

It’s much more composable this way.

1 Like

@mbauman, trying to follow your illuminating advice. Could you kindly comment on Julia-novice code below. Thanks in advance.

function splititr(b)
    c = cumsum([1;b])
    return [ci:ci+bi-1 for (ci,bi) in zip(c,b)]
end

split_mbauman(M,I0) = (@views(M[I, :]) for I in I0)

M = rand(20,5)
b = [10,3,4,3]
it = splititr(b)
R = collect(split_mbauman(M,it))
# splits in R[1], R[2], ... R[4]
1 Like

Yeah, you could really use any of the above implementations for creating the splits — the idea is just to detangle the splitting of a 1-dimensional iterable/vector from the indexing into the matrix. It makes it easier to test and easier to use in different situations.

1 Like

@mcabbott also suggested this, which is similar to the proposed solution by @rafael.guerra :

function steps(vals, steps)
    i = firstindex(vals)
    (view(vals, i:(i+=s)-1) for s in steps)
end

mat = rand(20,2)
batches = steps(axes(mat,1), [10,3,4,3])

[view(mat,r,:) for r in batches]

Is this functionality (in general) something that can be a PR for Base.Iterators ?

1 Like

The part that could potentially go into Iterators would be the 1-dimensional version. It’d be fairly complicated because it’d have to work on arbitrary iterators, but I’d also want it to have the fancy precomputing stuff for splitting ranges like Iterators.partition does.