Is there a simple/intuitive way to partition a matrix by arbitrary strides? Like if I had a 20 x 5 matrix and wanted the output to be a vector/iterable of 10x5, 3x5, 4x5, 3x5 ? Essentially, the “split” part of split-apply-combine, but for a matrix, and based on partition size.
Theoretically:
foo(rand(20,5), [10,3,4,3])
Note that the original poster on Slack cannot see your response here on Discourse. Consider transcribing the appropriate answer back to Slack, or pinging the poster here on Discourse so they can follow this thread. (Original message )(More Info)
@views function splititerator(M::AbstractMatrix, rowlens::AbstractVector{<:Integer})
sum(rowlens) <= size(M,1) || throw(DimensionMismatch())
let s = cumsum([first(axes(M,1)); rowlens])
return (M[s[i]:s[i+1]-1,:] for i = 1:length(s)-1)
end
end
Small, fixed-size tuples are fast (along with code like that will unroll at compile-time). On the other hand, if you have a list of hundreds of sizes, whose length is determined at runtime, using tuples is likely to be slower.
In this case, because you are talking about an list or generator of submatrices, where the limiting factor in performance is likely to be the subsequent matrix operations rather than generating the list, I wouldn’t tend to recommend tuple tricks — go for code simplicity instead.
As I mentioned in the slack thread, I’d split the problem. First create a function that splits a vector (or likely any iterator) into parts the way you’d like. Then use that to create your more complicated row splitting. For example, we have a builtin iterator that gives you splits of a constant size:
(@views(A[I, :]) for I in Iterators.partition(axes(A, 1), 4))
@mbauman, trying to follow your illuminating advice. Could you kindly comment on Julia-novice code below. Thanks in advance.
function splititr(b)
c = cumsum([1;b])
return [ci:ci+bi-1 for (ci,bi) in zip(c,b)]
end
split_mbauman(M,I0) = (@views(M[I, :]) for I in I0)
M = rand(20,5)
b = [10,3,4,3]
it = splititr(b)
R = collect(split_mbauman(M,it))
# splits in R[1], R[2], ... R[4]
Yeah, you could really use any of the above implementations for creating the splits — the idea is just to detangle the splitting of a 1-dimensional iterable/vector from the indexing into the matrix. It makes it easier to test and easier to use in different situations.
@mcabbott also suggested this, which is similar to the proposed solution by @rafael.guerra :
function steps(vals, steps)
i = firstindex(vals)
(view(vals, i:(i+=s)-1) for s in steps)
end
mat = rand(20,2)
batches = steps(axes(mat,1), [10,3,4,3])
[view(mat,r,:) for r in batches]
Is this functionality (in general) something that can be a PR for Base.Iterators ?
The part that could potentially go into Iterators would be the 1-dimensional version. It’d be fairly complicated because it’d have to work on arbitrary iterators, but I’d also want it to have the fancy precomputing stuff for splitting ranges like Iterators.partition does.