Split vector into N (potentially unequal length) subvectors?

Is there a convenience function for splitting a vector into N subvectors? For example, if you had a length-10 vector and called the function to split it into 3 parts, you would get three vectors back, two with length 3 and another with length 4. The subvectors contain all the original items. I see many references to IterTools.partition but it performs a slightly different task.

I could write a little function to do this, but I wanted to see if one is already available.

Iterators.partition is the only builtin Iā€™m aware of that comes close to this purpose. To get the ā€œlong tailā€ partition, rather than the ā€œshort tailā€ given by Iterators.partition, Iā€™d do something like this:

function partitionvec(x,stride,longtail::Bool=true)
    # longtail=true to lengthen the last entry with the leftovers
    # longtail=false to place the leftovers in their own entry
    stride > 0 || error("stride must be positive") # doesn't handle negative strides
    starts = firstindex(x):stride:lastindex(x)-longtail*stride # where to start each subvector
    return [view(x,starts[i]:get(starts,i+1,lastindex(x)+1)-1) for i in eachindex(starts)]
end
1 Like

Yeah I decided just to write a really explicit version

function makechunks(X::AbstractVector{T}, n::Int) where {T}
    L = length(X)
    c = L Ć· n
    Y = Vector{Vector{T}}(undef, n)
    idx = 1
    for i āˆˆ 1:n-1
        Y[i] = X[idx:idx+c-1]
        idx += c 
    end
    Y[end] = X[idx:end]
    return Y
end

You might want to use views instead to avoid copies. Also, the whole thing can be made more compact with a comprehension:

@views function makechunks(X::AbstractVector, n::Integer)
    c = length(X) Ć· n
    return [X[1+c*k:(k == n-1 ? end : c*k+c)] for k = 0:n-1]
end
7 Likes

Yes thatā€™s all fair. Itā€™s not a performance-critical bit of code for me at the moment so I wasnā€™t bothering. Iā€™m just using it to split up some items before calling a function on the groups within a @threads loop.

1 Like

This approach seems to be not the best for divisioning data for parallel execution. Consider

julia> makechunks(collect(1:11), 4)
4-element Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}:
 [1, 2]
 [3, 4]
 [5, 6]
 [7, 8, 9, 10, 11]

Splitting data like this is going to bottleneck on the last one that contains the remainder, therefore, at worst the entire processing runs potentlally close to twice as slow as it could be.

I believe this is better:

function equal_partition(n::Int64, parts::Int64)
    if n < parts
        return [ x:x for x in 1:n ]
    end
    starts = push!(Int64.(round.(1:n/parts:n)), n+1)
    return [ starts[i]:starts[i+1]-1 for i in 1:length(starts)-1 ]
end

function equal_partition(V::AbstractVector, parts::Int64)
    ranges = equal_partition(length(V), parts)
    return [ view(V,range) for range in ranges ]
end

julia> equal_partition(collect(1:11), 4)
4-element Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}:
 [1, 2, 3]
 [4, 5]
 [6, 7, 8]
 [9, 10, 11]

Edited to work better if n < parts (in which case it returns a smaller array).

2 Likes

certainly not champions of efficiency, but just to play with Juliaā€™s possibilities


parts(n,p,pts=Int[])= cld(n,p)==n/p ? (return push!(pts,fill(cld(n,p),p)...)) : parts(n-cld(n,p),p-1,push!(pts,cld(n,p)))


prng(n,p)=accumulate((s,c)->s.stop+1 : s.stop+c , parts(n,p);init=1:0)


v=rand(1:10,13)
getindex.(Ref(v),prng(length(v),4))
julia> getindex.(Ref(v),prng(length(v),4))
4-element Vector{Vector{Int64}}:
 [10, 5, 3, 8]
 [10, 1, 2]
 [4, 1, 3]
 [2, 6, 9]

julia> v=rand(1:10,10)
julia> getindex.(Ref(v),prng(length(v),4))
4-element Vector{Vector{Int64}}:
 [1, 6, 6]
 [3, 3, 2]
 [6, 6]
 [7, 9]

PS
This should ensure that the maximum difference between the subvector sizes is 1.

1 Like

I was recently introduced to this package:

Just to spread it a bit, seems like it is what people in this thread wants to do :slight_smile:

Kind regards

2 Likes

This simple formula appears to give a scattered partition of the indexes

f(n,k)= [collect(i:k:n) for i in 1:k]
4 Likes

using parts() and the partby iterator as defined here

you could get

parts(n,p)= (np=fill(div(n,p),p); np[1:rem(n,p)].+=1; np)
collect(partby(itr,parts(length(itr),4)))