Split vector into N (potentially unequal length) subvectors?

markmbaum · December 23, 2021, 8:16pm

Is there a convenience function for splitting a vector into N subvectors? For example, if you had a length-10 vector and called the function to split it into 3 parts, you would get three vectors back, two with length 3 and another with length 4. The subvectors contain all the original items. I see many references to IterTools.partition but it performs a slightly different task.

I could write a little function to do this, but I wanted to see if one is already available.

mikmoore · December 23, 2021, 9:22pm

Iterators.partition is the only builtin I’m aware of that comes close to this purpose. To get the “long tail” partition, rather than the “short tail” given by Iterators.partition, I’d do something like this:

function partitionvec(x,stride,longtail::Bool=true)
    # longtail=true to lengthen the last entry with the leftovers
    # longtail=false to place the leftovers in their own entry
    stride > 0 || error("stride must be positive") # doesn't handle negative strides
    starts = firstindex(x):stride:lastindex(x)-longtail*stride # where to start each subvector
    return [view(x,starts[i]:get(starts,i+1,lastindex(x)+1)-1) for i in eachindex(starts)]
end

markmbaum · December 23, 2021, 9:37pm

Yeah I decided just to write a really explicit version

function makechunks(X::AbstractVector{T}, n::Int) where {T}
    L = length(X)
    c = L ÷ n
    Y = Vector{Vector{T}}(undef, n)
    idx = 1
    for i ∈ 1:n-1
        Y[i] = X[idx:idx+c-1]
        idx += c 
    end
    Y[end] = X[idx:end]
    return Y
end

stevengj · December 23, 2021, 9:55pm

You might want to use views instead to avoid copies. Also, the whole thing can be made more compact with a comprehension:

@views function makechunks(X::AbstractVector, n::Integer)
    c = length(X) ÷ n
    return [X[1+c*k:(k == n-1 ? end : c*k+c)] for k = 0:n-1]
end

markmbaum · December 23, 2021, 11:08pm

Yes that’s all fair. It’s not a performance-critical bit of code for me at the moment so I wasn’t bothering. I’m just using it to split up some items before calling a function on the groups within a @threads loop.

takbal · December 28, 2022, 3:00pm

This approach seems to be not the best for divisioning data for parallel execution. Consider

julia> makechunks(collect(1:11), 4)
4-element Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}:
 [1, 2]
 [3, 4]
 [5, 6]
 [7, 8, 9, 10, 11]

Splitting data like this is going to bottleneck on the last one that contains the remainder, therefore, at worst the entire processing runs potentlally close to twice as slow as it could be.

I believe this is better:

function equal_partition(n::Int64, parts::Int64)
    if n < parts
        return [ x:x for x in 1:n ]
    end
    starts = push!(Int64.(round.(1:n/parts:n)), n+1)
    return [ starts[i]:starts[i+1]-1 for i in 1:length(starts)-1 ]
end

function equal_partition(V::AbstractVector, parts::Int64)
    ranges = equal_partition(length(V), parts)
    return [ view(V,range) for range in ranges ]
end

julia> equal_partition(collect(1:11), 4)
4-element Vector{SubArray{Int64, 1, Vector{Int64}, Tuple{UnitRange{Int64}}, true}}:
 [1, 2, 3]
 [4, 5]
 [6, 7, 8]
 [9, 10, 11]

Edited to work better if n < parts (in which case it returns a smaller array).

rocco_sprmnt21 · December 28, 2022, 4:02pm

certainly not champions of efficiency, but just to play with Julia’s possibilities


parts(n,p,pts=Int[])= cld(n,p)==n/p ? (return push!(pts,fill(cld(n,p),p)...)) : parts(n-cld(n,p),p-1,push!(pts,cld(n,p)))


prng(n,p)=accumulate((s,c)->s.stop+1 : s.stop+c , parts(n,p);init=1:0)


v=rand(1:10,13)
getindex.(Ref(v),prng(length(v),4))
julia> getindex.(Ref(v),prng(length(v),4))
4-element Vector{Vector{Int64}}:
 [10, 5, 3, 8]
 [10, 1, 2]
 [4, 1, 3]
 [2, 6, 9]

julia> v=rand(1:10,10)
julia> getindex.(Ref(v),prng(length(v),4))
4-element Vector{Vector{Int64}}:
 [1, 6, 6]
 [3, 3, 2]
 [6, 6]
 [7, 9]

PS
This should ensure that the maximum difference between the subvector sizes is 1.

Ahmed_Salih · December 28, 2022, 5:52pm

I was recently introduced to this package:

Just to spread it a bit, seems like it is what people in this thread wants to do

Kind regards

rocco_sprmnt21 · December 28, 2022, 6:31pm

This simple formula appears to give a scattered partition of the indexes

f(n,k)= [collect(i:k:n) for i in 1:k]

rocco_sprmnt21 · March 3, 2023, 9:58am

using parts() and the partby iterator as defined here

you could get

parts(n,p)= (np=fill(div(n,p),p); np[1:rem(n,p)].+=1; np)
collect(partby(itr,parts(length(itr),4)))

Topic		Replies	Views
Partitioning a vector General Usage	7	349	January 31, 2024
Is there a simple/intuitive way to partition a matrix by arbitrary strides? Like i General Usage matrices	9	1168	February 24, 2021
Tools for partitioning matrix columns? General Usage question	3	403	September 21, 2023
Get suffix from Vector of Vectors New to Julia	2	288	March 19, 2021
Breaking a list into chunks of size N General Usage array	2	5143	December 25, 2019

Split vector into N (potentially unequal length) subvectors?

Related topics