Want to generate samples with all items being sampled the same number of times?

Is there any library that can help me generate random “batches” of data such that every item appears in the same number of batches (with the exception that some items may appear one more time than other items)?

Is it the same as:

julia> a = [1,2,3]
3-element Vector{Int64}:
 1
 2
 3

julia> using Random  # provides shuffle!

julia> bag = repeat(a, 2)  # every item appears twice

julia> shuffle!(bag)

I am not sure I understand the specs, but I don’t think there is a specialized function for this (especially given the exception).
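If the shuffled bag is what you are after, here is a minimal sketch of how it could be cut into batches. The helper name `shuffled_batches` and its arguments are made up for illustration, and it assumes a batch is allowed to contain the same item more than once:

```julia
using Random

# Hypothetical helper: shuffle `reps` copies of every item, then cut the bag
# into consecutive batches of (at most) `batchsize` items.
function shuffled_batches(items, reps, batchsize)
    bag = shuffle!(repeat(items, reps))
    # Iterators.partition yields views into `bag`; collect turns each into a Vector.
    return [collect(chunk) for chunk in Iterators.partition(bag, batchsize)]
end

batches = shuffled_batches([1, 2, 3], 2, 2)  # 3 batches of 2 items each
```

Across all batches every item appears exactly `reps` times. If `batchsize` does not divide the bag length, the last batch is simply shorter.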

If you can post a small example of what the output should look like I’m certain you’ll get a good answer to this.

Wrote my own:

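using Random

# Endless batch iterator. Note: `data` is shuffled in place on every pass.
struct Batcher{T}
    data :: Vector{T}
    batchsize :: Int
    Batcher(data; batchsize) = new{eltype(data)}(data, batchsize)
end

Base.IteratorSize(::Type{<:Batcher}) = Base.IsInfinite()  # it never terminates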

function Base.iterate(b::Batcher)
    shuffle!(b.data)
    iterate(b, 1)
end

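function Base.iterate(b::Batcher, state)
    n = length(b.data)
    if b.batchsize > n
        # Batch larger than the data: return one full shuffled pass (as a copy).
        return (copy(shuffle!(b.data)), 1)
    end
    if state + b.batchsize - 1 > n
        # Too few items left in this pass: take the rest, reshuffle,
        # then top up from the start of the new pass.
        result = b.data[state:end]
        needed = b.batchsize - length(result)
        shuffle!(b.data)
        (append!(result, b.data[1:needed]), needed + 1)
    else
        # No modulo here: a state of n + 1 makes the next call reshuffle.
        (b.data[state:state+b.batchsize-1], state + b.batchsize)
    end
end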
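
The idea behind the iterator (reshuffle once per pass, and let a batch straddle two passes) can be checked with a small self-contained sketch. `epoch_batches` is a hypothetical helper written just for this check, not part of any library, and it assumes `items` is non-empty:

```julia
using Random

# Hypothetical helper: draw `nbatches` batches by walking through
# repeatedly reshuffled passes over `items`.
function epoch_batches(items, batchsize, nbatches)
    isempty(items) && throw(ArgumentError("items must not be empty"))
    pool = shuffle(items)            # shuffled copy; the input is left untouched
    pos = 1
    batches = Vector{Vector{eltype(items)}}()
    for _ in 1:nbatches
        batch = eltype(items)[]
        while length(batch) < batchsize
            if pos > length(pool)    # pass exhausted: reshuffle and start over
                shuffle!(pool)
                pos = 1
            end
            push!(batch, pool[pos])
            pos += 1
        end
        push!(batches, batch)
    end
    return batches
end

batches = epoch_batches(collect(1:5), 2, 4)   # 8 draws from 5 items
counts = [count(==(i), reduce(vcat, batches)) for i in 1:5]
```

Here three items end up with a count of 2 and two items with a count of 1, so the counts never differ by more than one. A batch that crosses a pass boundary can still contain the same item twice.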