Shuffling multiple N-arrays over a given dimension

sylvaticus · April 13, 2021, 2:44pm

Sometimes we need to shuffle several arrays along a given dimension while applying the same permutation to all of them (e.g. in machine learning, when the feature and label matrices must be shuffled along the dimension that indexes the observations, so that each observation keeps its label).
I therefore wrote the following function to do that. Do you have any comments or ideas on how to improve it?

```julia
using Random               # brings `Random`, `AbstractRNG` and `randperm` into scope
import Random: shuffle     # the methods below extend `Random.shuffle`

"""
    shuffle(data; dims=1, rng=Random.default_rng())

Shuffle a vector of n-dimensional arrays along dimension `dims`, applying the same permutation to every array

# Parameters
- `data`: The vector of arrays to shuffle
- `dims`: The dimension along which to shuffle [def: `1`]
- `rng`:  The `AbstractRNG` used for the shuffle [def: `Random.default_rng()`]

# Notes
- All the arrays must have the same size along the dimension to shuffle

# Example

    julia> a = [1 2 30; 10 20 30]; b = [100 200 300];
    julia> (aShuffled, bShuffled) = shuffle([a,b],dims=2)
    2-element Vector{Matrix{Int64}}:
     [1 30 2; 10 30 20]
     [100 300 200]

The permutation is random, so the actual output will vary.
"""
function shuffle(data::AbstractVector{<:AbstractArray}; dims=1, rng=Random.default_rng())
    Ns = [size(m,dims) for m in data]
    # `@error` would only log a message; throwing stops execution on mismatched inputs
    length(Set(Ns)) == 1 || throw(DimensionMismatch("In `shuffle(arrays)` all individual arrays need to have the same size on the dimension specified"))
    N    = Ns[1]
    ridx = randperm(rng, N)   # a single permutation shared by all arrays
    out  = similar(data)
    for (i,a) in enumerate(data)
        # index with `:` on every dimension except `dims`, which gets the permutation
        aidx = ntuple(d -> d == dims ? ridx : Colon(), ndims(a))
        out[i] = a[aidx...]
    end
    return out
end
shuffle(rng::AbstractRNG, data::AbstractVector{<:AbstractArray}; dims=1) = shuffle(data; dims=dims, rng=rng)
```
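The essential property, that each observation keeps its label after the shuffle, can be checked in isolation. This is a minimal sketch on hypothetical toy data (observations along dimension 1, and a made-up label equal to each row's sum so the pairing is easy to verify). It applies one permutation directly, building the index tuple with `ntuple` and `Colon()` so that a matrix and a vector can share it.

```julia
using Random

# Hypothetical toy data: 5 observations (rows) x 3 features, one label per observation.
X = reshape(collect(1:15), 5, 3)
y = [sum(X[i, :]) for i in 1:5]       # label = row sum, so the pairing is easy to check

dims = 1                               # observations live along dimension 1
perm = randperm(MersenneTwister(42), size(X, dims))

# `:` on every dimension except `dims`, which gets the shared permutation
idx(a) = ntuple(d -> d == dims ? perm : Colon(), ndims(a))
Xs = X[idx(X)...]                      # 5x3 matrix, rows permuted
ys = y[idx(y)...]                      # length-5 vector, same permutation

# every shuffled row should still sum to its shuffled label
println(all(sum(Xs[i, :]) == ys[i] for i in axes(Xs, 1)))
```

Because the index tuple has `ndims(a)` entries, the same helper works whether an array has one, two or more dimensions.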

sylvaticus · October 27, 2022, 4:24am

I now see that this is indeed type piracy: `Random.shuffle` already works on arrays of arrays, where it reorders the individual arrays themselves, and my method changes that behavior.
I suppose I should rename it then… suggestions? `consistentshuffle`? But it's a bit long…
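A minimal sketch of a non-pirating variant, using the `consistentshuffle` name suggested above (hypothetical, not part of `Random` or any package). Since it is a new function rather than a method of `Random.shuffle`, calling `shuffle` on a vector of arrays keeps its standard meaning. This version uses `selectdim` plus `copy` instead of building the index tuple by hand.

```julia
using Random

"""
    consistentshuffle([rng,] data; dims=1)

Hypothetical helper: return copies of the arrays in `data`, all permuted with
the same random permutation along dimension `dims`.
"""
function consistentshuffle(rng::AbstractRNG, data::AbstractVector{<:AbstractArray}; dims::Integer=1)
    n = size(first(data), dims)
    all(size(a, dims) == n for a in data) ||
        throw(DimensionMismatch("all arrays must have the same size along dimension $dims"))
    perm = randperm(rng, n)                  # one permutation shared by every array
    # selectdim returns a view along `dims`; copy materializes it as a new array
    return [copy(selectdim(a, dims, perm)) for a in data]
end

# Convenience method that falls back to the default RNG
consistentshuffle(data::AbstractVector{<:AbstractArray}; dims::Integer=1) =
    consistentshuffle(Random.default_rng(), data; dims=dims)
```

For example, `consistentshuffle(MersenneTwister(7), [a, b]; dims=2)` permutes the columns of `a` and `b` identically, while `shuffle([a, b])` would still just reorder the two arrays.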

rafael.guerra · October 27, 2022, 8:19am

If I am not mistaken, mapslices() and broadcasting could greatly simplify this task:

```julia
using Random

a = [1 2 30; 10 20 30]; b = [100 200 300]
Random.seed!(0)
dim = 2
ix = randperm(size(a,dim))
mapslices.(x->x[ix], [a,b], dims=dim)

# result (the exact permutation depends on the seed and the Julia version):
2-element Vector{Matrix{Int64}}:
 [2 1 30; 20 10 30]
 [200 100 300]
```
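The `mapslices` approach is not limited to matrices. A quick check on hypothetical 3-d arrays suggests that permuting every 1-d slice taken along `dim` gives the same result as indexing that dimension directly with the permutation:

```julia
using Random

A = reshape(collect(1:24), 2, 3, 4)      # hypothetical 3-d array
B = reshape(collect(101:112), 1, 3, 4)   # same size (3) along dimension 2
dim = 2
ix  = randperm(MersenneTwister(1), size(A, dim))

# mapslices approach: permute every 1-d slice taken along `dim`
viaslices = mapslices.(x -> x[ix], [A, B], dims=dim)

# direct indexing with the same permutation along dimension 2
viaindex = [A[:, ix, :], B[:, ix, :]]

println(viaslices == viaindex)
```

Direct indexing (or `selectdim`) builds each output array in one step, whereas `mapslices` calls the closure once per slice; both apply the same permutation to every array.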
