Best way to shuffle elements in a vector of vectors


#1

I have some code that needs to shuffle up the elements of a vector of vectors the way I have this coded feels terrible and I was hoping some advice could come on how to make this better. The key is that the data is stored as a vector of “rows” all of equal size, but I want to shuffle the “columns” of this but return back the original structure of vector of vectors.

data = [1:5 for i in 1:20]
stacked = hcat(data...)
hcat([shuffle(stacked[:, i]) for i in 1:size(stacked, 2)]...)

Thanks so much for any tips!


#2

Just one idea based on RecursiveArrayTools.jl:

using RecursiveArrayTools

function myshuffle!(data)
  v = VectorOfArray(data)
  for k in 1:length(v)
    shuffle!(v[:,k])
  end
  nothing
end

which would give

julia> data = [collect(1:5) for i in 1:20];

julia> myshuffle!(data);

julia> data
20-element Array{Array{Int64,1},1}:
 [1, 5, 4, 3, 2]
 [3, 4, 2, 1, 5]
 [3, 2, 5, 4, 1]
 [4, 3, 5, 1, 2]
 [5, 1, 4, 3, 2]
 [3, 2, 5, 4, 1]
 [1, 4, 3, 2, 5]
 [1, 2, 4, 5, 3]
 [1, 3, 2, 4, 5]
 [4, 3, 1, 5, 2]
 [5, 2, 4, 1, 3]
 [4, 2, 1, 5, 3]
 [3, 5, 4, 2, 1]
 [5, 2, 1, 3, 4]
 [3, 4, 1, 2, 5]
 [4, 3, 2, 5, 1]
 [2, 1, 5, 4, 3]
 [4, 2, 3, 1, 5]
 [4, 3, 5, 2, 1]
 [2, 3, 4, 1, 5]

This should all be lazy/in-place.

UPDATE:

julia> @btime myshuffle!($data);
  1.536 μs (0 allocations: 0 bytes)

#4

If vector-of-vectors is the structure you want then you can do this:

julia> using Random

julia> data = [collect(1:5) for i in 1:10];

julia> foreach(shuffle!, data)

julia> data
10-element Array{Array{Int64,1},1}:
 [3, 2, 5, 1, 4]
 [1, 2, 4, 5, 3]
 [2, 3, 1, 5, 4]
 [2, 4, 5, 3, 1]
 [2, 3, 4, 5, 1]
 [3, 1, 4, 5, 2]
 [1, 4, 5, 3, 2]
 [5, 3, 1, 4, 2]
 [4, 5, 3, 2, 1]
 [3, 2, 4, 5, 1]

If you’d prefer to have a matrix with shuffled rows or columns, you can use views:

julia> data = [j for i = 1:10, j = 1:5];

julia> for i = 1:size(data, 1)
           shuffle!(@view data[i,:])
       end

julia> data
10×5 Array{Int64,2}:
 2  5  3  4  1
 4  2  1  5  3
 4  5  3  2  1
 1  4  2  3  5
 5  2  1  3  4
 1  4  3  2  5
 4  3  5  2  1
 2  4  3  5  1
 3  1  2  5  4
 3  1  4  2  5

#5

I just realize that using VectorOfArrays is completely unnecessary… In my defense, it’s Friday evening for me :smiley:


#6

Is this not just shuffling each of the subvectors (i.e. the so called rows)? The issue with what I am doing is that I need to shuffle along the “columns” of the stacked data so I need to have a shuffling on data[i][j] for each j hence all the horrible hcat(data…) stuff.


#7

So you want this?

julia> using Random

julia> data = [collect(i:i+4) for i in 1:5:50];

julia> data
10-element Array{Array{Int64,1},1}:
 [1, 2, 3, 4, 5]     
 [6, 7, 8, 9, 10]    
 [11, 12, 13, 14, 15]
 [16, 17, 18, 19, 20]
 [21, 22, 23, 24, 25]
 [26, 27, 28, 29, 30]
 [31, 32, 33, 34, 35]
 [36, 37, 38, 39, 40]
 [41, 42, 43, 44, 45]
 [46, 47, 48, 49, 50]

julia> front_=[data[i][1] for i=1:length(data)];

julia> shuffle!(front_);

julia> for i=1:length(data) data[i][1] = front_[i] end;

julia> data
10-element Array{Array{Int64,1},1}:
 [6, 2, 3, 4, 5]     
 [46, 7, 8, 9, 10]   
 [36, 12, 13, 14, 15]
 [11, 17, 18, 19, 20]
 [41, 22, 23, 24, 25]
 [31, 27, 28, 29, 30]
 [16, 32, 33, 34, 35]
 [26, 37, 38, 39, 40]
 [21, 42, 43, 44, 45]
 [1, 47, 48, 49, 50] 


#8

I see my test code had a bug needed to be with shuffle(stacked[:, i]), will fix up the question, and so far the recursive arrays are nice (should have known … as I implemented some of that for just this reason …)


#9

yes this kind of thing but for all of the “columns” Much better example … as my [1:5 for i = 1:20] is really dumb in this case as I it would be unchanged for what I want as each column is equal. Always a trick to properly test your minimum example from a much larger code base!


#10

Okay with everyone’s comments here is the new best solutions I can do (that is actually correct unlike my first example …).

using RecursiveArrayTools
using BenchmarkTools

function shuffle_col!(data)
    vdata = VectorOfArray(data)
    for i = 1:size(vdata, 1)
        shuffle!(@view vdata[i, :])
    end
    return
end

data = [fill(i, 5) for i in 1:10]
@btime shuffle_col!(data)

Anyone see anything I could do to make it faster / cleaner? Thanks so much!

Also is it the correct style to return nothing from an inplace operation?


#11

I prefer the explict: return nothing. Nothing is an appropriate return from an inplace operation, sometimes there is reason to return the changed thing explicitly, too.


#12

Instead of VectorOfArray(data) cannot you just create a matrix?

[data[i][j] for i=1:length(data),j=1:length(data[1])]

After shuffling, you can convert it back to vector of vectors:

[vdata[i,:] for i=1:size(vdata)[1]]


#13

To make it for columns instead of rows in Stefan’s second example, just use [:,i]. Also, use $ before variables with @btime to get proper timings.