Shuffle columns of a matrix

I want to shuffle (randomly rearrange) the columns of a large matrix. As much in place as possible, as the matrix is large; otherwise a naive

M[:, randperm(size(M, 2))]

would do.

I thought of modifying Random.shuffle!, but I don’t understand what mask does in the code. If I instantiate a random permutation, I can do

using Random

function shuffle_columns1!(M::AbstractMatrix{T}) where T
    n, m = size(M)
    buffer = Vector{T}(undef, n)
    p = randperm(m)
    for (i, j) in enumerate(p)
        buffer .= M[:, i]
        M[:, i] .= M[:, j]
        M[:, j] .= buffer
    end
    M
end

eg

julia> M = mapreduce(_ -> permutedims(1:10), vcat, 1:5)
5×10 Matrix{Int64}:
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10

julia> shuffle_columns1!(M)
5×10 Matrix{Int64}:
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8

but I would rather avoid allocating the randperm, too. Is there a package which has functionality for something like this?

1 Like

Would randperm! solve your issue?

p = randperm(size(M,2))
Base.permutecols!!(M, randperm!(p))

:wink:

1 Like

I don’t think so, it shuffles all elements, not just columns.

Needless to say, I don’t want to use internal bindings.

I meant to avoid allocating randperm inside your shuffle_columns1!?

1 Like

Sure, but the code here is self-contained and something you could just copy directly into your own source (with attribution).

3 Likes

Certainly, if as a last resort that can be done; but as I said above, I can also modify shuffle!.

One just needs to generalize the operation of “exchange i for j”, eg

# a generalization of `shuffle!` adopted from Random, exchange!(i, j) swaps
# whatever the user wants
function do_shuffle(exchange!, r::AbstractRNG, n)
    # keep it consistent with `randperm!` and `randcycle!` if possible
    @assert n <= Int64(2)^52
    n == 0 && return
    mask = 3
    @inbounds for i = 2:n
        j = 1 + rand(r, Random.ltm52(i, mask))
        exchange!(i, j)
        i == 1 + mask && (mask = 2 * mask + 1)
    end
end

Ah, sure, looks like passing a curried swapcols! as the exchange! function arg would do the ticket.

1 Like

I think this is a leftover from ancient times where people used RNGs with non-native word-size (ie mersenne twister, optimized for generating Float64). Back then, people used to do all kinds of misguided counter-productive contortions in order to save some cycles.

I’m saying misguided because our shuffle! fisher-yates ends up using naive rejection-sampling instead of the nice new fancy multiplication-based technique to generate numbers in a range. Lo and behold

julia> function baseshuffle!(r::Random.AbstractRNG, a::AbstractArray)
           # keep it consistent with `randperm!` and `randcycle!` if possible
           Base.require_one_based_indexing(a)
           n = length(a)
           @assert n <= Int64(2)^52
           n == 0 && return a
           mask = 3
           @inbounds for i = 2:n
               j = 1 + rand(r, Random.ltm52(i, mask))
               a[i], a[j] = a[j], a[i]
               i == 1 + mask && (mask = 2 * mask + 1)
           end
           return a
       end

julia> function simpleshuffle!(r::Random.AbstractRNG, a::AbstractArray)
           # keep it consistent with `randperm!` and `randcycle!` if possible
           Base.require_one_based_indexing(a)
           n = length(a)
           n == 0 && return a
           @inbounds for i = 2:n
               j = rand(r, 1:i)
               a[i], a[j] = a[j], a[i]
           end
           return a
       end

julia> r=Random.default_rng(); b = collect(1:10_000); @btime Random.shuffle!(b); @btime baseshuffle!(r, b); @btime simpleshuffle!(r,b);
  53.749 μs (0 allocations: 0 bytes)
  53.483 μs (0 allocations: 0 bytes)
  18.988 μs (0 allocations: 0 bytes)

We should simplify that code.

(not throwing shade on the julia devs, instead throwing shade on what passed for scientific consensus on randomness in the late 90s. Also hardware-dependent – the tradeoff would look different on a late 90s CPU with shallow pipeline and slow multiplication)

5 Likes