Shuffle columns of a matrix

Tamas_Papp · June 30, 2025, 1:21pm

I want to shuffle (randomly rearrange) the columns of a large matrix. As much in place as possible, as the matrix is large; otherwise a naive

M[:, randperm(size(M, 2))]

would do.

I thought of modifying Random.shuffle!, but I don’t understand what mask does in the code. If I instantiate a random permutation, I can do

using Random

function shuffle_columns1!(M::AbstractMatrix{T}) where T
    n, m = size(M)
    buffer = Vector{T}(undef, n)
    p = randperm(m)
    for (i, j) in enumerate(p)
        buffer .= M[:, i]
        M[:, i] .= M[:, j]
        M[:, j] .= buffer
    end
    M
end

eg

julia> M = mapreduce(_ -> permutedims(1:10), vcat, 1:5)
5×10 Matrix{Int64}:
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10
 1  2  3  4  5  6  7  8  9  10

julia> shuffle_columns1!(M)
5×10 Matrix{Int64}:
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8
 2  3  5  6  4  1  10  9  7  8

but I would rather avoid allocating the randperm, too. Is there a package which has functionality for something like this?

gdalle · June 30, 2025, 1:30pm

Would randperm! solve your issue?

karei · June 30, 2025, 1:40pm

p = randperm(size(M,2))
Base.permutecols!!(M, randperm!(p))

Tamas_Papp · June 30, 2025, 1:45pm

I don’t think so, it shuffles all elements, not just columns.

Needless to say, I don’t want to use internal bindings.

gdalle · June 30, 2025, 1:47pm

I meant to avoid allocating randperm inside your shuffle_columns1!?

mbauman · June 30, 2025, 1:49pm

Sure, but the code here is self-contained and something you could just copy directly into your own source (with attribution).

github.com/JuliaLang/julia

base/combinatorics.jl

8caacdb69


      
          # swap columns i and j of a, in-place
          function swapcols!(a::AbstractMatrix, i, j)
              i == j && return
              cols = axes(a,2)
              @boundscheck i in cols || throw(BoundsError(a, (:,i)))
              @boundscheck j in cols || throw(BoundsError(a, (:,j)))
              for k in axes(a,1)
                  @inbounds a[k,i],a[k,j] = a[k,j],a[k,i]
              end
          end
          
          # swap rows i and j of a, in-place
          function swaprows!(a::AbstractMatrix, i, j)
              i == j && return
              rows = axes(a,1)
              @boundscheck i in rows || throw(BoundsError(a, (:,i)))
              @boundscheck j in rows || throw(BoundsError(a, (:,j)))
              for k in axes(a,2)
                  @inbounds a[i,k],a[j,k] = a[j,k],a[i,k]
              end

This file has been truncated. show original

Tamas_Papp · June 30, 2025, 2:05pm

Certainly, if as a last resort that can be done; but as I said above, I can also modify shuffle!.

One just needs to generalize the operation of “exchange i for j”, eg

# a generalization of `shuffle!` adopted from Random, exchange!(i, j) swaps
# whatever the user wants
function do_shuffle(exchange!, r::AbstractRNG, n)
    # keep it consistent with `randperm!` and `randcycle!` if possible
    @assert n <= Int64(2)^52
    n == 0 && return
    mask = 3
    @inbounds for i = 2:n
        j = 1 + rand(r, Random.ltm52(i, mask))
        exchange!(i, j)
        i == 1 + mask && (mask = 2 * mask + 1)
    end
end

mbauman · June 30, 2025, 2:19pm

Ah, sure, looks like passing a curried swapcols! as the exchange! function arg would do the ticket.

foobar_lv2 · June 30, 2025, 4:36pm

I think this is a leftover from ancient times where people used RNGs with non-native word-size (ie mersenne twister, optimized for generating Float64). Back then, people used to do all kinds of misguided counter-productive contortions in order to save some cycles.

I’m saying misguided because our shuffle! fisher-yates ends up using naive rejection-sampling instead of the nice new fancy multiplication-based technique to generate numbers in a range. Lo and behold

julia> function baseshuffle!(r::Random.AbstractRNG, a::AbstractArray)
           # keep it consistent with `randperm!` and `randcycle!` if possible
           Base.require_one_based_indexing(a)
           n = length(a)
           @assert n <= Int64(2)^52
           n == 0 && return a
           mask = 3
           @inbounds for i = 2:n
               j = 1 + rand(r, Random.ltm52(i, mask))
               a[i], a[j] = a[j], a[i]
               i == 1 + mask && (mask = 2 * mask + 1)
           end
           return a
       end

julia> function simpleshuffle!(r::Random.AbstractRNG, a::AbstractArray)
           # keep it consistent with `randperm!` and `randcycle!` if possible
           Base.require_one_based_indexing(a)
           n = length(a)
           n == 0 && return a
           @inbounds for i = 2:n
               j = rand(r, 1:i)
               a[i], a[j] = a[j], a[i]
           end
           return a
       end

julia> r=Random.default_rng(); b = collect(1:10_000); @btime Random.shuffle!(b); @btime baseshuffle!(r, b); @btime simpleshuffle!(r,b);
  53.749 μs (0 allocations: 0 bytes)
  53.483 μs (0 allocations: 0 bytes)
  18.988 μs (0 allocations: 0 bytes)

We should simplify that code.

(not throwing shade on the julia devs, instead throwing shade on what passed for scientific consensus on randomness in the late 90s. Also hardware-dependent – the tradeoff would look different on a late 90s CPU with shallow pipeline and slow multiplication)

Topic		Replies	Views
Shuffle values and sample indices of a sparse matrix Performance question , python , sparse	15	1301	March 23, 2021
Permuting the rows of a matrix based on given a given permutation General Usage	13	3920	May 22, 2025
Swap cols/rows of a matrix General Usage matrix	15	7286	January 5, 2024
Permute Multidimensional Array without allocation General Usage	9	2230	September 25, 2019
Best way to shuffle elements in a vector of vectors General Usage	11	8124	November 9, 2018

Shuffle columns of a matrix

Related topics