I want to shuffle (randomly rearrange) the columns of a large matrix. As much in place as possible, as the matrix is large; otherwise a naive
M[:, randperm(size(M, 2))]
would do.
I thought of modifying Random.shuffle!, but I don’t understand what mask does in the code. If I instantiate a random permutation, I can do
using Random
function shuffle_columns1!(M::AbstractMatrix{T}) where T
n, m = size(M)
buffer = Vector{T}(undef, n)
p = randperm(m)
for (i, j) in enumerate(p)
buffer .= M[:, i]
M[:, i] .= M[:, j]
M[:, j] .= buffer
end
M
end
Certainly, if as a last resort that can be done; but as I said above, I can also modify shuffle!.
One just needs to generalize the operation of “exchange i for j”, eg
# a generalization of `shuffle!` adopted from Random, exchange!(i, j) swaps
# whatever the user wants
function do_shuffle(exchange!, r::AbstractRNG, n)
# keep it consistent with `randperm!` and `randcycle!` if possible
@assert n <= Int64(2)^52
n == 0 && return
mask = 3
@inbounds for i = 2:n
j = 1 + rand(r, Random.ltm52(i, mask))
exchange!(i, j)
i == 1 + mask && (mask = 2 * mask + 1)
end
end
I think this is a leftover from ancient times where people used RNGs with non-native word-size (ie mersenne twister, optimized for generating Float64). Back then, people used to do all kinds of misguided counter-productive contortions in order to save some cycles.
I’m saying misguided because our shuffle! fisher-yates ends up using naive rejection-sampling instead of the nice new fancy multiplication-based technique to generate numbers in a range. Lo and behold
julia> function baseshuffle!(r::Random.AbstractRNG, a::AbstractArray)
# keep it consistent with `randperm!` and `randcycle!` if possible
Base.require_one_based_indexing(a)
n = length(a)
@assert n <= Int64(2)^52
n == 0 && return a
mask = 3
@inbounds for i = 2:n
j = 1 + rand(r, Random.ltm52(i, mask))
a[i], a[j] = a[j], a[i]
i == 1 + mask && (mask = 2 * mask + 1)
end
return a
end
julia> function simpleshuffle!(r::Random.AbstractRNG, a::AbstractArray)
# keep it consistent with `randperm!` and `randcycle!` if possible
Base.require_one_based_indexing(a)
n = length(a)
n == 0 && return a
@inbounds for i = 2:n
j = rand(r, 1:i)
a[i], a[j] = a[j], a[i]
end
return a
end
julia> r=Random.default_rng(); b = collect(1:10_000); @btime Random.shuffle!(b); @btime baseshuffle!(r, b); @btime simpleshuffle!(r,b);
53.749 μs (0 allocations: 0 bytes)
53.483 μs (0 allocations: 0 bytes)
18.988 μs (0 allocations: 0 bytes)
We should simplify that code.
(not throwing shade on the julia devs, instead throwing shade on what passed for scientific consensus on randomness in the late 90s. Also hardware-dependent – the tradeoff would look different on a late 90s CPU with shallow pipeline and slow multiplication)