# Draw two distinct random integers from `1:n`

I’m looking for a fast and elegant way to repeatedly draw two distinct random integers from the range `1:n`.

A naive baseline:

``````julia> function draw(n, iter)
for i in 1:iter
a, b = rand(1:n), rand(1:n)
while b == a
b = rand(1:n)
end
end
return nothing
end
draw (generic function with 1 method)

julia> @btime draw(10_000, 1000);
6.857 μs (0 allocations: 0 bytes)
``````

Preferably, I’d want a solution that doesn’t use external packages (since eventually I want to use this in a package and don’t want to add extra dependencies). But depending on elegance and speed I might reconsider Let the games begin (and thanks in advance!)

With `n = 10_000` you hit that branch almost never, so this is probably optimal.

3 Likes

If `n` is large, thus the probability of coincidence is very small, probably one cannot improve too much what you have. For small `n` I came up with this (though I’m not sure if the distributions don’t become correlated somehow):

``````julia> function draw2(n)
a = rand(1:n)
b = rand(1:n-1)
if a >= b
b += 1
end
a, b
end
``````

(thinking a bit further, I think this option is fine, the second draw is a draw on the remaining possible “positions”, which are univocally assigned to the corresponding values, thus that is probably as correlated as the first option, unless I’m missing some subtlety - as shown bellow, clearly I am )

1 Like
``````julia> using UnicodePlots

julia> function draw2(n)
a = rand(1:n)
b = rand(1:n-1)
if a >= b
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> histogram(reinterpret(Int,[draw2(3) for i = 1:100000]), nbins=3)
┌                                        ┐
[1.0, 2.0) ┤███████████▏ 33329
[2.0, 3.0) ┤█████████████████████████████████  99623
[3.0, 4.0) ┤██████████████████████▎ 67048
└                                        ┘
Frequency
``````
5 Likes

You’re drawing two numbers in `1:n`, so basically drawing two coordinates for a square with side n. There are n*n possible combinations. You exclude pairs with equal numbers, so you’re basically excluding the diagonal. We’re in integers, there are n values on the diagonal of a total of n^2. So we have n² - n valid values with equal distribution. As such, the chance for actually hitting the diagonal is n/n² = 1/n. We can check this - for n=2, we exclude 2 elements on the diagonal out of a total n² = 4. For n=3, we exclude 3 of 9, meaning our chance was 1/3. n=44/161/4 etc. This is the same chance as we get for having to draw any of the two a second time (since we have to draw at least one more time only if we hit the diagonal). As n grows, this quickly becomes very small. I’d be very surprised if any method would be faster than just picking again (though of course you can optimize this for e.g. n=2 directly by returning the only other option left) without acidentally biasing/correlating the distribution of the second number to with the distribution of the first more than necessary (i.e., more than only having `n-1` possible values with equal chance).

You’re biasing `b` to be larger when `a` is larger than `b`, meaning the distribution of `b` is correlated to the distribution of `a`.

6 Likes

Almost ``````function draw2_(n)
a = rand(1:n)
b = rand(1:n-1)
b += (b >= a)
return a, b
end

1.7.0-rc3> histogram(reinterpret(Int,[draw2_(3) for i = 1:100000]), nbins=3)
┌                                        ┐
[1.0, 2.0) ┤█████████████████████████████████  66720
[2.0, 3.0) ┤████████████████████████████████▉ 66607
[3.0, 4.0) ┤████████████████████████████████▉ 66673
└                                        ┘
Frequency
``````
6 Likes

Gotta love those UnicodePlots.jl-powered histograms 4 Likes

In case anyone doesn’t catch the subtlety: the code by @DNF does `b += (b >= a)`, while the code by @lmiq does `b += (a >= b)`. The former correctly shifts only the distribution of values of `b` greater than `a` upwards by one, avoiding `a`, while the latter incorrectly shifts all values lower than `a`, meaning another value one less than `b` can potentially hit `a` again.

4 Likes

It’s probably also useful to use explicit `rng`:

``````using Random
function draw2_(n, rng=MersenneTwister())
a = rand(rng, 1:n)
b = rand(rng, 1:n-1)
b += (b >= a)
return a, b
end

1.7.0-rc3> rng = MersenneTwister();

1.7.0-rc3> @btime draw2_(10, \$rng)
9.400 ns (0 allocations: 0 bytes)
(5, 10)
``````

Edit: Oops, pasted wrong function definition.

I much prefer `Xoshiro` ``````julia> @benchmark draw2(10_000, rng) setup=(rng=Xoshiro())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max):  7.970 ns … 33.074 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     8.002 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   8.575 ns ±  1.485 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

█  ▃ ▁▃  ▂▂  ▄   ▁   ▆▃                                    ▁
█▅▃████▅▄██▅▆█▄▄▄█▅▄▄██▆▄▄▆▆▇▆▆▁▃▇▆▅▆▅▅▄▃▄▄▃▁▁▅▅▃▆▄▅▄▄▄▄▃▅ █
7.97 ns      Histogram: log(frequency) by time     12.6 ns <

Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark draw2(10_000, rng) setup=(rng=MersenneTwister())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max):  11.104 ns … 49.395 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     11.607 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   12.770 ns ±  2.931 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

██▇▅▄▃▂▃▂▂▄▃▂▂▁▂▂▃▂▂▃▁▁▂▂▂▁▁ ▁                              ▂
███████████████████████████████▇█▇▅▅▅▅▅▄▄▃▃▃▄▁▁▃▁▄▅▆▆▆▅▆▆▇▇ █
11.1 ns      Histogram: log(frequency) by time      26.1 ns <

Memory estimate: 17 bytes, allocs estimate: 0.
``````

Brilliant and very cool. My interpretation was correct, the implementation was wrong 1 Like

Oh, yes, I knew there was a better one, but forgot which it was.

`Xoshiro` will be the default in the (very soon) to be released 1.7 Still, from this matricial picture, it should be possible to call `rand(1:n^2-n)` and do an index transformation to obtain the two numbers. I don’t know if in any case the cost of those transformations pay off relative to a call to `rand()`.

You’re more or less asking about converting a linear index to a cartesian one - and if I remember correctly, that involves a division + modulo, which is surely more expensive than a single `rand` and an addition. Additionally, simply excluding `n` values from the range is not sufficient - it matters which values you exclude.

1 Like

Yes, that. It is not completely evident that those index transformations are more expensive than a call to `rand()` though. (unfortunately to get the indexes right I would spend all day…). But a simple guess suggests that it might be something of the same order:

``````julia> function iter(n,draw)
for i in 1:1000
draw(n)
end
nothing
end
iter (generic function with 1 method)

julia> function draw2(n)
a = rand(1:n^2 - n)
b = mod(a,n) + div(n^3,n) # <--- some random stuff
if b >= a
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> @btime iter(3,\$draw2)
13.623 μs (0 allocations: 0 bytes)

julia> function draw3(n)
a = rand(1:n)
b = rand(1:n-1)
if b >= a
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> @btime iter(3,\$draw3)
14.473 μs (0 allocations: 0 bytes)

``````

If you provide the rng, it looks to me like `divrem` (which should be more or less the best case here) is definitely more expensive than `rand`:

``````function draw2_(n, rng=MersenneTwister())
a = rand(rng, 1:n)
b = rand(rng, 1:n-1)
b += (b >= a)
return a, b
end

function draw1(n, rng=Xoshiro())
ab = rand(rng, 0:(n^2-n-1))
(a, b) = divrem(ab, n)  # best case tranformation
return a, b  # wrong answer!!
end

1.7.0-rc3> @btime draw2_(10_000, rng) setup=(rng=Xoshiro())
8.000 ns (0 allocations: 0 bytes)
(7278, 3967)

1.7.0-rc3> @btime draw1(10_000, rng) setup=(rng=Xoshiro())
12.412 ns (0 allocations: 0 bytes)
(2233, 4742)
``````

Without the `rng` it is practically a ‘draw’:

``````function draw2b(n)
a = rand(1:n)
b = rand(1:n-1)
b += (b >= a)
return a, b
end

1.7.0-rc3> @btime draw2b(10_000)
12.412 ns (0 allocations: 0 bytes)
(2873, 7609)
``````
1 Like

Here is a working one (it is more expensive - but I need to call `rand(Bool)` once, I don’t know if that is less expensive than a `rand(Int)`, if not it does not make sense):

``````# get cartesian coordinates from linear index of upper triangular
julia> @inline function iuppert(k::Integer,n::Integer)
i = n - 1 - floor(Int,sqrt(-8*k + 4*n*(n-1) + 1)/2 - 0.5)
j = k + i + ( (n-i+1)*(n-i) - n*(n-1) )÷2
return i, j
end
iuppert (generic function with 1 method)

julia> function draw4(n)
ind = rand(1:(n^2-n)÷2)
a, b = iuppert(ind,n)
if rand(Bool) # need to allow lower triangular as well
a, b = b, a
end
a, b
end
draw4 (generic function with 1 method)

# seems correct:
julia> histogram(reinterpret(Int,[draw4(3) for i = 1:100000]), nbins=3)
┌                                        ┐
[1.0, 2.0) ┤████████████████████████████████▌ 33053
[2.0, 3.0) ┤█████████████████████████████████  33503
[3.0, 4.0) ┤████████████████████████████████▉ 33444

# but slower
julia> function iter(n,draw)
for i in 1:1000
draw(n)
end
nothing
end
iter (generic function with 1 method)

julia> @btime iter(3,\$draw4)
19.474 μs (0 allocations: 0 bytes)

julia> function draw2(n)
a = rand(1:n)
b = rand(1:n-1)
if b >= a
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> @btime iter(3,\$draw2)
14.576 μs (0 allocations: 0 bytes)

``````

Much as I love this relation after having derived it in a course ages ago, it is really just making a relatively simple problem way more complicated here. Try this implementation instead:

``````function draw5(n)
ind = rand(0:(n * (n - 1) - 1))
a, b = divrem(ind, n)
a += (a >= b)
return a + 1, b + 1
end
``````
1 Like

Great! And that is faster than calling `rand()` twice:

``````julia> function iter(n,draw)
for i in 1:1000
draw(n)
end
nothing
end
iter (generic function with 1 method)

julia> function draw5(n)
ind = rand(0:(n * (n - 1) - 1))
a, b = divrem(ind, n)
a += (a >= b)
return a + 1, b + 1
end
draw5 (generic function with 1 method)

julia> @btime iter(3,\$draw5)
11.083 μs (0 allocations: 0 bytes)

julia> using UnicodePlots

julia> histogram(reinterpret(Int,[draw5(3) for i = 1:100000]), nbins=3)
┌                                        ┐
[1.0, 2.0) ┤████████████████████████████████▊ 66522
[2.0, 3.0) ┤████████████████████████████████▊ 66651
[3.0, 4.0) ┤█████████████████████████████████  66827
└                                        ┘
Frequency

``````
1 Like