# Draw two distinct random integers from `1:n`

Iβm looking for a fast and elegant way to repeatedly draw two distinct random integers from the range `1:n`.

A naive baseline:

``````julia> function draw(n, iter)
for i in 1:iter
a, b = rand(1:n), rand(1:n)
while b == a
b = rand(1:n)
end
end
return nothing
end
draw (generic function with 1 method)

julia> @btime draw(10_000, 1000);
6.857 ΞΌs (0 allocations: 0 bytes)
``````

Preferably, Iβd want a solution that doesnβt use external packages (since eventually I want to use this in a package and donβt want to add extra dependencies). But depending on elegance and speed I might reconsider

Let the games begin (and thanks in advance!)

With `n = 10_000` you hit that branch almost never, so this is probably optimal.

3 Likes

If `n` is large, thus the probability of coincidence is very small, probably one cannot improve too much what you have. For small `n` I came up with this (though Iβm not sure if the distributions donβt become correlated somehow):

``````julia> function draw2(n)
a = rand(1:n)
b = rand(1:n-1)
if a >= b
b += 1
end
a, b
end
``````

(thinking a bit further, I think this option is fine, the second draw is a draw on the remaining possible βpositionsβ, which are univocally assigned to the corresponding values, thus that is probably as correlated as the first option, unless Iβm missing some subtlety - as shown bellow, clearly I am )

1 Like
``````julia> using UnicodePlots

julia> function draw2(n)
a = rand(1:n)
b = rand(1:n-1)
if a >= b
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> histogram(reinterpret(Int,[draw2(3) for i = 1:100000]), nbins=3)
β                                        β
[1.0, 2.0) β€ββββββββββββ 33329
[2.0, 3.0) β€βββββββββββββββββββββββββββββββββ  99623
[3.0, 4.0) β€βββββββββββββββββββββββ 67048
β                                        β
Frequency
``````
5 Likes

Youβre drawing two numbers in `1:n`, so basically drawing two coordinates for a square with side n. There are n*n possible combinations. You exclude pairs with equal numbers, so youβre basically excluding the diagonal. Weβre in integers, there are n values on the diagonal of a total of n^2. So we have nΒ² - n valid values with equal distribution. As such, the chance for actually hitting the diagonal is n/nΒ² = 1/n. We can check this - for n=2, we exclude 2 elements on the diagonal out of a total nΒ² = 4. For n=3, we exclude 3 of 9, meaning our chance was 1/3. n=4 β 4/16 β 1/4 etc. This is the same chance as we get for having to draw any of the two a second time (since we have to draw at least one more time only if we hit the diagonal). As n grows, this quickly becomes very small. Iβd be very surprised if any method would be faster than just picking again (though of course you can optimize this for e.g. n=2 directly by returning the only other option left) without acidentally biasing/correlating the distribution of the second number to with the distribution of the first more than necessary (i.e., more than only having `n-1` possible values with equal chance).

Youβre biasing `b` to be larger when `a` is larger than `b`, meaning the distribution of `b` is correlated to the distribution of `a`.

6 Likes

Almost

``````function draw2_(n)
a = rand(1:n)
b = rand(1:n-1)
b += (b >= a)
return a, b
end

1.7.0-rc3> histogram(reinterpret(Int,[draw2_(3) for i = 1:100000]), nbins=3)
β                                        β
[1.0, 2.0) β€βββββββββββββββββββββββββββββββββ  66720
[2.0, 3.0) β€βββββββββββββββββββββββββββββββββ 66607
[3.0, 4.0) β€βββββββββββββββββββββββββββββββββ 66673
β                                        β
Frequency
``````
6 Likes

Gotta love those UnicodePlots.jl-powered histograms

4 Likes

In case anyone doesnβt catch the subtlety: the code by @DNF does `b += (b >= a)`, while the code by @lmiq does `b += (a >= b)`. The former correctly shifts only the distribution of values of `b` greater than `a` upwards by one, avoiding `a`, while the latter incorrectly shifts all values lower than `a`, meaning another value one less than `b` can potentially hit `a` again.

4 Likes

Itβs probably also useful to use explicit `rng`:

``````using Random
function draw2_(n, rng=MersenneTwister())
a = rand(rng, 1:n)
b = rand(rng, 1:n-1)
b += (b >= a)
return a, b
end

1.7.0-rc3> rng = MersenneTwister();

1.7.0-rc3> @btime draw2_(10, \$rng)
9.400 ns (0 allocations: 0 bytes)
(5, 10)
``````

Edit: Oops, pasted wrong function definition.

I much prefer `Xoshiro`

``````julia> @benchmark draw2(10_000, rng) setup=(rng=Xoshiro())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min β¦ max):  7.970 ns β¦ 33.074 ns  β GC (min β¦ max): 0.00% β¦ 0.00%
Time  (median):     8.002 ns              β GC (median):    0.00%
Time  (mean Β± Ο):   8.575 ns Β±  1.485 ns  β GC (mean Β± Ο):  0.00% Β± 0.00%

β  β ββ  ββ  β   β   ββ                                    β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
7.97 ns      Histogram: log(frequency) by time     12.6 ns <

Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark draw2(10_000, rng) setup=(rng=MersenneTwister())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min β¦ max):  11.104 ns β¦ 49.395 ns  β GC (min β¦ max): 0.00% β¦ 0.00%
Time  (median):     11.607 ns              β GC (median):    0.00%
Time  (mean Β± Ο):   12.770 ns Β±  2.931 ns  β GC (mean Β± Ο):  0.00% Β± 0.00%

ββββββββββββββββββββββββββββ β                              β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
11.1 ns      Histogram: log(frequency) by time      26.1 ns <

Memory estimate: 17 bytes, allocs estimate: 0.
``````

Brilliant and very cool. My interpretation was correct, the implementation was wrong

1 Like

Oh, yes, I knew there was a better one, but forgot which it was.

`Xoshiro` will be the default in the (very soon) to be released 1.7

Still, from this matricial picture, it should be possible to call `rand(1:n^2-n)` and do an index transformation to obtain the two numbers. I donβt know if in any case the cost of those transformations pay off relative to a call to `rand()`.

Youβre more or less asking about converting a linear index to a cartesian one - and if I remember correctly, that involves a division + modulo, which is surely more expensive than a single `rand` and an addition. Additionally, simply excluding `n` values from the range is not sufficient - it matters which values you exclude.

1 Like

Yes, that. It is not completely evident that those index transformations are more expensive than a call to `rand()` though. (unfortunately to get the indexes right I would spend all dayβ¦). But a simple guess suggests that it might be something of the same order:

``````julia> function iter(n,draw)
for i in 1:1000
draw(n)
end
nothing
end
iter (generic function with 1 method)

julia> function draw2(n)
a = rand(1:n^2 - n)
b = mod(a,n) + div(n^3,n) # <--- some random stuff
if b >= a
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> @btime iter(3,\$draw2)
13.623 ΞΌs (0 allocations: 0 bytes)

julia> function draw3(n)
a = rand(1:n)
b = rand(1:n-1)
if b >= a
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> @btime iter(3,\$draw3)
14.473 ΞΌs (0 allocations: 0 bytes)

``````

If you provide the rng, it looks to me like `divrem` (which should be more or less the best case here) is definitely more expensive than `rand`:

``````function draw2_(n, rng=MersenneTwister())
a = rand(rng, 1:n)
b = rand(rng, 1:n-1)
b += (b >= a)
return a, b
end

function draw1(n, rng=Xoshiro())
ab = rand(rng, 0:(n^2-n-1))
(a, b) = divrem(ab, n)  # best case tranformation
return a, b  # wrong answer!!
end

1.7.0-rc3> @btime draw2_(10_000, rng) setup=(rng=Xoshiro())
8.000 ns (0 allocations: 0 bytes)
(7278, 3967)

1.7.0-rc3> @btime draw1(10_000, rng) setup=(rng=Xoshiro())
12.412 ns (0 allocations: 0 bytes)
(2233, 4742)
``````

Without the `rng` it is practically a βdrawβ:

``````function draw2b(n)
a = rand(1:n)
b = rand(1:n-1)
b += (b >= a)
return a, b
end

1.7.0-rc3> @btime draw2b(10_000)
12.412 ns (0 allocations: 0 bytes)
(2873, 7609)
``````
1 Like

Here is a working one (it is more expensive - but I need to call `rand(Bool)` once, I donβt know if that is less expensive than a `rand(Int)`, if not it does not make sense):

``````# get cartesian coordinates from linear index of upper triangular
julia> @inline function iuppert(k::Integer,n::Integer)
i = n - 1 - floor(Int,sqrt(-8*k + 4*n*(n-1) + 1)/2 - 0.5)
j = k + i + ( (n-i+1)*(n-i) - n*(n-1) )Γ·2
return i, j
end
iuppert (generic function with 1 method)

julia> function draw4(n)
ind = rand(1:(n^2-n)Γ·2)
a, b = iuppert(ind,n)
if rand(Bool) # need to allow lower triangular as well
a, b = b, a
end
a, b
end
draw4 (generic function with 1 method)

# seems correct:
julia> histogram(reinterpret(Int,[draw4(3)[2] for i = 1:100000]), nbins=3)
β                                        β
[1.0, 2.0) β€βββββββββββββββββββββββββββββββββ 33053
[2.0, 3.0) β€βββββββββββββββββββββββββββββββββ  33503
[3.0, 4.0) β€βββββββββββββββββββββββββββββββββ 33444

# but slower
julia> function iter(n,draw)
for i in 1:1000
draw(n)
end
nothing
end
iter (generic function with 1 method)

julia> @btime iter(3,\$draw4)
19.474 ΞΌs (0 allocations: 0 bytes)

julia> function draw2(n)
a = rand(1:n)
b = rand(1:n-1)
if b >= a
b += 1
end
a, b
end
draw2 (generic function with 1 method)

julia> @btime iter(3,\$draw2)
14.576 ΞΌs (0 allocations: 0 bytes)

``````

Much as I love this relation after having derived it in a course ages ago, it is really just making a relatively simple problem way more complicated here. Try this implementation instead:

``````function draw5(n)
ind = rand(0:(n * (n - 1) - 1))
a, b = divrem(ind, n)
a += (a >= b)
return a + 1, b + 1
end
``````
1 Like

Great! And that is faster than calling `rand()` twice:

``````julia> function iter(n,draw)
for i in 1:1000
draw(n)
end
nothing
end
iter (generic function with 1 method)

julia> function draw5(n)
ind = rand(0:(n * (n - 1) - 1))
a, b = divrem(ind, n)
a += (a >= b)
return a + 1, b + 1
end
draw5 (generic function with 1 method)

julia> @btime iter(3,\$draw5)
11.083 ΞΌs (0 allocations: 0 bytes)

julia> using UnicodePlots

julia> histogram(reinterpret(Int,[draw5(3) for i = 1:100000]), nbins=3)
β                                        β
[1.0, 2.0) β€βββββββββββββββββββββββββββββββββ 66522
[2.0, 3.0) β€βββββββββββββββββββββββββββββββββ 66651
[3.0, 4.0) β€βββββββββββββββββββββββββββββββββ  66827
β                                        β
Frequency

``````
1 Like